
RAID Theory

By Ashton Mills
May 19, 2008 | 4 Comments
Tags: RAID


These days RAID really should be called Redundant Array of Expensive Disks (aka RAED; got net speak?), since it’s people like you and me who frequently take already high-performing disks and improve our storage subsystem performance even more by RAIDing two or more of them.

And of course, this means RAID 0 is the favourite among enthusiasts for its focus on performance, not redundancy.

But as with just about everything else to do with technology, there’s more to RAID 0 than first meets the eye, and in fact your current array could well be underperforming if you haven’t built it on the principles of RAID theory, which is precisely what we’ll cover here.


Parallelism and workload
RAID is all about parallelism – for every drive you add, you can conceptually increase the performance of a storage subsystem by splitting the workload among multiple drives at once. This can reap performance benefits as in RAID 0, or redundancy benefits as in RAID 1, or a mixture of the two through RAID levels 3 to 7 (not to mention ‘tiered’ RAIDs, such as 0+1).
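
As a rough illustration of how RAID 0 splits work between drives, here is a minimal sketch of striping: the array's address space is cut into fixed-size stripe units that are dealt out to the drives in turn. The function name `locate`, the 64 KiB stripe unit and the two-drive array are assumptions chosen for the example, not details of any particular controller.

```python
def locate(logical_offset, stripe_size, n_disks):
    """Map a byte offset in the array to (disk index, byte offset on that disk)."""
    stripe_index = logical_offset // stripe_size   # which stripe unit, counted across the array
    disk = stripe_index % n_disks                  # stripe units rotate round-robin over the disks
    row = stripe_index // n_disks                  # full rows of stripe units that come before it
    offset_on_disk = row * stripe_size + logical_offset % stripe_size
    return disk, offset_on_disk


# Two disks, 64 KiB stripe units (illustrative values only).
KIB = 1024
for offset in (0, 64 * KIB, 128 * KIB, 200 * KIB):
    print(offset // KIB, "KiB ->", locate(offset, 64 * KIB, 2))
```

Consecutive stripe units land on alternating drives, so a long sequential read keeps both drives busy, while a read smaller than one stripe unit touches only a single drive.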

But parallelism isn’t just about RAID. Having two drives in a system, with files and directories split across them, can deliver performance benefits (which is why it’s often recommended to put the swap file on a separate drive from the OS). Separate spindles, as we’ll call them, give your machine the ability to service multiple I/O requests at once – assuming the files it needs are spread across both drives.

Keeping this in mind, designing a good performance subsystem with RAID is as much about the RAID level and system workload as it is about your storage layout, which we’ll get onto a little later.

This leads us to something we’ll be mentioning a lot – workload. It’s important to remember RAID is not a silver bullet. It won’t exponentially increase the performance of your system across the board; it can only increase the performance of certain workloads, and like a sliding scale, as you increase the performance of one workload you decrease the performance of another.

Since the performance of a hard drive is largely defined by its seek times and transfer rate, what you do with your machine will determine how well a hard drive can deliver – a hard drive with a low seek time will perform better at frequent random accesses, the type that occurs when you load Windows or launch an application. And a drive with a high transfer rate will perform better at tasks like loading large files, or streaming video.
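
The trade-off between seek time and transfer rate can be put into a back-of-the-envelope model: one seek, then a transfer at the drive's sustained rate. The sketch below is deliberately crude (it ignores rotational latency, caching and queuing), and both drives and all their figures are hypothetical, picked only to make the point.

```python
def service_time_ms(request_bytes, seek_ms, rate_mb_s):
    """Time for one request: a single seek followed by the data transfer."""
    transfer_ms = request_bytes / (rate_mb_s * 1_000_000) * 1000
    return seek_ms + transfer_ms


# Hypothetical drives; the numbers are assumptions, not measured specifications.
quick_seeker = {"seek_ms": 8.0, "rate_mb_s": 80.0}
fast_streamer = {"seek_ms": 12.0, "rate_mb_s": 100.0}

small_request = 4 * 1024          # a 4 KiB random read
large_request = 1_000_000_000     # a 1 GB sequential read

for name, drive in (("quick seeker", quick_seeker), ("fast streamer", fast_streamer)):
    small = service_time_ms(small_request, **drive)
    large = service_time_ms(large_request, **drive)
    print(f"{name}: small {small:.2f} ms, large {large:.0f} ms")
```

In this model the small read is almost entirely seek time, so the quick seeker wins it, while the large read is almost entirely transfer time, so the fast streamer wins that one.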

All hard drives obviously do both of these workloads to varying degrees, but one of the side effects of using RAID is that you magnify these differences, and the effectiveness of an array depends on the type of workload it’s going to get. If you build an array designed for throughput, it won’t perform as well for seek-based workloads – and if this is what your machine does most of the time, you won’t see a good return on your investment. In fact, as we’ll show, for the wrong workload a RAID array can perform slower than a single drive.


Seek vs throughput
To understand how this can be it helps to understand the many factors that can influence an array. From the operating system, file system, and drivers through to RAID level, RAID hardware, and individual drive performance there are many variables at play. You can’t control all of them, but you can optimise an array for your particular setup.

Since this is Atomic, we’ll be focusing on just RAID 0, but even here there is plenty to explore. We’ll also assume you like to use identical makes and models of drives for your arrays, because this is simply the smart thing to do.

As covered above, RAID isn’t a one-size-fits-all solution – by nature it can provide excellent performance improvements, provided the array is built with your most common workload in mind. The easiest way to categorise your workload is to look at what you use your PC for.

For example, at the extreme ends, the types of workload a system may see are frequent random accesses (which we’ll abbreviate as seek) and sustained sequential throughput (commonly called sustained transfer rate). The latter is often exemplified by a machine that may be used in video editing or streaming – large, contiguous files. The former is frequently represented by common operating system use – lots of small files being accessed at different times. If you’re wondering, games usually fall between the two, something you’ll be able to see in the benchmark results to follow.

Many enthusiasts optimise their arrays for raw throughput, putting the benchmarks of programs like ATTO and SiSoft Sandra on a pedestal – but this is a mistake. Inevitably, an array that performs well at throughput doesn’t perform as well at frequent random accesses – the workloads that require frequent seeking – which just happens to be the workload of their machines when they’re actually using them and not running benchmarks.
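
To see why a throughput benchmark can flatter an array, here is a simplified sketch of the speedup a striped array offers at different request sizes. It assumes requests are aligned to stripe units, split evenly over the drives they touch, and that every drive seeks in parallel in the same time. The drive figures, stripe size and function names are all hypothetical. The model only captures the "no gain" case for small requests; it does not include controller or driver overheads, which are what could push an array below a single drive.

```python
import math


def array_time_ms(request_bytes, n_disks, stripe_bytes, seek_ms, rate_mb_s):
    """Time for one request on an n-disk stripe set under the assumptions above."""
    # A request can only use as many disks as it has stripe units to spread over.
    disks_used = min(n_disks, max(1, math.ceil(request_bytes / stripe_bytes)))
    per_disk_bytes = request_bytes / disks_used
    return seek_ms + per_disk_bytes / (rate_mb_s * 1_000_000) * 1000


def speedup(request_bytes, n_disks, stripe_bytes, seek_ms, rate_mb_s):
    """Ratio of single-drive time to array time for the same request."""
    single = array_time_ms(request_bytes, 1, stripe_bytes, seek_ms, rate_mb_s)
    striped = array_time_ms(request_bytes, n_disks, stripe_bytes, seek_ms, rate_mb_s)
    return single / striped


# Hypothetical drive: 10 ms seek, 80 MB/s sustained, in a 2-disk array with 64 KiB stripe units.
drive = {"stripe_bytes": 64 * 1024, "seek_ms": 10.0, "rate_mb_s": 80.0}
for size in (4 * 1024, 64 * 1024, 1_000_000, 1_000_000_000):
    print(f"{size:>13,d} bytes: {speedup(size, 2, **drive):.2f}x")
```

Under these assumptions a request that fits inside one stripe unit sees no gain at all, and only very large transfers approach the two-fold figure a sustained-throughput benchmark would report.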

To demonstrate this we’ll be benchmarking using two high-end drive models – the 10,000 RPM Raptor, and the new 32M cache Seagate 7200.11. The Raptor isn’t as fast as the Seagate for throughput, but its 10,000 RPM spindle speed gives it a faster seek time. So which will be better suited to your workloads? And will both perform in RAID 0?

 
 
This article appeared in the May, 2008 issue of Atomic.

4 Comments
Fat_Bodybuilder
Sep 17, 2008 10:31 AM
Is parallelism a real word? :P

Nice work ;-)
osama_bin_athlon
Sep 17, 2008 8:15 PM
er, HDD's are cheaper than ever......under $80 for a 500G (Maxtor 500G $78 @ MSY, for instance), how's that expensive?
Goonit
Sep 20, 2008 9:28 AM
Wow, I've been under the impression it would make a huge difference to load times, doesn't seem the case at all.

One newer generation hard drive, is better then 2 older in raed. :)

Atomic always answers the questions we ponder,
Thanks for the article.

Fat_Bodybuilder
Sep 21, 2008 7:44 PM
Remember that this is covering RAID0, which is not really RAID at all.

And this is a very old article, HDDs were still a little expensive, then.
 
 
 