Friday February 10, 2012 3:39 AM AEST

RAID Theory

By Ashton Mills
10:52 May 19, 2008 | 4 Comments
Tags: RAID

Also RAID setup, RAID benchmarks, RAID considerations, RAID software, RAID hardware...

These days RAID really should be called Redundant Array of Expensive Disks (aka RAED; got net speak?), since it’s people like you and me who frequently take already high-performing disks and improve our storage subsystem performance even more by RAIDing two or more of them.

And of course, this means RAID 0 is the favourite among enthusiasts for its focus on performance, not redundancy.

But as with just about everything else to do with technology, there’s more to RAID 0 than first meets the eye, and in fact your current array could well be underperforming if you haven’t built it on the principles of RAID theory, which is precisely what we’ll cover here.


Parallelism and workload
RAID is all about parallelism – for every drive you add, you can conceptually increase the performance of a storage subsystem by splitting the workload among multiple drives at once. This can reap performance benefits as in RAID 0, or redundancy benefits as in RAID 1, or a mixture of the two through RAID levels 3 to 7 (not to mention ‘tiered’ RAIDs, such as 0+1).
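
To make the idea of splitting work across drives concrete, here is a minimal sketch of how a striped (RAID 0) array might map a logical block onto a member drive. It assumes simple round-robin striping; the function names and the `stripe_blocks` parameter are illustrative and not taken from any particular controller or driver, and real implementations differ in detail.

```python
def locate_block(logical_block: int, num_disks: int, stripe_blocks: int) -> tuple[int, int]:
    """Return (disk index, block offset on that disk) for a RAID 0 layout.

    Assumes round-robin striping: stripe 0 lives on disk 0, stripe 1 on
    disk 1, and so on, wrapping back to disk 0 after the last disk.
    """
    if logical_block < 0 or num_disks < 1 or stripe_blocks < 1:
        raise ValueError("block must be >= 0; disk count and stripe size must be >= 1")
    # Which stripe the block falls in, and where it sits inside that stripe.
    stripe_index, within_stripe = divmod(logical_block, stripe_blocks)
    disk = stripe_index % num_disks   # stripes rotate across the disks
    row = stripe_index // num_disks   # how many stripes already sit on that disk
    return disk, row * stripe_blocks + within_stripe


def disks_touched(first_block: int, block_count: int,
                  num_disks: int, stripe_blocks: int) -> set[int]:
    """Return the set of disks a contiguous request has to read from."""
    return {
        locate_block(block, num_disks, stripe_blocks)[0]
        for block in range(first_block, first_block + block_count)
    }
```

With two disks and a four-block stripe, an eight-block sequential read touches both drives, so their transfers can overlap, while a single-block read lands on just one drive and gains nothing from the second.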

But parallelism isn’t just about RAID. Having two drives in a system, with files and directories split across them, can deliver performance benefits (which is why it’s often recommended to put the swap file on a separate drive from the OS). Separate spindles, as we’ll call them, give your machine the ability to service multiple I/O requests at once – assuming the files it needs are spread across both drives.

Keeping this in mind, designing a good performance subsystem with RAID is as much about the RAID level and system workload as it is about your storage layout, which we’ll get onto a little later.

This leads us to something we’ll be mentioning a lot – workload. It’s important to remember RAID is not a silver bullet. It won’t exponentially increase the performance of your system across the board; it can only increase the performance of certain workloads, and like a sliding scale, as you increase the performance of one workload you decrease the performance of another.

Since the performance of a hard drive is largely defined by its seek times and transfer rate, what you do with your machine will determine how well a hard drive can deliver – a hard drive with a low seek time will perform better at frequent random accesses, the type that occurs when you load Windows or launch an application. And a drive with a high transfer rate will perform better at tasks like loading large files, or streaming video.
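
The trade-off between seek time and transfer rate can be sketched with a simple service-time model: positioning cost (seek plus average rotational latency) plus the time to move the data. The drive figures below are hypothetical placeholders chosen for illustration, not measurements of any real drive, and the model ignores caching, queuing, and zoned recording.

```python
def service_time_ms(request_kb: float, seek_ms: float, rpm: int,
                    transfer_mb_s: float) -> float:
    """Estimate the time to service one request on a single drive."""
    # On average the platter must turn half a revolution to reach the data.
    rotational_latency_ms = 0.5 * 60_000 / rpm
    # Time spent actually moving the data at the sustained transfer rate.
    transfer_ms = (request_kb / 1024) / transfer_mb_s * 1000
    return seek_ms + rotational_latency_ms + transfer_ms


# Hypothetical drives: one tuned for seeks, one for throughput.
FAST_SEEK = dict(seek_ms=4.5, rpm=10_000, transfer_mb_s=80.0)
FAST_TRANSFER = dict(seek_ms=8.5, rpm=7_200, transfer_mb_s=100.0)

for label, request_kb in (("4 KB random read", 4),
                          ("1 GB sequential read", 1024 * 1024)):
    a = service_time_ms(request_kb, **FAST_SEEK)
    b = service_time_ms(request_kb, **FAST_TRANSFER)
    print(f"{label}: fast-seek {a:.1f} ms, fast-transfer {b:.1f} ms")
```

Under these assumed numbers the small request is dominated by positioning time, so the fast-seek drive wins, while the large request is dominated by transfer time, so the fast-transfer drive wins.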

All hard drives obviously do both of these workloads to varying degrees, but one of the side effects of using RAID is that you magnify these differences, and the effectiveness of an array depends on the type of workload it’s going to get. If you build an array designed for throughput, it won’t perform as well for seek-based workloads – and if this is what your machine does most of the time, you won’t see a good return on your investment. In fact, as we’ll show, for the wrong workload a RAID array can perform slower than a single drive.
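
One plausible contributor to an array being slower than a single drive can be sketched as follows. This is an assumption offered for illustration, not the article's measured result: if a small request straddles several drives, it completes only when the slowest drive has finished positioning its heads. The simulation below draws each drive's positioning time from a uniform distribution, which is an arbitrary simplification, and the function name and parameters are hypothetical.

```python
import random


def mean_positioning_time(num_disks: int, max_ms: float = 10.0,
                          trials: int = 100_000, seed: int = 0) -> float:
    """Average wait until ALL drives involved in a request are positioned.

    Each drive's positioning time is drawn independently from
    uniform(0, max_ms); the request waits for the slowest drive.
    """
    rng = random.Random(seed)  # seeded so repeated runs give the same result
    total = 0.0
    for _ in range(trials):
        total += max(rng.uniform(0.0, max_ms) for _ in range(num_disks))
    return total / trials


single = mean_positioning_time(1)
striped = mean_positioning_time(2)
print(f"one drive: {single:.2f} ms, request split over two drives: {striped:.2f} ms")
```

For this toy distribution the expected wait is one half of `max_ms` for a single drive and two thirds of `max_ms` when two drives must both seek, so a request that is too small to benefit from the doubled transfer rate simply pays the longer positioning time.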


Seek vs throughput
To understand how this can be, it helps to understand the many factors that can influence an array. From the operating system, file system, and drivers through to RAID level, RAID hardware, and individual drive performance, there are many variables at play. You can’t control all of them, but you can optimise an array for your particular setup.

Since this is Atomic, we’ll be focusing on just RAID 0, but even here there is plenty to explore. We’ll also assume you like to use identical makes and models of drives for your arrays, because this is simply the smart thing to do.

As covered above, RAID isn’t a one size fits all solution – by nature it can provide excellent performance improvements, providing the array is built with your most common workload in mind. The easiest way to categorise your workload is to look at what you use your PC for.

For example, at the extreme ends, the types of workload a system may see are frequent random accesses (which we’ll abbreviate as seek), and sustained sequential throughput (commonly called sustained transfer rate). The latter is often exemplified by a machine that may be used in video editing or streaming – large, contiguous files. The former is frequently represented by common operating system use – lots of small files being accessed at different times. If you’re wondering, games usually fall between the two, something you’ll be able to see in the benchmark results to follow.

Many enthusiasts optimise their arrays for raw throughput, putting the benchmarks of programs like ATTO and SiSoft Sandra on a pedestal – but this is a mistake. Inevitably, an array that performs well at throughput doesn’t perform as well at frequent random accesses – the workloads that require frequent seeking – which just happens to be the workload of their machines when they’re actually using them and not running benchmarks.

To demonstrate this we’ll be benchmarking using two high-end drive models – the 10,000 RPM Raptor, and the new 32M cache Seagate 7200.11. The Raptor isn’t as fast as the Seagate for throughput, but its 10,000 RPM spindle speed gives it a faster seek time. So which will be better suited to your workloads? And will both perform in RAID 0?

 
 
This article appeared in the May, 2008 issue of Atomic.

4 Comments
Fat_Bodybuilder
Sep 17, 2008 10:31 AM
Is parallelism a real word? :P

Nice work ;-)
osama_bin_athlon
Sep 17, 2008 8:15 PM
er, HDD's are cheaper than ever......under $80 for a 500G (Maxtor 500G $78 @ MSY, for instance), how's that expensive?
Goonit
Sep 20, 2008 9:28 AM
Wow, I've been under the impression it would make a huge difference to load times, doesn't seem the case at all.

One newer generation hard drive, is better then 2 older in raed. :)

Atomic always answers the questions we ponder,
Thanks for the article.

Fat_Bodybuilder
Sep 21, 2008 7:44 PM
Remember that this is covering RAID0, which is not really RAID at all.

And this is a very old article, HDDs were still a little expensive, then.