How I built a 2.8TB RAID storage array

Discussion in 'Storage' started by Yeechang Lee, Feb 20, 2005.

  1. Yeechang Lee

    Yeechang Lee Guest

    My 2.8TB RAID 5 array is finally up and running. Here I'll discuss my
    initial intended specifications, what I actually ended up with, and
    associated commentary. Please see <URL:> for background material.

    Initial: Eight 250GB SATA drives.
    Actual: Nine 400GB PATA drives; eight for use, one as a cold spare.
    Why: Found a stupendous sale at CompUSA Christmas week;
    just-released-in-November Seagate Barracuda 7200.8 400GB PATA drives
    at $230 each, with no quantity limitation. I'd have loved to have
    gone with the SATA model, but given that Froogle lists the lowest
    price for one at $350 (the PATA model retails at $250-350), it was an
    easy choice.

    Initial: Antec tower case.
    Actual: Antec 4U rackmount case.
    Why: I'd always thought of rackmounts as unsuitable for anyone without
    an actual rack sitting in their data center, but after realizing that
    a rackmount case is simply a tower case sitting on its side, it was an
    easy decision given the space advantages. The Antec case here comes
    with Antec's True Power 550W EPS12V power supply, and both have great
    reputations. In practice, I found the Antec case remarkably easy to
    open up (one thumbscrew), easy to work with (all drive cages are
    removable), and roomy.

    Initial: Unspecified, but probably something Athlon-based and cheap.
    Actual: Supermicro X5DAL-G Intel server motherboard.
    Why: I became convinced that the sheer volume of the PCI traffic
    generated by my proposed array under software RAID would overwhelm any
    non-server motherboard, resulting in errors. In addition, I wanted
    PCI-X slots for optimal performance. Even though I think AMD in
    general offers much better bang for the buck, since I didn't want to
    spend the $$$ for Opteron, a Xeon motherboard with an Intel server
    chipset was the best compromise.

    Initial: Two Highpoint RocketRAID 454 cards.
    Actual: Two 3Ware 7506-4LP cards.
    Why: I needed PATA cards to go with my PATA drives, and also wanted to
    put the two PCI-X slots on my motherboard to use. I found exactly two
    PATA PCI-X controller cards: The 3Ware, and the Acard AEC-6897. Given
    that the Acard's Linux driver compatibility looked really, really
    iffy, I went with the 3Ware. I briefly considered the 7506-8 model,
    which would've saved me about $120, but figured I'd be better off
    distributing the bandwidth over two PCI-X slots rather than one.

    Initial: Linux software RAID 5 and XFS or JFS.
    Actual: Linux software RAID 5 and JFS.
    Why: Initially I planned on software RAID knowing that the Highpoint
    (and the equivalent Promise and Adaptec cards) didn't do true hardware
    RAID. Even after switching over to 3Ware (which *does* do true
    hardware RAID), everything I saw and read convinced me that software
    RAID was still the way to go for performance, long-term compatibility,
    and even 400GB extra space (given I'd be building one large RAID 5
    array instead of two smaller ones).
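    The 400GB figure works out like this; a quick shell check (decimal
    GB, as drive vendors count them) comparing one 8-drive RAID 5 array
    against the two 4-drive hardware arrays I'd otherwise have built:

```shell
# RAID 5 spends one drive's worth of capacity per array on parity, so a
# single big array wastes less space than two small ones.
DRIVE_GB=400
ONE_BIG=$(( (8 - 1) * DRIVE_GB ))        # one array: 7 data drives
TWO_SMALL=$(( 2 * (4 - 1) * DRIVE_GB ))  # two arrays: 6 data drives
echo "one array: ${ONE_BIG}GB, two arrays: ${TWO_SMALL}GB, extra: $(( ONE_BIG - TWO_SMALL ))GB"
```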

    I saw *lots* of conflicting benchmarks on whether XFS or JFS was the
    way to go. Ultimately
    <URL:> pushed
    me toward JFS, but I suspect I could have gone XFS with no difficulty.

    As implied above, I paid $2070 plus sales tax for the drives. I lucked
    out and found a terrific eBay deal for a prebuilt system containing
    the above-mentioned case and motherboard, two Xeon 2.8GHz CPUs, a DVD
    drive, and 2GB memory for $1260 including shipping. Labor aside, I'd
    have paid *much* more to build an equivalent system myself. The 3Ware
    cards were $240 each, no shipping or tax, from Monarch Computer. With
    miscellaneous costs (such as a Cooler Master 4-in-3 drive cage and an
    80GB boot drive from Best Buy for $40 after rebates), I paid under
    $4100, tax and shipping included, for everything. At $1.46/GB *plus* a
    powerful dual-CPU system, boatloads of memory, and a spare drive, I am
    quite satisfied with the overall bang for the buck.
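    For what it's worth, the $1.46/GB figure checks out from the totals
    above (a rough check; POSIX shell has no floats, so this works in
    cents per GB):

```shell
TOTAL_DOLLARS=4100   # everything, tax and shipping included
USABLE_GB=2800       # the 8-drive RAID 5 array's usable capacity
CENTS_PER_GB=$(( TOTAL_DOLLARS * 100 / USABLE_GB ))
echo "about \$$(( CENTS_PER_GB / 100 )).$(( CENTS_PER_GB % 100 )) per GB"
```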

    I spent most of the assembly time on the physical assembly part; it's
    astonishing just how long the simple tasks of opening up each
    retail-boxed drive, screwing the drive into the drive cage, putting
    the cage into the case, removing the cage and the drive when you
    realize you've put the drive in with the wrong mounting holes,
    reinstalling the drive and cage, etc., etc. take! My studio apartment
    still looks like a computer store exploded inside it.

    3Ware wisely provides PATA master-only cables with its cards, which
    saved some room, but my formerly-roomy case nonetheless looks like the
    rat's nest to end all rat's nests inside.

    I'd gone ahead and installed Fedora Core 3 with the boot drive only
    before the controller cards arrived. The 3Ware cards present each
    PATA drive as a SCSI device (/dev/sd[a-h]). Once booted, I used mdadm
    to create the RAID array (no partitions; just whole drives). While the
    array chugged along to create the parity information (about four
    hours), I created one large LVM2 volume group and logical volume
    on top of the array, then created one large JFS file system.
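    For anyone following along, the steps above look roughly like this (a
    hedged sketch, not my exact command history; the device names, the
    volume group and volume names, and the mount point are assumptions,
    and DRY_RUN keeps it from touching real disks):

```shell
# Print each command instead of executing it; unset DRY_RUN on real hardware.
DRY_RUN=1
run() { if [ -n "$DRY_RUN" ]; then echo "$@"; else "$@"; fi; }

# Whole drives, no partitions, into one 8-drive RAID 5 array.
run mdadm --create /dev/md0 --level=5 --raid-devices=8 \
    /dev/sda /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg /dev/sdh

# One LVM2 volume group and one logical volume on top of the array...
run pvcreate /dev/md0
run vgcreate bigvg /dev/md0
run lvcreate -l 100%FREE -n biglv bigvg

# ...then one large JFS file system.
run mkfs.jfs /dev/bigvg/biglv
run mount /dev/bigvg/biglv /mnt/newspace
```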

    By the way, I found a RAID-related bug in Fedora Core's bootscripts;
    see <URL:>.

    'df -h':
    2.6T 221G 2.4T 9% /mnt/newspace

    'mdadm --detail /dev/md0':
    Version : 00.90.01
    Creation Time : Wed Feb 16 01:53:33 2005
    Raid Level : raid5
    Array Size : 2734979072 (2608.28 GiB 2800.62 GB)
    Device Size : 390711296 (372.61 GiB 400.09 GB)
    Raid Devices : 8
    Total Devices : 8
    Preferred Minor : 0
    Persistence : Superblock is persistent

    Update Time : Sat Feb 19 16:26:34 2005
    State : clean
    Active Devices : 8
    Working Devices : 8
    Failed Devices : 0
    Spare Devices : 0

    Layout : left-symmetric
    Chunk Size : 512K

    Number Major Minor RaidDevice State
    0 8 0 0 active sync /dev/sda
    1 8 16 1 active sync /dev/sdb
    2 8 32 2 active sync /dev/sdc
    3 8 48 3 active sync /dev/sdd
    4 8 64 4 active sync /dev/sde
    5 8 80 5 active sync /dev/sdf
    6 8 96 6 active sync /dev/sdg
    7 8 112 7 active sync /dev/sdh
    Events : 0.319006

    'bonnie++ -s 4G -m 3ware-swraid5-type -p 3 ; \
    bonnie++ -s 4G -m 3ware-swraid5-type-c1 -y & \
    bonnie++ -s 4G -m 3ware-swraid5-type-c2 -y & \
    bonnie++ -s 4G -m 3ware-swraid5-type-c3 -y &'
    (To be honest these results are just a bunch of numbers to me, so any
    interpretations of them are welcome. I should mention that these were
    done with three distributed computing projects [BOINC, mprime, and
    [email protected]] running in the background. Although 'nice -n 19'
    each, they surely impacted CPU and perhaps disk performance.)

    Version 1.03 ------Sequential Output------ --Sequential Input- --Random-
    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
    Machine Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP /sec %CP
    3ware-swraid5-ty 4G 15749 50 15897 8 7791 6 10431 49 20245 11 138.1 2
    ------Sequential Create------ --------Random Create--------
    -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
    files /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP
    16 381 6 +++++ +++ 208 3 165 7 +++++ +++ 192 4
    Version 1.03 ------Sequential Output------ --Sequential Input- --Random-
    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
    Machine Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP /sec %CP
    3ware-swraid5-ty 4G 13739 46 17265 9 7930 6 10569 50 20196 11 146.7 2
    ------Sequential Create------ --------Random Create--------
    -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
    files /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP
    16 383 7 +++++ +++ 207 3 162 7 +++++ +++ 191 4
    Version 1.03 ------Sequential Output------ --Sequential Input- --Random-
    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
    Machine Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP /sec %CP
    3ware-swraid5-ty 4G 13288 43 16143 8 7863 6 10695 50 20231 12 149.6 2
    ------Sequential Create------ --------Random Create--------
    -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
    files /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP
    16 537 9 +++++ +++ 207 3 161 7 +++++ +++ 188 4

    I've noticed that over sync NFS, initiating a file copy from my older
    Athlon 1.4GHz system to the RAID array system is *much, much, much*
    slower (many minutes as opposed to seconds) than if I initiate the
    copy in the same direction but from the array system. Why is this?

    I almost went with the SATA (8506) version of the 3Ware cards and a
    bunch of PATA-SATA adapters in order to maintain compatibility with
    future drives, likely to be SATA only. However, a colleague pointed
    out the foolishness of paying $200 extra ($120 for eight adapters plus
    $80 for the extra cost of the SATA cards) in order to (possibly)
    futureproof a $480 investment.

    I was concerned that the drives (and the PATA cables) would cause
    horrible heat and noise issues. These, surprisingly, didn't occur;
    according to 'sensors', internal temperatures only rose by a few
    degrees, and the server is just as (very) noisy now as it was
    pre-RAID. I think I'll be able to get away with stuffing the array inside
    my hall closet after all.

    The server, before I put the cards and RAID drives into the system but
    with the distributed-computing projects putting the CPU at 100%
    utilization, took the power output on my Best Fortress 750VA/450W UPS
    from about 55% to about 76%. With the RAID up and running and again
    with 100% CPU utilization, output is 87-101% with the median at
    perhaps 93%. I realize I really ought to invest in another UPS, but
    with these figures I'm tempted to get by on what I currently have.
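    Those load percentages translate into rough wattages as follows
    (assuming the UPS reports load as a fraction of its 450W rating,
    which is my reading of the numbers, not a documented fact):

```shell
UPS_WATTS=450
for pct in 76 87 93 101; do
    echo "${pct}% load = $(( UPS_WATTS * pct / 100 ))W"
done
```

    At the ~93% median, that's roughly 418W drawn against a 450W rating,
    which is why another UPS is tempting.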

    Yes, I could've saved a considerable amount of money had I gone with,
    say, a used dual PIII server system with regular PCI slots (and, thus,
    $80 Highpoint RAID cards, again for the four PATA channels and not for
    their RAID functionality per se) and 512MB. And I suspect that for a
    home user like me performance wouldn't have been too much less. But I
    like to buy and build systems I can use for years and years without
    having to bother with upgrading, and figure I've made a long-term (at
    least 4-5 years, which is long term in the computer world) investment
    that provides me with much more than just storage functionality. And
    again, $1.46/GB is hard to beat.
    Yeechang Lee, Feb 20, 2005

  2. Yeechang Lee

    Yeechang Lee Guest

    Flat. The only thing special about them was that they lacked slave
    connectors.

    I'm glad they're flat; despite the (lack of) air flow, at some point I
    intend to try the fabled PATA cable origami methods I've heard about.
    This does concern me. How the heck do I tell them apart, even now? How
    do I figure out which drive is sda, which is sdb, which is sdc, etc.,
    etc.? Advice is appreciated.
    Not me; all my research told me that software was the way to go for
    both performance and downward-compatibility reasons.
    Thank you. It still amazes me to see that little '2.6T' label appear
    in the 'df -h' output.
    Yeechang Lee, Feb 20, 2005

  3. Anton Ertl

    Anton Ertl Guest

    One way is to disconnect them one by one, and see which drive is
    missing in the list (unless you want to test the md driver's
    reconstruction abilities, you should be doing this with a kernel that
    does not have an md driver, probably booting from CD). You can also
    use that method when a drive fails (but then it's even more important
    that the kernel does not have an md driver).

    Another way is to just look at which ports on the cards connect with
    which drives. They are typically marked on the card and/or in the
    manual with IDE0, IDE1, etc. You also have to find out which card is
    which. There may be a method to do this through the PCI IDs, but I
    would go for the disconnection method for that.

    Followups set to comp.os.linux.hardware (because I read that, csiphs
    would probably be more appropriate).

    - anton
    Anton Ertl, Feb 20, 2005
  4. Yeechang Lee

    Yeechang Lee Guest

    PSU concerns are why I went with an Antec 550W supply as opposed to
    some 300-400W noname brand. Since my rackmount case does not have room
    for a redundant supply, I suspect this is the best I can do. As you
    say, PSU problems are relatively rare.

    That said, anyone know how I can dynamically measure the actual
    wattage used by my system, beyond just adding up each individual
    component's wattage?
    Yeechang Lee, Feb 20, 2005
  5. Al Dykes

    Al Dykes Guest

    The Kill-A-Watt meter: about $30. I've got one.
    Al Dykes, Feb 20, 2005

  6. Another option is the Watts-Up meter, which I've been using for a few
    years; it's been very solid and reliable. But I don't know if it's
    any better than the Kill-A-Watt, which costs a quarter as much.

    There's a new Watts-Up Pro that has a nifty-looking PC (Windows)
    interface: ... So geekorific, I
    might have to get one.
    chocolatemalt, Feb 20, 2005
  7. Not necessarily. PCI (and PCI-X) bandwidth is per bus, not per slot.
    So if those two cards are in two slots on one PCI-X bus, that's not
    distributing the bandwidth at all. The motherboard may offer multiple
    PCI-X busses, in which case the OP may want to ensure the cards are in
    slots that correspond to different busses. The built-in NIC on most
    motherboards (along with most other built-in devices) is also on one
    (or more) of the PCI busses, so consider bandwidth used by those as well
    when distributing the load.
    John-Paul Stewart, Feb 20, 2005
  8. Probably, yes.
    It depends on the PCI-X version and clock, and on whether the slots
    are on separate PCI buses or not.

    If on separate buses, the highest clock is attainable and both cards
    get the full PCI-X bandwidth, say 1GB/s (133MHz) or 533MB/s (66MHz).
    If on the same bus, the clock is lower to start with and they have
    to share that bus's bandwidth: say a still-plentiful 400MB/s each
    (100MHz), but it may become iffy with a 66MHz clock (266MB/s each)
    or even 50MHz.
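    Peak PCI/PCI-X throughput is just bus width times clock, so the
    figures above can be reproduced (64 bits = 8 bytes; decimal MB/s,
    ignoring protocol overhead):

```shell
# MB/s for a 64-bit bus at a given clock in MHz.
bus_mbs() { echo $(( $1 * 8 )); }
for mhz in 133 100 66 50; do
    echo "${mhz}MHz: $(bus_mbs $mhz) MB/s"
done
```

    Two cards sharing a 100MHz bus thus get about 400MB/s each, and
    roughly 264MB/s each at 66MHz, the iffy case.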
    Folkert Rienstra, Feb 21, 2005
  9. Yeechang Lee

    Yeechang Lee Guest

    The Supermicro X5DAL-G motherboard does indeed offer a dedicated bus
    to each PCI-X slot, thus my desire to spread out the load with two
    cards. Otherwise I'd have gone with the 7506-8 eight-channel card
    instead and saved about $120.

    The built-in Gigabit Ethernet jack does indeed share one of the PCI-X
    slots' buses, but I only have a 100Mbit router right now. I wonder
    whether I should expect it to significantly contribute to overall
    bandwidth usage on that bus, either now or if/when I upgrade to Gigabit.
    Yeechang Lee, Feb 21, 2005
  10. Yeechang Lee

    Yeechang Lee Guest

    No, the consensus is that Linux software RAID 5 has the edge on even
    3Ware (the consensus hardware RAID leader). See, among others,
    <URL:> (which
    does note that software striping two 3Ware hardware RAID 5 solutions
    "might be competitive" with software) and
    <URL:> (which states
    that no, all-software still has the edge in such a scenario).
    Yeechang Lee, Feb 21, 2005
  11. If all you care about is "rod length check" long-sequential-read or
    long-sequential-write performance, that's probably true. If, of
    course, you restrict yourself to a single stream...

    ...of course, in the real world, people actually do short writes and
    multi-stream large access every once in a while. Software RAID is
    particularly bad at the former because it can't safely gather writes
    without NVRAM. Of course, both software implementations *and* typical
    cheap PCI RAID card (e.g. 3ware 7/8xxx) implementations are pretty
    awful at the latter, too, and for no good reason that I could ever see.
    Thor Lancelot Simon, Feb 21, 2005
  12. Steve Wolfe

    Steve Wolfe Guest

    No, one PCI-X card would be just as good.
    The numbers that you posted from Bonnie++, if I followed them correctly,
    showed max throughputs in the 20 MB/second range. That seems awfully slow
    for this sort of setup.

    As a comparison, I have two machines with software RAID 5 arrays, one a
    2x866 P3 system with 5x120-gig drives, the other an A64 system with 8x300
    gig drives, and both of them can read and write to/from their RAID 5 array
    at 45+ MB/s, even with the controller cards plugged into a single
    32/33 PCI bus.

    To answer your question, GigE at full speed is a bit more than 100
    MB/sec. The PCI-X busses on that motherboard are both capable of at least
    100 MHz operation, which at 64 bits would give you a max *realistic*
    throughput of about 500 MB/second, so any performance detriment from using
    the gigE would likely be completely insignificant.
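    The arithmetic behind that (decimal units; these are peak numbers on
    paper, with realistic throughput lower still):

```shell
GIGE_MBS=$(( 1000 / 8 ))   # 1000Mbit/s wire speed = 125 MB/s ceiling
BUS_MBS=$(( 100 * 8 ))     # 64-bit/100MHz PCI-X peak
echo "GigE at full blast is $(( GIGE_MBS * 100 / BUS_MBS ))% of the bus"
```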

    I've got another machine with a 3Ware 7000-series card with a bunch of
    120-gig drives on it (I haven't looked at the machine in quite a while), and
    I was pretty disappointed with the performance from that controller. It
    works for the intended usage (point-in-time snapshots), but responsiveness
    of the machine under disk I/O is pathetic - even with dual Xeons.

    Steve Wolfe, Feb 21, 2005
  13. Yeechang Lee

    Yeechang Lee Guest

    Agreed. However, those benchmarks were done with no tuning whatsoever
    (and, as noted, the three distributed computing projects going full
    blast); since then I've done some minor tweaking, notably the noatime
    mount option, which has helped. I'd post newer benchmarks but the
    array's right now rebuilding itself due to a kernel panic I caused by
    trying to use smartctl to talk to the bare drives without invoking the
    special 3ware switch.
    That was my sense as well; I suspect network saturation-by-disk will
    only cease to be an issue when we all hit the 10GigE world.

    (Actually, the 7506 cards are 66MHz PCI-X, so they don't take full
    advantage of the theoretical bandwidth available on the slots.)
    Appreciate the report. Fortunately, as a home user, performance (or,
    given that I'm only recording TV episodes, even data integrity,
    actually; thus no backup plans for the array, even if backing up
    2.8TB were practical in any way budgetwise) isn't my prime
    consideration. Were I after that, I'd probably have gone with the
    9000-series controllers and SATA drives, but my wallet's busted enough
    with what I already have!
    Yeechang Lee, Feb 21, 2005
  14. I noticed that, too, but then noticed that the OP seemed to be running
    three copies of Bonnie++ in parallel. His command line was:

    'bonnie++ -s 4G -m 3ware-swraid5-type -p 3 ; \
    bonnie++ -s 4G -m 3ware-swraid5-type-c1 -y & \
    bonnie++ -s 4G -m 3ware-swraid5-type-c2 -y & \
    bonnie++ -s 4G -m 3ware-swraid5-type-c3 -y &'

    I'm no expert, but if he's running three in parallel on the same
    software RAID, I'd suspect that the total performance should be taken as
    the *sum* of those three---or over 60 MB/sec.
    As another point of comparison: 5x73GB SCSI drives, software RAID-5,
    one U160 SCSI channel, 32-bit/33-MHz bus, dual 1GHz P-III: writes at 36
    MB/sec and reads at 74 MB/sec.
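    Summing the sequential-block columns from the three posted tables
    (assuming those are the three concurrent '-y' runs) bears this out:

```shell
# Block-read and block-write K/sec figures copied from the bonnie++
# output earlier in the thread.
READ_KBS=$(( 20245 + 20196 + 20231 ))
WRITE_KBS=$(( 15897 + 17265 + 16143 ))
echo "aggregate reads: $(( READ_KBS / 1000 )) MB/s, writes: $(( WRITE_KBS / 1000 )) MB/s"
```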
    John-Paul Stewart, Feb 21, 2005
  15. Peter

    Peter Guest

    (Actually, the 7506 cards are 66MHz PCI-X, so they don't take full
    There is no 66MHz PCI-X.
    3Ware 7506 cards are PCI 2.2 compliant 64-bit/66MHz bus master.
    Peter, Feb 21, 2005
  16. The PCI-SIG seems to think differently. Perhaps you know better, then?
    And contrary to what you say elsewhere, they say there is no 100MHz
    spec. That was added by the industry.
    Folkert Rienstra, Feb 21, 2005
  17. Yeechang Lee

    Yeechang Lee Guest

    I wrote earlier:
    As it turns out, it proved straightforward to use either 'smartctl -a
    --device=3ware,[0-3] /dev/twe[0-1]' or 3Ware's 3dm2 and tw_cli
    (available on the Web site) tools to read the serial numbers of the
    drives. So mystery solved.
    Yeechang Lee, Feb 21, 2005
  18. Yeechang Lee

    Yeechang Lee Guest

    What's the difference? I thought 64-bit/66MHz PCI *was* PCI-X.
    Yeechang Lee, Feb 21, 2005
  19. Rod Speed

    Rod Speed Guest

    That's measuring the power going INTO the power supply, not what it's
    supplying, so it isn't very useful for checking how close you are
    getting to the PSU rating.
    Rod Speed, Feb 21, 2005
  20. Steve Wolfe

    Steve Wolfe Guest

    I noticed that, too, but then noticed that the OP seemed to be running
    Good point- I missed that!

    Steve Wolfe, Feb 22, 2005