Linux IDE-RAID Notes - IDE-3WRAID

Mar 2001

# df -H /mnt/tmp/md*
Filesystem            Size  Used Avail Use% Mounted on
/dev/md0              530G  0.0G  530G   0% /mnt/tmp/md0
/dev/md1              519G  0.0G  519G   0% /mnt/tmp/md1
# df -h /mnt/tmp/md*
Filesystem            Size  Used Avail Use% Mounted on
/dev/md0              493G  0.0G  493G   0% /mnt/tmp/md0
/dev/md1              483G  0.0G  483G   0% /mnt/tmp/md1

Overview

Given the following:
  1. the need to put more storage online quickly
  2. stability and performance problems with the I75RAID/Promise system still under test
  3. the announcement of RAID5 availability 2/15/2001 for the 3ware Escalade 6800 at Linux World
I bought 3ware Escalade 6800 controllers to put together the following server. Yes, hardware RAID isn't Linux software RAID, but it provides an interesting alternative and comparison point, and the hardware can be used in JBOD mode with Linux software RAID on top.  The drives are all IDE drives on master channels; however, the controller appears to the system as a SCSI controller, so arrays and drives are configured starting with /dev/sda.  When the second controller is added, the controllers are ordered (by the ASUS or 3ware BIOS) from high to low PCI slot.   Yes, the "high" cost for these controllers is contrary to maximizing server-GBs/dollar, but the SCSI interface also bypasses the slowness problems that I'm still fighting with the I75RAID/Promise system, probably due to the IDE elevator code.
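
A quick way to see how the controllers and their drives were enumerated (a rough sketch; the exact messages and ordering depend on the driver version and which PCI slots the controllers occupy):
# dmesg | grep -i 3w          # 3ware driver messages
# cat /proc/scsi/scsi         # JBOD drives / arrays exported by the controllers appear as SCSI disks (sda, sdb, ...)
# fdisk -l /dev/sda           # confirm the first exported unit is visible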

Suggested production configurations included:
 
RAID configuration                   Filesystem  Total Capacity  Comment
2 software-RAID5 arrays of 8 drives  ext2        1TB             less storage, more redundancy & throughput
1 software-RAID5 array of 16 drives  reiserfs    1.1TB           more storage, less redundancy & throughput

There is a limit of 1TB for ext2, so reiserfs must be used for filesystems larger than 1TB.  The performance measurements below show that reiserfs performance with this configuration drops unacceptably with very large arrays.  Unfortunately, I had to put this system into service before I could find a solution to this problem (perhaps the journal could be written to a separate spindle?).  So the production configuration is 2 half-TB ext2 arrays.  The sizes of the two file systems are slightly different because the IBM 75GXP drives seem to come in two sizes, 76.8GB and 75.3GB.

Hardware

Asus A7M266 - with 1GHz AMD Athlon 266MHz FSB and 128MB 266MHz RAM
    AMD761 Socket A motherboard - ATX, 266MHz FSB, PC2100 DDR DRAM, UDMA/100, 5xPCI
replaced Asus CUSL2 - with 800MHz Intel Pentium III 133MHz FSB (EB) and 128MB 133MHz RAM
    815E Solano 2 motherboard - ATX, 133MHz FSB, PC133, UDMA/100, 6xPCI
2 3ware Escalade 6800 - 8-channel / 8-drive Ultra ATA/66 RAID 0/1/10/5 controller, $357 ea.
16 IBM Deskstar 75GXP - 75GB EIDE Ultra-ATA/100 7200RPM 8.5ms (37MB/s)
1 IBM Deskstar 75GXP - 15GB EIDE Ultra-ATA/100 7200RPM 8.5ms (37MB/s) [boot drive]
GlobalWIN FOP32 - dual bearing high speed fan, $19
Chenbro A9691 (KRI CK9691) - 21-bay Ultra Server case, $370
Seventeam ST400GL - 400W power supply, $85 [not recommended - only 3 drive-power cables]
CTG (Cables To Go) 18729 - 24in int Ultra DMA/ATA ribbon 3 connector IDE 33/66 UDMA $11

Cabling, Cases, and Power

In assembly of the hardware, cabling concerns are paramount.  The 21-bay Ultra Server case has more than enough bays, but not all of the bays can be reached with 24" cables.  One 6800 controller is in PCI slot 1 with 24" cables to the 8-drive 3.5" cage in the back of the case.  I used the case of an old CDROM drive as a mounting bracket to put 4 drives into 3 bays -- why can't I find something "inexpensive" like this on the market?  The other 6800 controller is in PCI slot 3 with the supplied 18" cables to 4 drives and 24" cables to 4 drives.  While 3ware supplies a full set of 18" cables, I was only able to use 4 of them, so there is the additional cost of twelve 24" cables.  Yes, the ATA spec sets a limit of 18", but I have yet to find a case that allows you to put a large number of drives in one enclosure and keep to the 18" limit.  Even 24" cables aren't long enough for some connections, though the 24" cables themselves have worked fine for me.  For experience with the longer and more troublesome 36" cables, see the i75raid page.  Are there any file-server case designers out there listening?  Please design for minimal cable length, and please call me!

Important - Make sure to do a basic sanity check on each drive using "hdparm -Tt" and check for the expected single-drive performance.  Repair as necessary by reseating cables or swapping cables, drives and controllers.  I have seen some drives with significantly slower performance!  This could save you a lot of time, as poorly performing hardware invalidates the whole test suite.  It's a good idea to add "hdparm -Tt" to your test scripts as a sanity check on each drive before JBOD or RAID tests, as in the sketch below.
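
A minimal sketch of such a check, assuming the sixteen drives appear as /dev/sda through /dev/sdp (adjust the device list to your configuration); each 75GXP should show roughly 37MB/s buffered reads:
# for d in /dev/sd[a-p]; do echo "== $d =="; hdparm -Tt $d; done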

Related note - Bart Locanthi also reported a slow motherboard controller configuration (UDMA/33 instead of UDMA/100) caused by BIOS settings, so make sure that both your primary/secondary motherboard controllers and the related master/slave channels are enabled in your BIOS, e.g., set to "auto", not "none".

I much prefer the Antec 18-bay case, but it is discontinued.  Any server case with a drive compartment on the back side of the motherboard will probably have serious cabling problems.  Minimizing drive-cable lengths while maintaining adequate access to expansion cards and drive bays may be a challenge, but I'm still hoping to find a satisfactory case that is in production -- help!  The makeshift CDROM-case bracket may help to reduce the number of drive bays required.  I'm somewhat astonished at the lack of attention to high-density, low-cost (IDE-drive) storage solutions.  One new product is the Amax Terabyte RAID server solution.

The Seventeam power supply has only 3 drive-power cables.  Instead, I recommend the Enlight EN-8407362 ATX 400W single power supply for server cases, which has 6 power cables.

Suggestion for 3ware

Please consider a different layout for future cards.  To assist with cabling, the IDE connectors should be similar to the Promise connectors -- mounted on the top edge of the card facing up so that the cables lie flat with the board.  Two rows of 4 connectors would be required, but only cable-thickness clearance is needed for cables from the lower row to clear the top row.  With this layout, controller cards would easily fit in adjacent slots.

Also, please make sure that there is no small BIOS limit on the number of controllers in the system, e.g., nothing like the limit of 3 Promise controllers per system.  Where inexpensive high-density storage is the goal, picture a system with all slots filled to capacity, mostly with 3ware controllers!  With the above layout fix for cabling, this would enable something like the Amax Terabyte RAID server with 32 drives, but with 4 controllers on one motherboard.  Even more could fit onto a server motherboard.  If only ... connector fix ... good server case ... power supply ... motherboard, e.g., Asus CUR-DLS with built-in video and ethernet plus 7 PCI slots $599 (ouch) ... yeah, dream on ...

Firmware

3ware upgrade65.zip -  firmware upgrade with RAID5 - follow Service & Support / Software Library - 6000 series Drivers & Firmware - 600 series Firmware - download/unzip/readme.1st ...

Software

RedHat 7.1 (Wolverine) Linux - Important edits

Replace unapproved gcc with kgcc
# mv /usr/bin/gcc /usr/bin/gcc-
# ln -s /usr/bin/kgcc /usr/bin/gcc
or otherwise edit the kernel Makefile and carefully specify kgcc where critical
# edit Makefile
CC      :=$(shell if which $(CROSS_COMPILE)kgcc > /dev/null 2>&1; then echo $(CROSS_COMPILE)kgcc; else echo $(CROSS_COMPILE)gcc; fi) -D__KERNEL__ -I$(HPATH)

# ed /etc/sysconfig/harddisks
USE_DMA=1
MULTIPLE_IO=16
EIDE_32BIT=1
LOOKAHEAD=1
EXTRA_PARAMS=
# ed /etc/rc.d/rc.sysinit    # extend the disk[] list and the hdparm loop to cover all drives
disk[0]=s; disk[1]=hda; disk[2]=hdb; disk[3]=hdc; disk[4]=hdd;
disk[5]=hde; disk[6]=hdf; disk[7]=hdg; disk[8]=hdh;
disk[9]=hdi; disk[10]=hdj; disk[11]=hdk; disk[12]=hdl;
disk[13]=hdm; disk[14]=hdn; disk[15]=hdo; disk[16]=hdp;
disk[17]=hdq; disk[18]=hdr; disk[19]=hds; disk[20]=hdt;

if [ -x /sbin/hdparm ]; then
   for device in 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20; do

Maximum RAID size - drives per set (MD_SB_DISKS)

The RedHat default maximum is 12; the hard maximum is 27, limited by the 4096-byte superblock (a quick way to check the compiled-in limit is sketched below).  The following update installs tools that support the hard maximum.
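
A hedged way to check what MD_SB_DISKS your kernel headers define (I'm assuming the stock 2.4 location of the definition, include/linux/raid/md_p.h):
# grep -n -A 2 MD_SB_DISKS /usr/src/linux/include/linux/raid/md_p.h    # shows the per-set disk limit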

RAID tools - update required - http://people.redhat.com/mingo/raid-patches/

# tar xf raidtools-dangerous-0.90-20000116.tar
# cd raidtools-0.90
# ./configure; make all install

Maximum File-System sizes

With the default block size of 1K, ext2 is limited to 1TB.  For reiserfs, the following update is required for file systems larger than 0.5TB.

Reiserfs tools - update required - http://www.namesys.com/
# tar xf reiserfsprogs-3.x.0f.tar
# cd reiserfsprogs-3.x.0f
# ./configure; make all install

Linux 2.4.2

Linux 2.4.2 kernel (includes Reiserfs) - Kernel-HOWTO

# VER=2.4.2
# umask 002
# mkdir /usr/src/linux-$VER; cd /usr/src/linux-$VER; tar xf linux-$VER.tar; mv linux/* .; rmdir linux
# cd ..; rm /usr/src/linux; ln -s /usr/src/linux-$VER /usr/src/linux
# cd linux-$VER
# make mrproper
# make xconfig    # (remember to enable any other drivers for SCSI support, Network device support, Sound, etc)
    Code maturity level options
    y    Prompt for development and/or incomplete code/drivers
    Multi-device support (RAID and LVM)
    y    Multiple devices driver support (RAID and LVM)
    y        RAID support
    y            Linear (append) mode
    y            RAID-0 (striping) mode
    y            RAID-1 (mirroring) mode
    y            RAID-4/RAID-5 mode
    ATA/IDE/MFM/RLL support
        IDE, ATA and ATAPI Block devices
    y    Generic PCI bus-master DMA support
    y    Use PCI DMA by default when available
    y        Intel PIIXn chipsets support
    y            PIIXn Tuning support
    y        PROMISE PDC20246/PDC20262 support
    y            Special UDMA Feature
    SCSI support
        SCSI low-level drivers
    y    3ware Hardware ATA-RAID support
    Network Device Support
        Ethernet (10 or 100Mbit)
            (don't forget the network card)
    File Systems
    y    Reiserfs support
        Network File Systems
    y    NFS file system support
    y        Provide NFSv3 client support
    y    NFS server support
    y        Provide NFSv3 server support
    y    SMB file system support (to mount Windows shares etc.)
# make dep clean bzImage modules modules_install
# sh scripts/MAKEDEV.ide
# cp arch/i386/boot/bzImage /boot/vmlinuz-$VER
# cp System.map /boot/System.map-$VER
# ed /etc/lilo.conf
image=/boot/vmlinuz-2.4.2
    label=linux
    read-only
    root=/dev/hda5
# lilo    # LILO mini-HOWTO, BootPrompt-HowTo
# reboot
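
After rebooting into the new kernel, a few quick checks (a sketch, adjust to taste) that the options this configuration depends on actually made it in:
# uname -r                           # should report 2.4.2
# cat /proc/mdstat                   # present when the md (software RAID) driver is compiled in
# grep reiserfs /proc/filesystems    # confirms reiserfs support
# cat /proc/scsi/scsi                # the 3ware-attached drives should be listed here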

Disk Configuration

Standard ext2fs setup/test example

    # cfdisk /dev/sda
    # mke2fs /dev/sda1
    # mount -t ext2 /dev/sda1 /mnt/tmp
    # cd /mnt/tmp
    # bonnie++ -u root -s 1000 -m sda1
    # cd /
    # umount /mnt/tmp

Reiserfs setup/test example

    # cfdisk /dev/sda
    # mkreiserfs /dev/sda1
    # mount -t reiserfs /dev/sda1 /mnt/tmp
    # cd /mnt/tmp
    # bonnie++ -u root -s 1000 -m sda1
    # cd /
    # umount /mnt/tmp

RAID setup/test example

    # cfdisk /dev/sda # make partitions (repeat for each drive in the array)
    # ed /etc/raidtab # see http://ostenfeld.dk/~jakob/Software-RAID.HOWTO/ and the sample raidtab below
    # mkraid /dev/md0
    # cat /proc/mdstat
    # mkreiserfs /dev/md0
    # mount -t reiserfs /dev/md0 /mnt/tmp
    # cd /mnt/tmp
    # bonnie++ -u root -s 1000 -m md0
    # cd /
    # umount /mnt/tmp
    # raidstop /dev/md0
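
For reference, a minimal /etc/raidtab sketch for one of the production arrays: an 8-drive software RAID5 over 3ware JBOD drives with 64KB chunks, as fed to mkraid above.  The device names and the use of a single whole-disk partition per drive are my assumptions, so adapt them to your layout:

    raiddev /dev/md0
        raid-level              5
        nr-raid-disks           8
        nr-spare-disks          0
        persistent-superblock   1
        parity-algorithm        left-symmetric
        # chunk-size is in KB
        chunk-size              64
        device                  /dev/sda1
        raid-disk               0
        device                  /dev/sdb1
        raid-disk               1
        device                  /dev/sdc1
        raid-disk               2
        device                  /dev/sdd1
        raid-disk               3
        device                  /dev/sde1
        raid-disk               4
        device                  /dev/sdf1
        raid-disk               5
        device                  /dev/sdg1
        raid-disk               6
        device                  /dev/sdh1
        raid-disk               7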

Performance

JBOD tests run a simultaneous bonnie++ process on each drive, with each drive carrying its own ext2 file system (a sketch of the procedure follows below).  For comparison and interest, I also measured RAID0, RAID1 and RAID10.  3ware RAID5 initialization takes a long time, around 2 hours for an 8-drive array.
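
Roughly, a simultaneous JBOD run can be driven with bonnie++'s semaphore options (-p to create the semaphore, -y to wait on it) so that all processes start together.  The /mnt/tmp/sdX1 mount points and the assumption that the eight drives of one controller appear as /dev/sda through /dev/sdh are mine; the SysV shared-memory setup described in the bonnie++ section below is also required:
# bonnie++ -p 8 -u root
# for d in a b c d e f g h; do mkdir -p /mnt/tmp/sd${d}1; mount -t ext2 /dev/sd${d}1 /mnt/tmp/sd${d}1; done
# for d in a b c d e f g h; do (cd /mnt/tmp/sd${d}1 && bonnie++ -y -u root -s 1000 -m sd${d}1) & done
# wait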

For bonnie++ text results, see ide-3wraid.txt.
 
Configuration              total drives  capacity (drives)  max GB / 8 drives  comment
JBOD                       n             n                  600                test 1 to 8 drives
RAID0 (stripe)             n             n                  600                test 2- to 8-drive arrays
RAID1 (mirror)             2             1                  300                test 1 to 4 * 2-drive arrays
RAID10 (stripe of mirror)  n             n/2                300                test 4- or 8-drive array
RAID5                      n             n-1                525                test 3- to 8-drive arrays

Graphs

For hardware configuration comparison, ext2 JBOD tests provide a pretty good measure of performance.  So my current practice is to use JBOD measurements to select the optimal hardware configuration, and then run the RAID measurements only on that final hardware configuration.  Other hardware configurations are added where they are interesting for comparison.

Performance may be affected by PCI slots and interrupts.  Tests with different PCI slots showed similar performance, at least with this motherboard.  The different configurations also had diverse interrupt assignments, so interrupts can probably be ignored for now.  PCI slots 1 and 3 were therefore chosen for cabling reasons; this also leaves PCI slot 5 available for another controller if I can figure out how to mount 8 more drives in the cabinet.  The slot and interrupt assignments can be checked as sketched below.
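
A simple way to see which slot and IRQ each controller ended up with (output formats vary with the kernel and pciutils versions):
# lspci                   # lists PCI devices; the two Escalade controllers and their bus/slot numbers appear here
# cat /proc/interrupts    # shows the IRQ used by each controller (the 3ware driver usually registers as 3w-xxxx)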

A chunk size of 64KB is used uniformly for these tests.

graph of bonnie++ block read performance

Clarification:
    cyan is 3w ext2 RAID10 (circa 1.3MBps)
    yellow is 3w ext2 RAID5
Comments:
    JBOD1 - single controller, looks OK
    JBOD2 - two controllers, additional throughput with second controller, looks OK
    3w ext2 RAID0 - 60MBps read speed is lower than desired with 64KB stripe (consider stripe up to 1MB)
    3w ext2 RAID1 - suggests single disk reads
    3w ext2 RAID10 - 1.3MBps is hobbling (stripe of 1MB had no effect) - something is broken
    3w ext2 RAID5 - like RAID0
    3w reiserfs RAID5 - looks OK
    2.4.2 ext2 RAID0 - looks reasonable
    2.4.2 reiserfs RAID0 - works for 16 drives (> 1TB), but performance drops significantly
    2.4.2 ext2 RAID5 - looks reasonable, some slowdown with 13-15 drives
    2.4.2 reiserfs RAID5 - works for 16 drives (> 1TB), but performance drops unacceptably for 14-16 drives

Something is clearly wrong with 3w RAID10.  One additional note is that 3w RAID1 did drive calibration, but 3w RAID10 did not.

graph of bonnie++ block write performance


Clarification:
    cyan is 3w ext2 RAID10
    yellow is 3w ext2 RAID5 (circa 11MBps) - nearly identical to 3w reiserfs RAID5
Comments:
    JBOD1 - single controller, looks OK
    JBOD2 - two controllers, 1-7 drives are lower than expected, 9-16 drives show a lot of variation
        (I wish that I had time to run multiple tests to display averages and error bars).
    3w ext2 RAID0 - looks OK (consider stripe up to 1MB)
    3w ext2 RAID1 - 45MBps is disappointing as more mirrored arrays are added
    3w ext2 RAID10 - 35MBps is disappointing, but not hobbling like RAID10 read (stripe of 1MB had no effect)
    3w ext2 RAID5 - 11MBps is terribly disappointing (probably unacceptable)
    3w reiserfs RAID5 - 11MBps RAID5 slowness, otherwise OK, probably CPU bound
    2.4.2 ext2 RAID0 - looks reasonable
    2.4.2 reiserfs RAID0 - performance drops unacceptably for 9-16 drives
    2.4.2 ext2 RAID5 - slower than desired, but otherwise OK
    2.4.2 reiserfs RAID5 - performance drops unacceptably for 9-16 drives

Performance Summary

key & configuration      Bonnie read  Bonnie write  Comment
                         MB/sec       MB/sec
PIO ex. I34GXP           4.1          4.3           Promise Ultra66
I75GXP I66               36.5         26.9          Intel CC820 ICH
I75GXP P100              36.5         29.4          Promise Ultra100
I75GXP P100 ReiserFS     35.8         35.4          Promise Ultra100
I75GXP 3W                36.6         36.4          3ware Escalade 6800
I34RAID                  66.8         35.6          Promise Ultra66
M40RAID                  46.6         35.5          mixed controllers
S18RAID                  39.5         36.7          2940U2W W/LW mix
3WRAID                   62.5         30.4          3ware Escalade 6800 JBOD (SW RAID5)

Explanation for the above, in order of test:
PIO ex. I34GXP - PIO reference
I75GXP I66 - Intel PIIX4 reference
I75GXP P100 - Promise Ultra100 reference
I34RAID, M40RAID, S18RAID - references

bonnie++

    http://www.coker.com.au/bonnie++/ - adds directory benchmarks and synchronized multiple processes
    To enable SystemV Shared Memory (for semaphores), see Documentation/Changes in kernel source
    # mkdir /dev/shm
    # ed /etc/fstab
    none                    /dev/shm                shm     defaults        0 0
    # reboot
    # bonnie++ -p 2 -u root
    # for i in 1 2; do bonnie++ -y -u root -s 1000 -m ... & done
 
                         -------Sequential Output-------- ---Sequential Input--- --Random---
                         -Per Char- --Block--- -Rewrite-- -Per Char- --Block---  --Seeks---
Machine               MB K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU   /sec %CPU
PIO ex. I34GXP       500  2845 64.1  4315 50.5  2053 10.5  2743 32.0  4114  5.4   86.9  2.1
I75GXP I66           500  9602 97.2 26937 17.7 14687 19.8  9727 93.8 36482 21.2  155.8  1.7
I75GXP P100          500  9629 97.5 29428 20.5 15312 19.5  9798 94.9 36462 21.7  158.4  1.7
I75GXP P100 ReiserFS 500  7972 98.2 35377 65.1 15243 24.9  8509 93.0 35762 27.2  148.8  2.5
I34RAID              500  7251 91.9 35571 30.2 18232 35.0  8134 95.9 66774 46.8  207.6  3.0
M40RAID              500  7443 91.3 35546 29.5 17707 34.0  8251 95.4 46554 32.6  322.3  4.4
S18RAID              500  4857 98.3 39451 78.8 16078 55.2  6533 95.0 36652 35.6  495.8 11.8
3WRAID              4000 11770 85   30398 13   21990 20   11050 82   62470 49    245.1  1


NoBell Home - gjm - last update 3/13/2001