Sexy (ZFS on OpenSolaris)

Discussion in 'Linux & BSD' started by X-Istence, Dec 8, 2008.

  1. X-Istence

    X-Istence * Political User

    Messages:
    6,498
    Location:
    USA
    Code:
    uname -a
    SunOS Keyhole.network.lan 5.11 snv_101b i86pc i386 i86pc Solaris
    Code:
    zfs list
    NAME                         USED  AVAIL  REFER  MOUNTPOINT
    rpool                       6.27G  48.4G    72K  /rpool
    rpool/ROOT                  2.52G  48.4G    18K  legacy
    rpool/ROOT/opensolaris      2.52G  48.4G  2.39G  /
    rpool/dump                  1.87G  48.4G  1.87G  -
    rpool/export                2.28M  48.4G    19K  /export
    rpool/export/home           2.26M  48.4G    21K  /export/home
    rpool/export/home/guest     22.5K  10.0G  22.5K  /export/home/guest
    rpool/export/home/xistence  2.22M  48.4G  2.22M  /export/home/xistence
    rpool/swap                  1.87G  50.2G    16K  -
    storage                      289G  3.28T  33.6K  /storage
    storage/media                246G  3.28T   246G  /storage/media
    storage/virtualbox          28.8K  3.28T  28.8K  /storage/virtualbox
    storage/xistence            42.7G  3.28T  42.7G  /storage/xistence
    
    Code:
    zpool list
    NAME      SIZE   USED  AVAIL    CAP  HEALTH  ALTROOT
    rpool    55.5G  4.40G  51.1G     7%  ONLINE  -
    storage  4.53T   361G  4.18T     7%  ONLINE  -
    
    Code:
    zpool status
      pool: rpool
     state: ONLINE
     scrub: none requested
    config:
    
            NAME        STATE     READ WRITE CKSUM
            rpool       ONLINE       0     0     0
              c4d0s0    ONLINE       0     0     0
    
    errors: No known data errors
    
      pool: storage
     state: ONLINE
     scrub: none requested
    config:
    
            NAME        STATE     READ WRITE CKSUM
            storage     ONLINE       0     0     0
              raidz1    ONLINE       0     0     0
                c5d0    ONLINE       0     0     0
                c5d1    ONLINE       0     0     0
                c6d0    ONLINE       0     0     0
                c6d1    ONLINE       0     0     0
                c7d0    ONLINE       0     0     0
    
    errors: No known data errors
    Drives in the machine:

    1 x 60 GB Seagate (will soon have a second drive to make the rpool a mirror)

    5 x 1 TB Seagate Barracuda (storage pool, raidz1)

    Raidz1 is basically 4 drives of storage and 1 drive of parity. I can lose any single drive out of the 5 installed and all of my data will still be safe. Hopefully I will soon have some more money so I can add a sixth drive as a hot spare (see the sketch below).
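
    A pool like this can be created, and a hot spare added later, with commands along these lines (just a sketch; the raidz device names come from the zpool status output above, and c7d1 is only a hypothetical name for the sixth drive):

    Code:
    # create the raidz1 pool out of the five 1 TB drives
    zpool create storage raidz1 c5d0 c5d1 c6d0 c6d1 c7d0
    # later, add a sixth drive as a hot spare (c7d1 is a hypothetical device name)
    zpool add storage spare c7d1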

    Copying data to it, I was unable to max out its write speed. My laptop's internal hard drive does a maximum of 25 MB/sec, and my external FireWire drive maxes out at 30 MB/sec. Reading the data, encrypting it over SSH, and sending it with rsync pushed my outgoing speed over the gigabit LAN to 55 MB/sec; that maxed out reads from my internal and external drives, and maxed out my CPU encrypting the data for SSH. The server hardly hiccuped. I was watching zpool iostat (example below), and every so often it would flush writes out to the raidz1 at almost 200 MB/sec. Split between the 5 drives that works out to 40 MB/sec each, so I did not even max out the individual drives in terms of write speed. I'm not sure what my kind-of-cheap motherboard's maximum SATA transfer speed is when writing to all the devices on the SATA bus at the same time :p.
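
    For anyone who wants to watch their own pool the same way, zpool iostat can print per-vdev statistics at a fixed interval; something like this (the 5-second interval is just the value I happened to use):

    Code:
    # per-device I/O statistics for the storage pool, refreshed every 5 seconds
    zpool iostat -v storage 5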
     
  2. LordOfLA

    LordOfLA Godlike!

    Messages:
    7,027
    Location:
    Maidenhead, Berkshire, UK
    I was able to do RAID0 Vista to RAID0 Linux/ext3 at 120 MB/sec :)

    SATA is SATA, so any SATA II controller can support 300 MB/sec transfers. It's the disks that can't go that fast :p
     
  3. osnnraptor

    osnnraptor OSNN One Post Wonder

    Messages:
    6
    Can you provide some raw numbers - copying a big file from the 60 GB drive to the raid, etc.?
    I need a good fs for storing many big files, rarely creating new ones. I've tried xfs, jfs, ext3, ext4, btrfs, etc.
     
  4. Dark Atheist

    Dark Atheist Moderator Political User Folding Team

    Messages:
    6,376
    Location:
    In The Void
    zfs requires a 64-bit OS - you can use it on 32-bit but it's unstable. zfs is a 128-bit fs.

    xfs/jfs are the ones you want if you're on a Linux setup; yes, they are slow when deleting files, but not everything is perfect.

    reiserfs4 - you want to stay away from that, for obvious reasons ;)
     
  5. LordOfLA

    LordOfLA Godlike!

    Messages:
    7,027
    Location:
    Maidenhead, Berkshire, UK
    I found ext3 was fastest for storing and reading large files on software raid0 from a Windows source. Peculiarities in the way Windows writes to network shares show up on other systems as vastly reduced write speed, because they don't account for Windows' remote write pattern.
     
  6. Dark Atheist

    Dark Atheist Moderator Political User Folding Team

    Messages:
    6,376
    Location:
    In The Void
    i still use ext2 :)
     
  7. LordOfLA

    LordOfLA Godlike!

    Messages:
    7,027
    Location:
    Maidenhead, Berkshire, UK
    well you should use 3 if you want better data safety.
     
  8. Dark Atheist

    Dark Atheist Moderator Political User Folding Team

    Messages:
    6,376
    Location:
    In The Void
    if the journal goes south, so does all your stuff :p and I have seen that happen - luckily it wasn't my LVM it happened on :D
     
  9. LordOfLA

    LordOfLA Godlike!

    Messages:
    7,027
    Location:
    Maidenhead, Berkshire, UK
    if the journal goes south, it's just the journal.

    If your other data went with it, that was an error somewhere else.
     
  10. Dark Atheist

    Dark Atheist Moderator Political User Folding Team

    Messages:
    6,376
    Location:
    In The Void
    wasn't mine :)
     
  11. X-Istence

    X-Istence * Political User

    Messages:
    6,498
    Location:
    USA
    Copying from the 60 GB to the raidz won't give you the numbers you are looking for. The 60 GB is an IDE Ultra ATA/100 drive, so its speed won't be nearly as good.

    What kind of numbers are you looking for? How big are these big files you are talking about? I have dd images of hard drives that are 120 GB each sitting on a ZFS drive (a different machine, and not raidz'ed either).

    For example:

    Creating a 4 GB file using dd

    Code:
    # time dd if=/dev/zero of=test bs=1024k count=4096
    4096+0 records in
    4096+0 records out
    
    real    0m33.773s
    user    0m0.009s
    sys     0m3.584s

    (4 gigabytes) / (33.7 seconds) = 121.543027 MBps

    I also did one where I had two dd's running at the same time:

    Code:
    # time bash foo.sh 
    4096+0 records in
    4096+0 records out
    4096+0 records in
    4096+0 records out
    
    real    1m9.795s
    user    0m0.029s
    sys     0m10.143s
    
    # cat foo.sh 
    dd if=/dev/zero of=test bs=1024k count=4096 &
    dd if=/dev/zero of=test1 bs=1024k count=4096 &
    wait
    
    (8 gigabytes) / (69.7 seconds) = 117.532281 MBps

    I also did a read test from the raidz using the native block size (128k):

    Code:
    time dd if=test of=/dev/null bs=128k
    32768+0 records in
    32768+0 records out
    
    real    0m47.422s
    user    0m0.047s
    sys     0m3.310s
    (4 gigabytes) / (47.4 seconds) = 86.4135021 MBps

    Some notes:

    Unfortunately, OpenSolaris does not yet have support for the AHCI controller that is on the motherboard (when I tested it with FreeBSD there were some issues as well), so the SATA controller is currently running in IDE compatibility mode. That unfortunately means two drives share the same "bus", much like two IDE drives on the same cable.

    I am using standard off-the-shelf components for this build; nothing too special about the hardware involved. I am more than happy with the performance, and seeing as this server is for backups, I care more about data integrity than I do about read/write speeds.

    Another note: dd, while a simple tool for quick tests, will more than likely not max out read/write speeds to the drives, so take these numbers as rough examples only. There are proper file system benchmarks, like iozone, but I don't want to run those on a "production" server.
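
    If you do want heavier numbers, an iozone run on a scratch directory would look something like this (just a sketch; /storage/bench is a hypothetical test directory, and the file size is picked to be larger than RAM so the ARC cache doesn't hide the disks):

    Code:
    # sequential write/rewrite (-i 0) and read/reread (-i 1) on an 8 GB file,
    # using a 128k record size to match the ZFS recordsize
    iozone -i 0 -i 1 -s 8g -r 128k -f /storage/bench/iozone.tmp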

    I have a FreeBSD virtual machine running on top of OpenSolaris (using VirtualBox running headless) and these are the results I get when I run dd from within the VM:

    Code:
    dd if=/dev/zero of=test bs=1024k count=1024
    1024+0 records in
    1024+0 records out
    1073741824 bytes transferred in 35.006241 secs (30672869 bytes/sec)
    30 672 869 (bytes / sec) = 29.2519274 MB / sec

    Which is not unlike what I get on my Mac OS X laptop in terms of write speed.
     
  12. X-Istence

    X-Istence * Political User

    Messages:
    6,498
    Location:
    USA
    Code:
    zpool status
      pool: rpool
     state: ONLINE
     scrub: none requested
    config:
    
            NAME        STATE     READ WRITE CKSUM
            rpool       ONLINE       0     0     0
              mirror    ONLINE       0     0     0
                c4d0s0  ONLINE       0     0     0
                c4d1s0  ONLINE       0     0     0
    
    errors: No known data errors
    
      pool: storage
     state: ONLINE
     scrub: none requested
    config:
    
            NAME        STATE     READ WRITE CKSUM
            storage     ONLINE       0     0     0
              raidz1    ONLINE       0     0     0
                c5d0    ONLINE       0     0     0
                c5d1    ONLINE       0     0     0
                c6d0    ONLINE       0     0     0
                c6d1    ONLINE       0     0     0
                c7d0    ONLINE       0     0     0
    
    errors: No known data errors
    
    Added an extra drive to my rpool, so now it is mirrored. The cool thing is, you can even create mirrors with drives that are not the same size; right now it is a 60 GB drive and an 80 GB drive (the mirror just gets the smaller drive's capacity). The mirror is a complete copy that can be booted from, meaning my OS disk is now failure proof, or at least that is the point of the exercise. I will create a new thread soon detailing what to do to add a new drive to a root pool, but the short version is sketched below.
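
    Roughly, the steps were along these lines (a sketch from memory, assuming the new disk already has a Solaris fdisk partition and slice 0 laid out; the device names match the zpool status output above):

    Code:
    # attach the new drive to the existing root disk, turning rpool into a mirror
    zpool attach rpool c4d0s0 c4d1s0
    # watch the resilver until it finishes
    zpool status rpool
    # install GRUB on the new drive so the box can boot from it as well
    installgrub /boot/grub/stage1 /boot/grub/stage2 /dev/rdsk/c4d1s0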
     
  13. misba

    misba OSNN One Post Wonder

    Messages:
    1
    The problem with btrfs is that it will take at least 2-3 years before it gets anywhere near production ... that's a bit too late for most people who depend on their computers to keep their data safe.
    ------------------
    Misbah
     
    Last edited by a moderator: Jan 14, 2009