Sexy (ZFS on OpenSolaris)

X-Istence · Political Access
Joined: 5 Dec 2001 · Messages: 6,498
Code:
uname -a
SunOS Keyhole.network.lan 5.11 snv_101b i86pc i386 i86pc Solaris

Code:
zfs list
NAME                         USED  AVAIL  REFER  MOUNTPOINT
rpool                       6.27G  48.4G    72K  /rpool
rpool/ROOT                  2.52G  48.4G    18K  legacy
rpool/ROOT/opensolaris      2.52G  48.4G  2.39G  /
rpool/dump                  1.87G  48.4G  1.87G  -
rpool/export                2.28M  48.4G    19K  /export
rpool/export/home           2.26M  48.4G    21K  /export/home
rpool/export/home/guest     22.5K  10.0G  22.5K  /export/home/guest
rpool/export/home/xistence  2.22M  48.4G  2.22M  /export/home/xistence
rpool/swap                  1.87G  50.2G    16K  -
storage                      289G  3.28T  33.6K  /storage
storage/media                246G  3.28T   246G  /storage/media
storage/virtualbox          28.8K  3.28T  28.8K  /storage/virtualbox
storage/xistence            42.7G  3.28T  42.7G  /storage/xistence

Code:
zpool list
NAME      SIZE   USED  AVAIL    CAP  HEALTH  ALTROOT
rpool    55.5G  4.40G  51.1G     7%  ONLINE  -
storage  4.53T   361G  4.18T     7%  ONLINE  -

Code:
zpool status
  pool: rpool
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        rpool       ONLINE       0     0     0
          c4d0s0    ONLINE       0     0     0

errors: No known data errors

  pool: storage
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        storage     ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            c5d0    ONLINE       0     0     0
            c5d1    ONLINE       0     0     0
            c6d0    ONLINE       0     0     0
            c6d1    ONLINE       0     0     0
            c7d0    ONLINE       0     0     0

errors: No known data errors

Drives in the machine:

1 x 60 GB Seagate (soon will have a second 60 GB Seagate to make the rpool a mirror)

5 x 1 TB Seagate Barracuda (storage pool, raidz1)

Raidz1 is basically 4 data drives and 1 drive's worth of parity: I can lose a single drive out of the 5 installed and all of my data will still be safe. Hopefully I will soon have some more money so I can add a sixth drive as a hot spare.
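
Not commands from my original setup notes, but for the curious, a pool like this is built in one shot and the hot spare gets added later; a rough sketch using the device names from my pool (the sixth device name is a placeholder, since that drive does not exist yet):

Code:
# create a single raidz1 vdev from the five 1 TB drives
zpool create storage raidz1 c5d0 c5d1 c6d0 c6d1 c7d0
# later, once the sixth drive is installed: add it as a hot spare (placeholder name)
zpool add storage spare c7d1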

Copying data to it, I was unable to max out its write speed. My laptop's internal hard drive does a maximum of 25 MB/sec, and my external FireWire drive maxes out at 30 MB/sec. Reading the data and encrypting it over SSH to send with rsync pushed my outgoing speed over the gigabit LAN to 55 MB/sec; that maxed out reads from my internal and external drives, and maxed out my CPU encrypting the data for SSH. The server hardly hiccuped. I was watching zpool iostat, and every so often it would flush writes out to the raidz1 at almost 200 MB/sec. Split between the 5 drives, that works out to 40 MB/sec per drive, so I did not even max out the individual drives in terms of write speed. Not sure what my somewhat cheap motherboard's maximum SATA transfer speed is when writing to all the devices on the SATA bus at the same time :p.
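
For anyone who wants to watch the same thing, the per-vdev numbers come from zpool iostat run with an interval; something along these lines is what I had open (the 5-second interval is just an example):

Code:
# show per-vdev read/write bandwidth for the storage pool, refreshing every 5 seconds
zpool iostat -v storage 5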
 
I was able to do RAID0 Vista to RAID0 Linux/ext3 at 120 MB/sec :)

SATA is SATA, so SATA II controllers can support 300 MB/sec transfers. It's the disks that can't go that fast :p
 
Can you provide some raw numbers - copying a big file from the 60 GB to the raid, etc.?
I'll need a good fs for storing many big files, rarely creating new ones. Tried xfs, jfs, ext3, ext4, btrfs, etc.
 
zfs requires a 64-bit OS - you can use it on 32-bit, but it's unstable there. zfs is a 128-bit fs.

xfs/jfs are the ones you want if you're on a Linux setup; yes, they are slow when deleting files, but not everything is perfect.

reiserfs4 - you want to stay away from that, for obvious reasons ;)
 
I found ext3 was fastest for storing and reading large files on software RAID0 from a Windows source. Peculiarities in the way Windows writes to network shares show up on the other filesystems as vastly reduced write speed, since they don't account for the Windows remote write pattern.
 
If the journal goes south, so does all your stuff :p and I have seen that happen; luckily it wasn't my LVM it happened on :D
 
If the journal goes south, it's just the journal.

If your other data went with it, that was an error somewhere else.
 
Can you provide some raw numbers - copying a big file from the 60 GB to the raid, etc.?
I'll need a good fs for storing many big files, rarely creating new ones. Tried xfs, jfs, ext3, ext4, btrfs, etc.

Copying from the 60 GB to the raidz won't give you the numbers you are looking for. The 60 GB is an IDE Ultra ATA/100 drive, so the speed won't be nearly as good.

What kind of numbers are you looking for? How big are these big files you are talking about? I have dd images of hard drives that are 120 GB each sitting on a ZFS drive (different machine, and not raidz'ed either).

For example:

Creating a 4 GB file using dd

Code:
# time dd if=/dev/zero of=test bs=1024k count=4096
4096+0 records in
4096+0 records out

real    0m33.773s
user    0m0.009s
sys     0m3.584s


(4 gigabytes) / (33.7 seconds) = 121.543027 MBps

I also did one where I had two dd's running at the same time:

Code:
# time bash foo.sh 
4096+0 records in
4096+0 records out
4096+0 records in
4096+0 records out

real    1m9.795s
user    0m0.029s
sys     0m10.143s

# cat foo.sh 
dd if=/dev/zero of=test bs=1024k count=4096 &
dd if=/dev/zero of=test1 bs=1024k count=4096 &
wait

(8 gigabytes) / (69.7 seconds) = 117.532281 MBps

I also did a read test from the raidz using the native block size (128k):

Code:
time dd if=test of=/dev/null bs=128k
32768+0 records in
32768+0 records out

real    0m47.422s
user    0m0.047s
sys     0m3.310s

(4 gigabytes) / (47.4 seconds) = 86.4135021 MBps
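
The 128k figure is simply ZFS's default recordsize; if you want to confirm what a given dataset is using before picking a dd block size, something like this shows it (the dataset name here is just my pool, as an example):

Code:
# recordsize defaults to 128K unless it has been tuned per dataset
zfs get recordsize storage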

Some notes:

Unfortunately OpenSolaris does not yet support the AHCI controller on this motherboard, and when I tested with FreeBSD there were some issues as well, so the SATA ports are currently in IDE compatibility mode. That unfortunately means two drives share the same "bus", much like two IDE drives share the same cable.

I am using standard off-the-shelf components for this build; nothing too special about the hardware involved. I am more than happy with the performance, and seeing as this server is for backups, I care more about data integrity than I do about read/write speeds.

Another note: while dd is a simple tool for quick tests, it will more than likely not max out the read/write speeds of the drives, so these numbers should be taken as a small example only. There are other tests for filesystem performance, like iozone, however I don't want to run those on a "production" server.
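
If you do want a proper benchmark on a non-production box, an iozone run along these lines covers sequential write and read with a record size matching ZFS's default; the file size, record size, and path here are purely illustrative:

Code:
# -i 0 = sequential write test, -i 1 = sequential read test
# 128k records, 4 GB test file, written to the pool under test
iozone -i 0 -i 1 -r 128k -s 4g -f /storage/testfile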

I have a FreeBSD virtual machine running on top of OpenSolaris (using VirtualBox running headless), and these are the results I get when I run dd from within the VM:

Code:
dd if=/dev/zero of=test bs=1024k count=1024
1024+0 records in
1024+0 records out
1073741824 bytes transferred in 35.006241 secs (30672869 bytes/sec)

30 672 869 (bytes / sec) = 29.2519274 MB / sec

Which is not unlike what I get on my Mac OS X laptop in terms of write speed.
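
For reference, "running headless" just means the guest is started without any GUI attached; a headless VirtualBox guest is launched with something like the following (the VM name here is a placeholder, not my actual machine name):

Code:
# start the named guest with no local display; reach it over the network instead
VBoxHeadless --startvm "freebsd-guest" &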
 
Code:
zpool status
  pool: rpool
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        rpool       ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            c4d0s0  ONLINE       0     0     0
            c4d1s0  ONLINE       0     0     0

errors: No known data errors

  pool: storage
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        storage     ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            c5d0    ONLINE       0     0     0
            c5d1    ONLINE       0     0     0
            c6d0    ONLINE       0     0     0
            c6d1    ONLINE       0     0     0
            c7d0    ONLINE       0     0     0

errors: No known data errors

Added an extra drive to my rpool; now it is mirrored. The cool thing is, you can even create mirrors with drives that are not the same size. Right now it is a 60 GB drive and an 80 GB drive (the mirror only gets the smaller drive's capacity, of course). The mirror is a complete copy that the machine can boot from, meaning my OS disk should now survive a single drive failure, or at least that is the point of the exercise. I will create a new thread soon detailing what to do to add a new drive to a root pool.
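
Until that thread is up, the rough shape of it is attaching the second disk to the existing root vdev and then making that disk bootable on its own; a sketch of the commands involved on OpenSolaris, using the device names from my pool above (partitioning and slice prep omitted):

Code:
# attach the new slice to the existing root disk, turning the single-disk vdev into a mirror
zpool attach rpool c4d0s0 c4d1s0
# watch the resilver until it completes
zpool status rpool
# install GRUB on the second disk so the machine can boot from it alone
installgrub /boot/grub/stage1 /boot/grub/stage2 /dev/rdsk/c4d1s0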
 
The problem with btrfs is that it will take at least 2-3 years before it gets anywhere near production-ready ... that's a bit too late for most people who depend on their computers keeping their data safe.
------------------
Misbah
 