ZFS


General Info

Random

ZPool Email Alerts here
Dedup Info here
Dedup and Block Size discussion
ZFS do's and don'ts here
Info about ACLs can be found here.
How to edit loader.conf here
ashift=9 vs ashift=12 here
Zpool Calculator here
DTrace Scripts here


  • When determining how many disks to use in a RAIDZ, the following configurations provide optimal performance. Array sizes beyond 12 disks are not recommended.
Start a RAIDZ1 at 3, 5, or 9 disks.
Start a RAIDZ2 at 4, 6, or 10 disks.
Start a RAIDZ3 at 5, 7, or 11 disks.
  • The recommended number of disks per vdev is between 3 and 9. If you have more disks, use multiple pools or vdevs. An example zpool create for one of the layouts above is shown after this list.
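As a concrete sketch of one of those layouts, a 6-disk RAIDZ2 pool could be created with something like the following (the pool name Tank and the /dev/sdX device names are placeholders; /dev/disk/by-id paths are preferable in practice):

zpool create Tank raidz2 /dev/sda /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf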


  • In order to quickly prepare a drive for use you can erase the beginning and the end of it with the following commands. The first two zero the first 1MiB (2048 512-byte sectors) and the last two zero the final 1MiB; pick whichever form (by-id path or raw device node) is appropriate.
dd bs=512 if=/dev/zero of=/dev/disk/by-id/ata-ST3000DM001-1ER166_W500QFNT count=2048
dd bs=512 if=/dev/zero of=/dev/sda count=2048
dd bs=512 if=/dev/zero of=/dev/disk/by-id/ata-ST3000DM001-1ER166_W500QFNT count=2048 seek=$((`blockdev --getsz /dev/disk/by-id/ata-ST3000DM001-1ER166_W500QFNT` - 2048))
dd bs=512 if=/dev/zero of=/dev/sda count=2048 seek=$((`blockdev --getsz /dev/sda` - 2048))

Dedup

  1. There are some resources that suggest that one needs 2GB of RAM per TB of storage when deduplication is enabled [i] (in fact this is a misinterpretation of the text). In practice on FreeBSD, based on empirical testing and additional reading, it's closer to 5GB per TB.
  2. Using deduplication is slower than not running it.

Deduplicated space accounting is reported at the pool level. You must use the zpool list command rather than the zfs list command to identify disk space consumption when dedup is enabled.

  • To see the dedup stats use: zdb -DD {pool name}
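Since the accounting is at the pool level, a quick check might look like this (the pool name Tank is just a placeholder): the DEDUP column of zpool list shows the pool-wide dedup ratio, and zdb -DD prints the full DDT statistics.

zpool list Tank
zdb -DD Tank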

ARC

  • L2ARC headers are tracked in the ARC, so a large L2ARC puts pressure on it; you shouldn't exceed a 4:1 ratio of L2ARC:ARC.

In order to monitor ARC stats under Linux you'll first need to install the Linux version of the kstat perl module. Get that here.
Then you'll need to download the arcstat perl script. Get that here.

./arcstat.pl -f read,hits,miss,hit%,l2read,l2hits,l2miss,l2hit%,arcsz,l2size 1 100
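To check the current ARC and L2ARC sizes against that 4:1 guideline, the arcstats kstat can also be read directly (the path below is the standard ZFS-on-Linux location):

awk '$1 == "size" || $1 == "l2_size"' /proc/spl/kstat/zfs/arcstats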

Commands

  • To replace a failed drive in a vdev:

1. Offline the drive by its GUID (you can find it with zdb, as below). Using the GUID works even if the drive has already been removed; removal stops udev from mapping it, which invalidates its /dev entry in the zpool metadata.

zdb
zpool offline Tank 4954173666313093383

2. Then execute the replace command

zpool replace -f Tank 4954173666313093383 /dev/disk/by-id/ata-ST3000DM001-1ER166_W500QFNT
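Resilvering starts automatically once the replace is issued; progress can be watched with the status command (pool name as above):

zpool status Tank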

  • Zpool Status
If the drives are in a redundant configuration, the only time you have to worry about read, write, or cksum errors is when they appear at the pool level. If status shows errors at the device level you may have a faulty device, but ZFS was able to recover the data/metadata. You should clear the errors and scrub the pool immediately to see if they return (a minimal sequence is shown below).
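A minimal clear-and-scrub sequence, assuming the pool is named Tank:

zpool clear Tank
zpool scrub Tank
zpool status -v Tank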

  • Storage Info: zfs get used,lrefer,lused,compressratio
  • Config Info (FreeNAS requires -U and path): zdb -U /data/zfs/zpool.cache -C Media_Store

Performance Tuning

  • Correlate disk throughput with network usage (eth0), total CPU usage, and system counters
dstat -dnyc -N eth0 -C total -f 5
  • A good rule of thumb for dedup is 16GB of RAM per 1TB of data plus 16GB for ZFS caching in general (so a 6TB pool would call for roughly 6 x 16GB + 16GB = 112GB of RAM).
  • Number of drives vs IOPS
The write IOPS that a single vdev can produce is limited to the IOPS of the single slowest drive in the vdev, because each member of the vdev must complete its share of every write before the operation finishes. If you want more performance, add additional vdevs to the zpool rather than making a single larger vdev. For example: you have 6x2TB drives. In a single RAID-Z2 vdev, you have a 12TB zpool with 8TB of usable space. You could instead create 2 RAID-Z1 vdevs of 3x2TB each, leaving you the same 8TB of usable space out of your 12TB zpool, with one parity disk per vdev. However, in this configuration you would have 2x the IOPS, as pool IOPS are the aggregate of the single slowest drive from each vdev. A creation example for this two-vdev layout is shown below.
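A sketch of the two-vdev RAID-Z1 layout from the example above, created in a single command (the pool name Tank and the /dev/sdX names are placeholders; by-id paths are preferable in practice):

zpool create Tank raidz1 /dev/sda /dev/sdb /dev/sdc raidz1 /dev/sdd /dev/sde /dev/sdf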

Tuning for 16GB of RAM.
Site that has various performance tuning steps.
ZIL - To Mirror or Not to Mirror.
SWAP and ZFS - Why you should use swap with ZFS.

Maintenance

  • Add this to a file in your cron.d directory (e.g. /etc/cron.d) to keep your data healthy.
PATH=/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin

# Scrub on the second Sunday of every month: the 8-14 day-of-month range covers the second week, and the date check limits the job to Sundays.
24 0 8-14 * * root [ $(date +\%w) -eq 0 ] && [ -x /usr/lib/zfs-linux/scrub ] && /usr/lib/zfs-linux/scrub

Advanced Format Drives

1. Add the disks

2. Format as ZFS

3. Create virtual devices, choosing advanced format (this will create the pool, but with ad#.nop as the members)
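Outside the FreeNAS GUI, the command-line equivalent of this step is roughly the following sketch, assuming a simple two-disk mirror on ad0 and ad1 (adjust the device names and vdev type to suit your layout):

gnop create -S 4096 /dev/ad0 /dev/ad1
zpool create {poolname} mirror /dev/ad0.nop /dev/ad1.nop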

If you run zdb | grep ashift at this point, you should get ashift=12. However, ZFS is still working through the GEOM gnop layer to access the drives (unnecessary overhead). Now, before you actually write any data to the pool, do the following:

4. zpool export {poolname}

5. From a command prompt

gnop destroy /dev/ad0.nop /dev/ad1.nop (assuming ad0 and ad1 are your devices. Mine are ad4 and ad6)

6. zpool import {poolname}

7. When you type zpool status, you should see your pools have the physical devices listed as members, not the GEOM logical ones.

The GEOM devices will still be created at boot time, but it won't matter, because ZFS is dealing directly with the drives rather than the gnop layer. You can also verify that this whole approach will work by interrogating the drives for their features. When you add the disks to the management interface, ensure you activate SMART monitoring. Then you can query a drive by issuing the following command:

smartctl -i /dev/ad0 (or whatever ad # you have)

The crucial part is: Sector Sizes: 512 bytes logical, 4096 bytes physical. That's the clue that this method will actually do some good. Following the pool's export and subsequent import, it would be wise to run a scrub before you put data on it, and to re-issue the zdb command to double-check the alignment.
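A minimal recap of those two checks, using the {poolname} placeholder from the earlier steps:

zpool scrub {poolname}
zdb | grep ashift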