== Introduction ==
+
{{Fancyimportant|This tutorial is undergoing a heavy revision to switch from ZFS FUSE to ZFS on Linux.}}
  
This document was written to help you install Funtoo Linux as concisely as possible, with a minimum number of distracting options regarding system configuration.
+
= Introduction =
  
These docs assume you have a "PC compatible" computer system with a standard PC BIOS. Many new computers support UEFI for booting, which is a new firmware interface that replaces the old-fashioned MBR-based BIOS. If you have a system with UEFI, you will want to use this documentation along with the [[UEFI Install Guide]], which will augment these instructions and explain how to get your system to boot. You may need to change your PC BIOS settings to enable or disable UEFI booting. The [[UEFI Install Guide]] has more information on this, and steps on how to determine if your system supports UEFI.
+
== ZFS features and limitations ==
  
We also offer a [[ZFS Install Guide]], which augments the instructions on this page for those who want to install Funtoo Linux on ZFS. If you are installing Funtoo Linux on [[Funtoo Linux Installation on ARM|ARM]] architecture, please see [[Funtoo Linux Installation on ARM]] for notable differences regarding ARM support. An experimental Funtoo Linux build also exists for [[Funtoo Linux Installation on SPARC|SPARC]] platforms. See [[Funtoo Linux Installation on SPARC]].
+
ZFS offers an impressive number of features, even putting aside its hybrid nature (both a filesystem and a volume manager -- zvol) covered in detail on [http://en.wikipedia.org/wiki/ZFS Wikipedia]. One of the most fundamental points to keep in mind about ZFS is that it '''targets legendary reliability in terms of preserving data integrity'''. ZFS uses several techniques to detect and repair (self-heal) corrupted data. Simply speaking, it makes aggressive use of checksums and relies on data redundancy; the price to pay is a bit more CPU processing power. However, the [http://en.wikipedia.org/wiki/ZFS Wikipedia article about ZFS] also mentions that it is strongly discouraged to use ZFS over classic RAID arrays, as ZFS then cannot control the data redundancy, thus ruining most of its benefits.
  
If you've had previous experience installing Gentoo Linux then a lot of steps will be familiar, but you should still read through as there are a few differences.
+
In short, ZFS has the following features (not exhaustive):
  
== Installation Overview ==
+
* Storage pools dividable into one or more logical storage entities.
 +
* Plenty of space:
 +
** 256 zettabytes per storage pool (2^64 storage pools max in a system).
 +
** 16 exabytes max for a single file
 +
** 2^48 entries max per directory
 +
* Virtual block-device support over a ZFS pool (zvol), which is extremely cool when used jointly with a RAID-Z volume
 +
* Read-only snapshot support (it is possible to get read-write copies of them; those are named clones)
 +
* Encryption support (only supported at ZFS version 30 and later; ZFS version 31 ships with Oracle Solaris 11, so that version is mandatory if you plan to encrypt your ZFS datasets/pools)
 +
* '''Built-in RAID-5-like-on-steroids capabilities known as [http://en.wikipedia.org/wiki/Non-standard_RAID_levels#RAID-Z RAID-Z] and RAID-6-like-on-steroids capabilities known as RAID-Z2'''. RAID-Z3 (triple parity) also exists.
 +
* Copy-on-Write transactional filesystem
 +
* Meta-attributes support (properties), allowing you to easily drive the show: "that directory is encrypted", "that directory is limited to 5 GiB", "that directory is exported via NFS" and so on. Depending on what you define, ZFS takes the appropriate actions! (A small sketch follows this list.)
 +
* Dynamic striping to optimize data throughput
 +
* Variable block length 
 +
* Data deduplication
 +
* Automatic pool re-silvering
 +
* Transparent data compression
 +
* Transparent encryption (Solaris 11 and later only)
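
As a small sketch of the meta-attribute bullet above (the dataset name ''mypool/projects'' is hypothetical; ''compression'', ''quota'' and ''sharenfs'' are standard ZFS properties), each '''zfs set''' below immediately changes how ZFS handles the dataset:

<console>
###i## zfs set compression=on mypool/projects
###i## zfs set quota=5G mypool/projects
###i## zfs set sharenfs=on mypool/projects
</console>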
  
This is a basic overview of the Funtoo installation process:
+
Most notable limitations are:
  
# [[#Live CD|Download and boot the live CD of your choice]].
+
* Lacks a feature ZFS developers know as "block pointer rewrite functionality" (planned to be developed); without it, ZFS currently suffers from not being able to do:
# [[#Prepare Hard Disk|Prepare your disk]].
+
** Pool defragmentation (the COW techniques used in ZFS mitigate the problem)
# [[#Creating filesystems|Create]] and [[#Mounting filesystems|mount]] filesystems.
+
** Pool resizing
# [[#Installing the Stage 3 tarball|Install the Funtoo stage tarball]] of your choice.
+
** Data compression (re-applying it to existing data)
# [[#Chroot into Funtoo|Chroot into your new system]].
+
** Adding an additional device to a RAID-Z/Z2/Z3 pool to increase its size (however, it is possible to replace, in sequence, each one of the disks composing a RAID-Z/Z2/Z3)
# [[#Downloading the Portage tree|Download the Portage tree]].
+
* '''NOT A CLUSTERED FILESYSTEM''' like Lustre, GFS or OCFS2
# [[#Configuring your system|Configure your system]] and [[#Configuring your network|network]].
+
* No data healing if used on a single device (corruption can still be detected); the workaround is to force data duplication on the drive
# [[#Configuring and installing the Linux kernel|Install a kernel]].
+
* No TRIM support (SSD devices)
# [[#Installing a Bootloader|Install a bootloader]].
+
# [[#Finishing Steps|Complete final steps]].
+
# [[#Restart your system|Reboot and enjoy]].
+
  
=== Live CD ===
+
== ZFS on well known operating systems ==
  
Funtoo doesn't provide an "official" Funtoo Live CD, but there are plenty of good ones out there to choose from. A great choice is the Gentoo-based [http://www.sysresccd.org/ System Rescue CD] as it contains lots of tools and utilities and supports both 32-bit and 64-bit systems.
+
=== Linux ===
  
It is also possible to install Funtoo Linux using many other Linux-based live CDs. Generally, any modern bootable Linux live CD or live USB media will work. See [[Requirements|requirements]] for an overview of what the Live Media must provide to allow a problem-free install of Funtoo Linux.
+
Although the source code of ZFS is open, its license (Sun CDDL) is incompatible with the license governing the Linux kernel (GNU GPL v2), thus preventing its direct integration. However, a couple of ports exist, but they suffer from maturity and feature gaps. As of writing (February 2014), two implementations are known:
  
To begin a Funtoo Linux installation, download System Rescue CD from:
+
* [http://zfs-fuse.net ZFS-fuse]: a totally userland implementation relying on FUSE. This implementation can now be considered defunct (as of February 2014). The original site of ZFS FUSE seems to have disappeared; nevertheless, the source code is still available on [http://freecode.com/projects/zfs-fuse http://freecode.com/projects/zfs-fuse]. ZFS FUSE stalled at version 0.7.0 in 2011 and never really evolved since then.
 +
* [http://zfsonlinux.org ZFS on Linux]: a kernel-mode implementation of ZFS which supports a lot of ZFS features. The implementation is not as complete as it is under Solaris and its siblings like OpenIndiana (e.g. SMB integration is still missing, no encryption support...), but a lot of functionality is there. This is the implementation used for this article. As ZFS on Linux is an out-of-tree Linux kernel implementation, updated patches must be waited for after each Linux kernel release. ZFS on Linux currently supports zpools version 28.
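
If you want to check which on-disk pool versions the implementation installed on your machine supports, the '''zpool''' command can list them (the exact output varies between implementations and versions):

<console>
###i## zpool upgrade -v
</console>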
  
* Main US mirror: [http://ftp.osuosl.org/pub/funtoo/distfiles/sysresccd/ The Oregon State University Open Source Lab]
+
=== Solaris/OpenIndiana ===
* Main EU mirror: [http://ftp.heanet.ie/mirrors/funtoo/distfiles/sysresccd/ HEAnet] or use your preferred live media. Insert it into your disc drive, and boot from it. If using an older version of System Rescue CD, '''be sure to select the <tt>rescue64</tt> kernel at the boot menu if you are installing a 64-bit system'''. By default, System Rescue CD used to boot in 32-bit mode though the latest version attempts to automatically detect 64-bit processors.
+
  
=== Prepare Hard Disk ===
+
* '''Oracle Solaris:''' remains the de facto reference platform for ZFS implementations: ZFS on this platform is now considered mature and usable on production systems. Solaris 11 uses ZFS even for its "system" pool (aka ''rpool''). A great advantage of this: it is now quite easy to revert the effect of a patch, provided a snapshot has been taken just before applying it. In the "good old" times of Solaris 10 and before, reverting a patch was possible but could be tricky and complex. ZFS is far from new in Solaris: it takes its roots in 2005 and was then integrated in Solaris 10 6/06, introduced in June 2006.
==== Partitions ====
+
  
Funtoo Linux fully supports traditional MBR partitions, as well as newer GPT/GUID partition formats. Funtoo Linux recommends the use of the GPT partitioning scheme, since it is newer and more flexible. Here are the various trade-offs between each partitioning scheme:
+
* '''[http://openindiana.org OpenIndiana]:''' is based on the Illumos kernel (a derivative of the now defunct OpenSolaris), which aims to provide absolute binary compatibility with Sun/Oracle Solaris. Worth mentioning: the Solaris kernel and the [https://www.illumos.org Illumos kernel] once shared the same code base; however, they have followed different paths since Oracle announced the discontinuation of OpenSolaris (August 13th, 2010). Like Oracle Solaris, OpenIndiana uses ZFS for its system pool. The Illumos kernel's ZFS support lags a bit behind Oracle's: it supports zpool version 28, whereas Oracle Solaris 11 supports zpool version 31 (data encryption being supported as of zpool version 30).
  
===== GPT Partitions =====
+
=== *BSD ===
  
* Newer, preferred format for Linux systems
+
* '''FreeBSD''': ZFS has been present in FreeBSD since FreeBSD 7 (zpool version 6), and FreeBSD can boot on a ZFS volume (zfsboot). ZFS support was vastly enhanced in FreeBSD 8.x (8.2 supports zpool version 15, 8.3 supports version 28), FreeBSD 9 and FreeBSD 10 (both support zpool version 28). ZFS in FreeBSD is now considered fully functional and mature. FreeBSD derivatives such as the popular [http://www.freenas.org FreeNAS] take advantage of ZFS and integrate it in their tools. The latter has, for example, support for zvols through its Web management interface (FreeNAS >= 8.0.1).
* Supports 2 TB+ hard drives for booting
+
* Supports hundreds of partitions per disk of any size
+
* Requires legacy BIOS boot partition (~32 MB) to be created if system does not use EFI
+
* Requires bootloader with support for GPT such as GRUB 2, EXTLINUX, or a patched version of GRUB Legacy
+
  
===== MBR Partitions =====
+
* '''NetBSD''': the ZFS port was started as a GSoC project in 2007 and has been present in the NetBSD mainline since 2009 (zpool version 13).
  
* Legacy, DOS partitioning scheme
+
* '''OpenBSD''': no ZFS support yet, and none planned until Oracle changes some policies, according to the project FAQ.
* Only 4 primary partitions per disk; after that, you must use "logical" partitions
+
* Does not support 2 TB+ disks for booting
+
* Compatible with certain problematic systems (such as the HP ProBook 4520)
+
* Dual-boot with Windows on BIOS systems (Windows handles GPT only on true EFI systems, whatever the version)
+
* Multiple boot loader options, e.g. GRUB 2, GRUB Legacy, lilo
+
  
{{fancyimportant|If you plan to use partitions of 2 TB or greater, you ''must'' partition using the GPT/GUID format. Also note that a small percentage of PCs will not boot properly with GPT. For these systems, using MBR partitions or a primary drive with an MBR partition may be required in order to boot.}}
+
== ZFS alternatives ==
  
==== Partitioning Using gdisk ====
+
* WAFL seems to have severe limitations [http://unixconsult.org/wafl/ZFS%20vs%20WAFL.html] (document is not dated); an interesting article also lies [http://blogs.netapp.com/dave/2008/12/is-wafl-a-files.html here]
 +
* BTRFS is advancing every week, but it still lacks features like the capability of emulating a virtual block device over a storage pool (zvol), and built-in support for RAID-5/6 is not complete yet (cf. [https://www.mail-archive.com/linux-btrfs@vger.kernel.org/msg29169.html Btrfs mailing list]). At the date of writing, it is still experimental, whereas ZFS is used on big production servers.
 +
* VxFS has also been targeted by comparisons like [http://blogs.oracle.com/dom/entry/zfs_v_vxfs_iozone this one] (a bit [http://www.symantec.com/connect/blogs/suns-comparision-vxfs-and-zfs-scalability-flawed controversial]). VxFS has been known in the industry since 1993 and is known for its legendary flexibility. Symantec acquired VxFS and offers a basic version of it (no clustering, for example) under the name [http://www.symantec.com/enterprise/sfbasic/index.jsp Veritas Storage Foundation Basic]
 +
* An interesting discussion about modern filesystems can be found on [http://www.osnews.com/story/19665/Solaris_Filesystem_Choices OSNews.com]
  
===== Notes Before We Begin =====
+
== ZFS vs BTRFS at a glance ==
 +
Some key features of ZFS and BTRFS, in no particular order of importance:
  
These install instructions assume you are installing Funtoo Linux to an empty hard disk using GUID partition tables (GPT). If you are installing Funtoo Linux on a machine where another OS is installed, or there is an existing Linux distribution on your system that you want to keep, then you will need to adapt these instructions to suit your needs.
+
{| class="wikitable"
 +
!Feature||ZFS!!BTRFS!!Remarks
 +
|-
 +
|Transactional filesystem||YES||YES
 +
|-
 +
|Journaling||NO||YES||Not a design flaw, but ZFS is robust ''by design''...  See page 7 of [http://hub.opensolaris.org/bin/download/Community+Group+zfs/docs/zfslast.pdf ''"ZFS The last word on filesystems"''].
 +
|-
 +
|Dividable pool of data storage||YES||YES
 +
|-
 +
|Read-only snapshot support||YES||YES
 +
|-
 +
|Writable snapshot support||YES||YES
 +
|-
 +
|Sending/Receiving a snapshot over the network||YES||YES
 +
|-
 +
|Rollback capabilities||YES||YES||While ZFS knows where and how to rollback the data (on-line), BTRFS requires a bit more work from the system administrator (off-line).
 +
|-
 +
|Virtual block-device emulation||YES||NO
 +
|-
 +
|Data deduplication||YES||YES||Built-in in ZFS, third party tool ([https://github.com/g2p/bedup bedup]) in BTRFS
 +
|-
 +
|Data blocks reoptimization||NO||YES||ZFS is missing a "block pointer rewrite functionality"; this is true of all known implementations so far. Not a major performance handicap, however. BTRFS can do on-line data defragmentation.
 +
|-
 +
|Built-in data redundancy support||YES||YES||ZFS has a sort of RAID-5/6 capability (but better: RAID-Z{1,2,3}); BTRFS only fully supports data mirroring at this point, and some work remains to be done on parity handling in BTRFS.
 +
|-
 +
|Management by attributes||YES||NO||Nearly everything touching ZFS management is related to attribute manipulation (quotas, sharing over NFS, encryption, compression...); BTRFS also retains the concept, but it is less aggressively used.
 +
|-
 +
|Production quality code||NO||NO||ZFS support in Linux is not considered production quality (yet), although it is very robust. Several operating systems like Solaris/OpenIndiana have a production-quality implementation; Solaris/OpenIndiana is now installed on ZFS datasets by default.
 +
|-
 +
|Integrated within the Linux kernel tree||NO||YES||ZFS is released under the CDDL license...
 +
|}
  
If you are going to create a legacy MBR partition table instead of GUID/GPT, you will use the <tt>fdisk</tt> command instead of <tt>gdisk</tt>, and you will not need to create the GRUB boot loader partition. See the table under [[#Partitioning Recommendations|Partitioning Recommendations]], in particular the
+
= ZFS resource naming restrictions =
'''MBR Block Device (<tt>fdisk</tt>)''' and '''MBR Code''' columns. <tt>fdisk</tt> works just like <tt>gdisk</tt>, but creates legacy MBR partition tables instead of the newer GPT/GUID partition tables.
+
  
Advanced users may be interested in the following topics:
+
Before going further, you must be aware of restrictions concerning the names you can use on a ZFS filesystem. The general rule is: you can use all of the alphanumeric characters, plus the following special characters:
 +
* Underscore (_)
 +
* Hyphen (-)
 +
* Colon (:)
 +
* Period (.)
  
* [[GUID Booting Guide]]
+
The name used to designate a ZFS pool has no particular restriction except:
* [[Rootfs over encrypted lvm]]
+
* it can't be one of the reserved words, in particular:
* [[Rootfs over encrypted lvm over raid-1 on GPT]]
+
** ''mirror''
* '''NEW!''' '''[[ZFS Install Guide]] (Also contains instructions for Rootfs over Encrypted ZFS!)'''
+
** ''raidz'' (''raidz2'', ''raidz3'' and so on)
 +
** ''spare''
 +
** ''cache''
 +
** ''log''
 +
* names must begin with an alphanumeric character (same for ZFS datasets).
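
As a quick illustration (assuming a spare partition or loopback device is at hand; the exact error message varies between implementations), a reserved word is rejected as a pool name while a name mixing the allowed characters is fine:

<console>
###i## zpool create mirror /dev/loop0        # rejected: "mirror" is a reserved word
###i## zpool create data-pool_01 /dev/loop0  # fine: letters, digits, '_', '-', ':' and '.' are allowed
</console>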
  
===== Using gdisk =====
+
= Some ZFS concepts =
 +
Once again with no particular order of importance:
 +
{|class="wikitable"
 +
|-
 +
!ZFS!!What it is!!Counterparts examples
 +
|-
 +
|zpool||A group of one or more physical storage media (hard drive partitions, files...). A zpool has to be divided into at least one '''ZFS dataset''' or at least one '''zvol''' to hold any data. Several zpools can coexist in a system, provided they each have a unique name. Also note that '''zpools can never be mounted; the only things that can be mounted are the ZFS datasets they hold.'''||
 +
* Volume group (VG) in LVM
 +
* BTRFS volumes
 +
|-
 +
|dataset||A logical subdivision of a zpool, mounted in your host's VFS, where your files and directories reside. Several ZFS datasets can coexist in a single system, provided they each have a unique name within their zpool.||
 +
* Logical subvolumes (LV) in LVM formatted with a filesystem like ext3.
 +
* BTRFS subvolumes
 +
|-
 +
|snapshot||A read-only photo of a ZFS dataset's state as it was at a precise moment in time. ZFS has no way to cooperate on its own with applications that read and write data on ZFS datasets; if those applications still hold unflushed data at the moment the snapshot is taken, only what has been flushed will be included in the snapshot. Worth mentioning: snapshots do not take disk space aside from some metadata at the exact time they are created; their size will grow as more and more data blocks (i.e. files) are deleted or changed on their corresponding live ZFS dataset.||
 +
* No direct equivalent in LVM.
 +
* BTRFS read-only snapshots
 +
|-
 +
|clone||A writable copy of a snapshot.||
 +
* LVM snapshots
 +
* BTRFS snapshots
 +
|-
 +
|zvol||An emulated block device whose data is held, behind the scenes, in the zpool the zvol has been created in (see the sketch just after this table).||No known equivalent, even in BTRFS
 +
|-
 +
|}
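
Since zvols are not demonstrated later in this article, here is a minimal sketch of how one is typically created; the pool name ''mypool'' and the 1 GB size are purely illustrative and assume such a pool already exists (under ZFS on Linux the emulated device then shows up under ''/dev/zvol''):

<console>
###i## zfs create -V 1G mypool/myfirstzvol
###i## ls -l /dev/zvol/mypool/
</console>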
  
The first step after booting SystemRescueCd is to use <tt>gdisk</tt> to create GPT (also known as GUID) partitions, specifying the disk you want to use, which is typically <tt>/dev/sda</tt>, the first disk in the system:
+
= Your first contact with ZFS  =
 +
== Requirements ==
 +
* ZFS userland tools installed (package ''sys-fs/zfs'')
 +
* ZFS kernel modules built and installed (package ''sys-fs/zfs-kmod''), there is a known issue with kernel 3.13 series see [http://forums.funtoo.org/viewtopic.php?id=2442 this thread on Funtoo's forum]
 +
* A disk size of 64 MB as a bare minimum (128 MB is the minimum size of a pool). Multiple disks will be simulated through the use of several raw images accessed via Linux loopback devices.
 +
* At least 512 MB of RAM
  
<console># ##i##gdisk /dev/sda</console>
+
== Preparing ==
You should find <tt>gdisk</tt> very similar to <tt>fdisk</tt>. Here is the partition table we want to end up with:
+
Once you have emerged ''sys-fs/zfs'' and ''sys-fs/zfs-kmod'' you have two options to start using ZFS at this point:
* Either start ''/etc/init.d/zfs'' (this loads all of the ZFS kernel modules for you, plus a couple of other things)
* Or load the ZFS kernel modules by hand (modprobe pulls in all of the other ZFS modules as dependencies)
  
<console>Command (? for help): ##i##p
+
So :
Disk /dev/sda: 234441648 sectors, 111.8 GiB
+
<console>###i## rc-service zfs start</console>
Logical sector size: 512 bytes
+
Disk identifier (GUID): A4E5208A-CED3-4263-BB25-7147DC426931
+
Partition table holds up to 128 entries
+
First usable sector is 34, last usable sector is 234441614
+
Partitions will be aligned on 2048-sector boundaries
+
Total free space is 2014 sectors (1007.0 KiB)
+
  
Number  Start (sector)    End (sector) Size      Code  Name
+
Or:
  1           2048          206847  500.0 MiB  8300 Linux filesystem
+
<console>
  2          206848          272383  32.0 MiB    EF02 BIOS boot partition
+
###i## modprobe zfs
  3          272384        8660991  4.0 GiB    8200 Linux swap
+
###i## lsmod | grep zfs
  4        8660992      234441614  107.7 GiB  8300 Linux filesystem
+
zfs                  874072 0
 +
zunicode              328120  1 zfs
 +
zavl                  12997 1 zfs
 +
zcommon                35739 1 zfs
 +
znvpair                48570 2 zfs,zcommon
 +
spl                    58011 5 zfs,zavl,zunicode,zcommon,znvpair
 +
</console>
  
Command (? for help): </console>
+
== Your first ZFS pool ==
 +
To start with, four raw disks (2 GB each) are created:
  
Above, you'll see that we have a 500 MiB boot partition, a 32 MiB "BIOS boot partition" (also known as the GRUB boot loader partition), 4 GiB of swap, and the remaining disk used by a 107.7 GiB root partition.
+
<console>
 +
###i## for i in 0 1 2 3; do dd if=/dev/zero of=/tmp/zfs-test-disk0${i}.img bs=2G count=1; done
 +
0+1 records in
 +
0+1 records out
 +
2147479552 bytes (2.1 GB) copied, 40.3722 s, 53.2 MB/s
 +
...
 +
</console>
  
===== For new <tt>gdisk</tt> users =====
+
Then let's see what loopback devices are in use and which is the first free:
  
These partitions were created using the "<tt>n</tt>" command from within <tt>gdisk</tt>. The <tt>gdisk</tt> commands to create the partition table above are as follows. Adapt sizes as necessary, although these defaults will work for most users. The partition codes entered below can be found in the [[#Partitioning Recommendations|Partitioning Recommendations]] table below, in the GPT Code column.
+
<console>
 +
###i## losetup -a
 +
###i## losetup -f
 +
/dev/loop0
 +
</console>
  
Within <tt>gdisk</tt>, follow these steps:
+
In the above example nothing is in use and the first available loopback device is /dev/loop0. Now associate each of the disk images with a loopback device (/tmp/zfs-test-disk00.img -> /dev/loop0, /tmp/zfs-test-disk01.img -> /dev/loop1 and so on):
  
'''Create a new empty partition table''' (This ''will'' erase all data on the disk when saved):
+
<console>
 +
###i## for i in 0 1 2 3; do losetup /dev/loop${i} /tmp/zfs-test-disk0${i}.img; done
 +
###i## losetup -a
 +
/dev/loop0: [000c]:781455 (/tmp/zfs-test-disk00.img)
 +
/dev/loop1: [000c]:806903 (/tmp/zfs-test-disk01.img)
 +
/dev/loop2: [000c]:807274 (/tmp/zfs-test-disk02.img)
 +
/dev/loop3: [000c]:781298 (/tmp/zfs-test-disk03.img)
 +
</console>
 +
 
 +
{{Fancynote|ZFS literature often names zpools "tank"; this is not a requirement, you can use whatever name you choose (as we did here...) }}
 +
 
 +
Every story in ZFS takes its roots with the very first ZFS-related command you will be in touch with: '''zpool'''. As you might have guessed, '''zpool''' manages all ZFS aspects in connection with the physical devices underlying your ZFS storage spaces, and the very first task is to use this command to make what is called a ''pool'' (if you have used LVM before, volume groups can be seen as a counterpart). Basically, what you do here is tell ZFS to take a collection of physical storage (which can take several forms, like a hard drive partition, a USB key partition or even a file) and consider all of it as a single pool of storage; we will subdivide it in the following paragraphs. No black magic here: ZFS writes some metadata on them behind the scenes to be able to track which physical device belongs to what pool of storage.
  
 
<console>
 
<console>
Command: ##i##o ↵
+
###i## zpool create myfirstpool /dev/loop0 /dev/loop1 /dev/loop2 /dev/loop3
This option deletes all partitions and creates a new protective MBR.
+
Proceed? (Y/N): ##i##y ↵
+
 
</console>
 
</console>
  
'''Create Partition 1''' (boot):
+
And... nothing! Nada! The command returned silently, but it ''did'' do something; the next section explains what.
  
 +
== Your first ZFS dataset ==
 
<console>
 
<console>
Command: ##i##n ↵
+
###i## zpool list
Partition Number: ##i##1 ↵
+
NAME          SIZE  ALLOC  FREE    CAP  DEDUP  HEALTH  ALTROOT
First sector: ##i##↵
+
myfirstpool  7.94G  130K  7.94G    0%  1.00x  ONLINE  -
Last sector: ##i##+500M ↵
+
Hex Code: ##i##↵
+
 
</console>
 
</console>
  
'''Create Partition 2''' (GRUB):
+
What does this mean? Several things. First, your zpool is there and has a size of roughly 8 GB, minus some space eaten by metadata. Second, it is actually usable, because the ''HEALTH'' column says ''ONLINE''. The other columns are not meaningful for us for the moment, just ignore them. If you want more crusty details, you can use the zpool command like this:
  
 
<console>
 
<console>
Command: ##i##n ↵
+
###i## zpool status
Partition Number: ##i##2 ↵
+
  pool: myfirstpool
First sector: ##i##↵
+
state: ONLINE
Last sector: ##i##+32M ↵
+
  scan: none requested
Hex Code: ##i##EF02 ↵
+
config:
 +
 
 +
        NAME        STATE    READ WRITE CKSUM
 +
        myfirstpool  ONLINE      0    0    0
 +
          loop0    ONLINE      0    0    0
 +
          loop1    ONLINE      0    0    0
 +
          loop2    ONLINE      0    0    0
 +
          loop3    ONLINE      0    0    0
 
</console>
 
</console>
 +
Information is quite intuitive: your pool is seen as being usable (''state'' is similar to ''HEALTH'') and is composed of several devices, each one listed as being in a ''healthy'' state... at least for now, because they will be deliberately damaged for demonstration purposes in a later section. For your information, the ''READ'', ''WRITE'' and ''CKSUM'' columns list the number of operation failures on each of the devices, respectively:
* ''READ'' for reading failures. A non-zero value is not a good sign... the device is unhealthy and will soon fail.
* ''WRITE'' for writing failures. A non-zero value is not a good sign... the device is unhealthy and will soon fail.
* ''CKSUM'' for mismatches between the checksum of the data at the time it was written and the checksum recomputed when it is read again (yes, ZFS uses checksums in an aggressive manner). A non-zero value is not a good sign... corruption happened; ZFS will do its best to recover the data on its own, but this is definitely not the sign of a healthy system.
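
If you ever want ZFS to re-read every block and verify it against its checksum (refreshing those counters in the process), you can start a scrub of the pool and watch its progress; scrubbing a freshly created, nearly empty pool like this one is not very exciting, it is shown only as an illustration:

<console>
###i## zpool scrub myfirstpool
###i## zpool status myfirstpool
</console>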
  
'''Create Partition 3''' (swap):
+
Cool! So far so good: you have a brand new 8 GB of usable storage space on your system. Has it been mounted somewhere?
  
 
<console>
 
<console>
Command: ##i##n ↵
+
###i## mount | grep myfirstpool
Partition Number: ##i##3 ↵
+
/myfirstpool on /myfirstpool type zfs (rw,xattr)
First sector: ##i##↵
+
Last sector: ##i##+4G ↵
+
Hex Code: ##i##8200 ↵
+
 
</console>
 
</console>
  
'''Create Partition 4''' (root):
+
Remember the tables in the section above? A zpool in itself can '''never be mounted''', never ''ever''. It is just a container where ZFS datasets are created and then mounted. So what happened here? Obscure black magic? No, of course not! Indeed, a ZFS dataset named after the zpool should have been created automatically for us and then mounted. Is this true? We will check it shortly. For the moment, you will be introduced to the second command you will deal with when using ZFS: '''zfs'''. While the '''zpool''' command is used for anything related to zpools, the '''zfs''' command is used for anything related to ZFS datasets '''(a ZFS dataset ''always'' resides in a zpool, ''always'', no exception to that).'''
 +
 
 +
{{Fancynote|'''zfs''' and '''zpool''' are the only two commands you will need to remember when dealing with ZFS.}}
 +
 
 +
So how can we check what ZFS datasets are currently known to the system? As you might have already guessed, like this:
  
 
<console>
 
<console>
Command: ##i##n ↵
+
###i## zfs list
Partition Number: ##i##4 ↵
+
NAME          USED  AVAIL  REFER  MOUNTPOINT
First sector: ##i##↵
+
myfirstpool  114K  7.81G    30K  /myfirstpool
Last sector: ##i##↵##!i## (for rest of disk)
+
Hex Code: ##i##↵
+
 
</console>
 
</console>
  
Along the way, you can type "<tt>p</tt>" and hit Enter to view your current partition table. If you make a mistake, you can type "<tt>d</tt>" to delete an existing partition that you created. When you are satisfied with your partition setup, type "<tt>w</tt>" to write your configuration to disk:
+
Ta-da! The mystery is busted! The ''zfs'' command tells us that a ZFS dataset named ''myfirstpool'' has not only been created but also mounted in the system's VFS for us. If you check with the ''df'' command, you should also see something like this:
  
'''Write Partition Table To Disk''':
+
<console>
 +
###i## df -h
 +
Filesystem      Size  Used Avail Use% Mounted on
 +
(...)
 +
myfirstpool    7.9G    0  7.9G  0% /myfirstpool
 +
</console>
  
 +
The $100 question: ''"what to do with this brand new ZFS /myfirstpool dataset?"''. Copy some files to it, of course! We used a Linux kernel source tree, but you can of course use whatever you want:
 
<console>
 
<console>
Command: ##i##w ↵
+
###i## cp -a /usr/src/linux-3.13.5-gentoo /myfirstpool
Do you want to proceed? (Y/N): ##i##Y ↵
+
###i## ln -s /myfirstpool/linux-3.13.5-gentoo /myfirstpool/linux
 +
###i## ls -lR /myfirstpool
 +
/myfirstpool:
 +
total 3
 +
lrwxrwxrwx  1 root root 32 Mar  2 14:02 linux -> /myfirstpool/linux-3.13.5-gentoo
 +
drwxr-xr-x 25 root root 50 Feb 27 20:35 linux-3.13.5-gentoo
 +
 
 +
/myfirstpool/linux-3.13.5-gentoo:
 +
total 31689
 +
-rw-r--r--  1 root root    18693 Jan 19 21:40 COPYING
 +
-rw-r--r--  1 root root    95579 Jan 19 21:40 CREDITS
 +
drwxr-xr-x 104 root root      250 Feb 26 07:39 Documentation
 +
-rw-r--r--  1 root root    2536 Jan 19 21:40 Kbuild
 +
-rw-r--r--  1 root root      277 Feb 26 07:39 Kconfig
 +
-rw-r--r--  1 root root  268770 Jan 19 21:40 MAINTAINERS
 +
(...)
 
</console>
 
</console>
  
The partition table will now be written to disk and <tt>gdisk</tt> will close.
+
A ZFS dataset behaves like any other filesystem: you can create regular files, symbolic links, pipes, special device nodes, etc. Nothing mystical here.
  
Now, your GPT/GUID partitions have been created, and will show up as the following ''block devices'' under Linux:
+
Now that we have some data in the ZFS dataset, let's see what various commands report:
 +
<console>
 +
###i## df -h
 +
Filesystem      Size  Used Avail Use% Mounted on
 +
(...)
 +
myfirstpool    7.9G  850M  7.0G  11% /myfirstpool
 +
</console>
 +
<console>
 +
###i## zfs list
 +
NAME          USED  AVAIL  REFER  MOUNTPOINT
 +
myfirstpool  850M  6.98G  850M  /myfirstpool
 +
</console>
 +
<console>
 +
###i## zpool list
 +
NAME          SIZE  ALLOC  FREE    CAP  DEDUP  HEALTH  ALTROOT
 +
myfirstpool  7.94G  850M  7.11G    10%  1.00x  ONLINE  -
 +
</console>
 +
{{Fancynote|Notice the various sizes reported by the '''zpool''' and '''zfs''' commands. In this case they are the same, but they can differ; this is true especially with zpools using RAID-Z.}}
  
* <tt>/dev/sda1</tt>, which will be used to hold the <tt>/boot</tt> filesystem,
+
== Unmounting/remounting a ZFS dataset ==
* <tt>/dev/sda2</tt>, which will be used directly by the new GRUB,
+
* <tt>/dev/sda3</tt>, which will be used for swap space, and
+
* <tt>/dev/sda4</tt>, which will hold your root filesystem.
+
  
===== For Previous fdisk users =====
 
  
If you have installed Gentoo Linux before, the one thing that is likely new to you here is the GRUB boot loader partition, which is listed as "BIOS boot partition" within <tt>gdisk</tt>. This partition is required for GRUB 2 to boot GPT/GUID boot disks. What is it? In GRUB-speak, this partition is essentially the location of the meat of GRUB's boot loading code. If you've used GRUB Legacy in the past, this partition is where the new GRUB stores the equivalent of the <tt>stage1_5</tt> and <tt>stage2</tt> files in legacy GRUB. Since GPT-based partition tables have less dead space at the beginning of the disk than their MBR equivalents, an explicitly defined partition of code <tt>EF02</tt> is required to hold the guts of the boot loader.
+
{{Fancyimportant|'''Only ZFS datasets can be mounted''' inside your host's VFS, no exceptions! Zpools cannot be mounted, never, never, never... please pay attention to the terminology and keep things clear by not mixing up the terms. We will introduce ZFS snapshots and ZFS clones, but those are ZFS datasets at their core, so they can also be mounted and unmounted.}}
  
In all other respects, the partition table is similar to that which you might create for an MBR-based disk during a Gentoo Linux installation. We have a boot and a root partition with code <tt>8300</tt>, and a Linux swap partition with code <tt>8200</tt>.
 
  
===== Partitioning Recommendations =====
+
If a ZFS dataset behaves just like any other filesystem, can we unmount it?
 +
<console>
 +
###i## umount /myfirstpool
 +
###i## mount | grep myfirstpool
 +
</console>
  
Below are our partitioning recommendations in table form. For GPT-based partitions, use the GPT Block Device and GPT Code columns with <tt>gdisk</tt>. For legacy MBR-based partitions, use the MBR Block Device and MBR code columns with <tt>fdisk</tt>:
+
No more ''/myfirstpool'' in the line of sight! So yes, it is possible to unmount a ZFS dataset just like you would any other filesystem. Is the ZFS dataset still present on the system even though it is unmounted? Let's check:
  
{| {{table}}
+
<console>
!Partition
+
###i## zfs list
!Size
+
NAME          USED  AVAIL  REFER  MOUNTPOINT
!MBR Block Device (<tt>fdisk</tt>)
+
myfirstpool  850M  6.98G  850M  /myfirstpool
!GPT Block Device (<tt>gdisk</tt>)
+
</console>
!Filesystem
+
!MBR Code
+
!GPT Code
+
|-
+
|<tt>/boot</tt>
+
|500 MB
+
|<tt>/dev/sda1</tt>
+
|<tt>/dev/sda1</tt>
+
|ext2
+
|83
+
|8300
+
|-
+
|GRUB boot loader partition
+
|32 MB
+
| ''not required for MBR''
+
|<tt>/dev/sda2</tt>
+
|For GPT/GUID only, skip for MBR - no filesystem.
+
|''N/A''
+
|EF02
+
|-
+
|swap
+
|2x RAM for low-memory systems and production servers; otherwise 2GB.
+
|<tt>/dev/sda2</tt>
+
|<tt>/dev/sda3</tt>
+
|swap (default)
+
|82
+
|8200
+
|-
+
|<tt>/</tt> (root)
+
|Rest of the disk, minimum of 10GB.
+
|<tt>/dev/sda3</tt>
+
|<tt>/dev/sda4</tt>
+
|XFS recommended, alternatively ext4
+
|83
+
|8300
+
|-
+
|<tt>/home</tt> (optional)
+
|User storage and media. Typically most of the disk.
+
|<tt>/dev/sda4</tt> (if created)
+
|<tt>/dev/sda5</tt> (if created)
+
|XFS recommended, alternatively ext4
+
|83
+
|8300
+
|-
+
| LVM (optional)
+
| If you want to create an LVM volume.
+
| <tt>/dev/sda4</tt> (PV, if created)
+
| <tt>/dev/sda5</tt> (PV, if created)
+
| LVM PV
+
| 8E
+
| 8E00
+
|}
+
  
==== Creating filesystems ====
+
Fortunately, and obviously, it is, or else ZFS would not be very useful. Your next concern would certainly be: "How can we remount it, then?" Simple! Like this:
 +
<console>
 +
###i## zfs mount myfirstpool
 +
###i## mount | grep myfirstpool
 +
myfirstpool on /myfirstpool type zfs (rw,xattr)
 +
</console>
  
Before your newly-created partitions can be used, the block devices need to be initialized with filesystem ''metadata''. This process is known as ''creating a filesystem'' on the block devices. After filesystems are created on the block devices, they can be mounted and used to store files.
+
The ZFS dataset is back! :-)
  
You will not create a filesystem on your swap partition, but will initialize it using the <tt>mkswap</tt> command so that it can be used as disk-based virtual memory. Then we'll run the <tt>swapon</tt> command to make your newly-initialized swap space active within the live CD environment, in case it is needed during the rest of the install process.
+
== Your first contact with ZFS management by attributes or the end of /etc/fstab ==
 +
At this point you might be curious about how the '''zfs''' command knows ''what'' it has to mount and ''where'' it has to mount it. You might be familiar with the following syntax of the '''mount''' command which, behind the scenes, scans the file ''/etc/fstab'' and mounts the specified entry:
 +
<console>
 +
###i## mount /boot
 +
</console>
  
Note that we will not create a filesystem on the GRUB boot loader partition, as GRUB writes binary data directly to that partition when the boot loader is installed, which we'll do later.
+
Does ''/etc/fstab'' contain something related to our ZFS dataset?
  
You can see the commands you will need to type below. Like the rest of this document, it assumes that you are using a GPT partitioning scheme. If you are using MBR, your root filesystem will likely be created on <tt>/dev/sda3</tt> instead and you will need to adjust the target block devices. If you are following our recommendations, then simply do this:
+
<console>
 +
###i## cat /etc/fstab | grep myfirstpool
 +
#
 +
</console>
 +
 
 +
Doh!!!... Obviously nothing there. Another mystery? Certainly not! The answer lies in an extremely powerful feature of ZFS: attributes. Simply speaking, an attribute is a named property of a ZFS dataset that holds a value. Attributes govern various aspects of how the datasets are managed, like: ''"Does the data have to be compressed?"'', ''"Does the data have to be encrypted?"'', ''"Does the data have to be exposed to the rest of the world via NFS or SMB/Samba?"'' and, of course, '''"Where does the dataset have to be mounted?"''' The answer to that last question can be told by the following command:
  
 
<console>
 
<console>
# ##i##mke2fs -t ext2 /dev/sda1
+
###i## zfs get mountpoint myfirstpool
# ##i##mkfs.xfs /dev/sda4
+
NAME        PROPERTY    VALUE        SOURCE
# ##i##mkswap /dev/sda3
+
myfirstpool  mountpoint  /myfirstpool  default
# ##i##swapon /dev/sda3
+
 
</console>
 
</console>
  
==== Mounting filesystems ====
+
Bingo! When you remounted the dataset just a few paragraphs ago, ZFS automatically inspected the ''mountpoint'' attribute and saw that this dataset has to be mounted on the directory ''/myfirstpool''.
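
The ''mountpoint'' attribute is not read-only: setting it is how you tell ZFS where a dataset should appear in the VFS. As a small sketch (the path ''/mnt/elsewhere'' is just an arbitrary example):

<console>
###i## zfs set mountpoint=/mnt/elsewhere myfirstpool
###i## zfs get mountpoint myfirstpool
</console>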
 +
 
 +
= A step forward with ZFS datasets =
 +
 
 +
So far you have been given a quick tour of what ZFS can do for you, and it is very important at this point to distinguish a ''zpool'' from a ''ZFS dataset'' and to call a dataset what it is (a dataset) and not what it is not (a zpool). It is a bit confusing, and it was an editorial choice to have chosen a confusing name just to make you familiar with the one and the other.
 +
 
 +
== Creating datasets ==
 +
 
 +
Obviously it is possible to have more than one ZFS dataset within a single zpool. Quiz: what command would you use to subdivide a zpool into datasets, '''zfs''' or '''zpool'''? Stop reading for two seconds and honestly try to figure out this little question.
  
Mount the newly-created filesystems as follows, creating <tt>/mnt/funtoo</tt> as the installation mount point:
+
The answer is... '''zfs'''! Although you operate on the zpool to logically subdivide it into several datasets, in the end you manage datasets, thus you use the '''zfs''' command. It is not always easy at the beginning; do not worry too much, you will soon get the habit of when to use one or the other. Creating a dataset in a zpool is easy: just give the '''zfs''' command the name of the pool you want to divide and the name of the dataset you want to create in it. So let's create three datasets named ''myfirstDS'', ''mysecondDS'' and ''mythirdDS'' in ''myfirstpool'' (observe how we use the zpool and dataset names):
  
 
<console>
 
<console>
# ##i##mkdir /mnt/funtoo
+
###i## zfs create myfirstpool/myfirstDS
# ##i##mount /dev/sda4 /mnt/funtoo
+
###i## zfs create myfirstpool/mysecondDS
# ##i##mkdir /mnt/funtoo/boot
+
###i## zfs create myfirstpool/mythirdDS
# ##i##mount /dev/sda1 /mnt/funtoo/boot
+
 
</console>
 
</console>
  
Optionally, if you have a separate filesystem for <tt>/home</tt> or anything else:
+
What happened? Let's check :
  
 
<console>
 
<console>
# ##i##mkdir /mnt/funtoo/home
+
###i## zfs list
# ##i##mount /dev/sda5 /mnt/funtoo/home
+
NAME                    USED  AVAIL  REFER  MOUNTPOINT
 +
myfirstpool              850M  6.98G  850M  /myfirstpool
 +
myfirstpool/myfirstDS    30K  6.98G    30K  /myfirstpool/myfirstDS
 +
myfirstpool/mysecondDS    30K  6.98G    30K  /myfirstpool/mysecondDS
 +
myfirstpool/mythirdDS    30K  6.98G    30K  /myfirstpool/mythirdDS
 
</console>
 
</console>
  
If you have <tt>/tmp</tt> or <tt>/var/tmp</tt> on a separate filesystem, be sure to change the permissions of the mount point to be globally-writeable after mounting, as follows:
+
Obviously we got what we asked for. Moreover, if we inspect the contents of ''/myfirstpool'', we notice three new directories bearing the same names as the datasets just created:
  
 
<console>
 
<console>
# ##i##chmod 1777 /mnt/funtoo/tmp
+
###i## ls -l /myfirstpool
 +
total 8
 +
lrwxrwxrwx  1 root root 32 Mar  2 14:02 linux -> /myfirstpool/linux-3.13.5-gentoo
 +
drwxr-xr-x 25 root root 50 Feb 27 20:35 linux-3.13.5-gentoo
 +
drwxr-xr-x  2 root root  2 Mar  2 15:26 myfirstDS
 +
drwxr-xr-x  2 root root  2 Mar  2 15:26 mysecondDS
 +
drwxr-xr-x  2 root root  2 Mar  2 15:26 mythirdDS
 
</console>
 
</console>
 +
No surprise here! As you might have guessed, those three new directories serve as mountpoints:
  
=== Installing the Stage 3 tarball ===
+
<console>
==== Stage 3 tarball ====
+
###i## mount | grep myfirstpool
 +
myfirstpool on /myfirstpool type zfs (rw,xattr)
 +
myfirstpool/myfirstDS on /myfirstpool/myfirstDS type zfs (rw,xattr)
 +
myfirstpool/mysecondDS on /myfirstpool/mysecondDS type zfs (rw,xattr)
 +
myfirstpool/mythirdDS on /myfirstpool/mythirdDS type zfs (rw,xattr)
 +
</console>
  
After creating filesystems, the next step is downloading the initial Stage 3 tarball. The Stage 3 is a pre-compiled system used as a starting point to install Funtoo Linux. Visit the [[Download]] page and copy the URL to the Stage 3 tarball you want to use. We will download it soon.
+
As we did before, we can copy some files into the newly created datasets just as if they were regular directories:
  
{{fancyimportant|If your system's date and time are too far off (typically by months or years), then it may prevent Portage from properly downloading source tarballs. This is because some of our sources are downloaded via HTTPS, which uses SSL certificates marked with an activation and expiration date.}}
+
<console>
 +
###i## cp -a /usr/portage /myfirstpool/mythirdDS
 +
###i## ls -l /myfirstpool/mythirdDS/*
 +
total 697
 +
drwxr-xr-x  48 root root  49 Aug 18  2013 app-accessibility
 +
drwxr-xr-x  238 root root  239 Jan 10 06:22 app-admin
 +
drwxr-xr-x    4 root root    5 Dec 28 08:54 app-antivirus
 +
drwxr-xr-x  100 root root  101 Feb 26 07:19 app-arch
 +
drwxr-xr-x  42 root root  43 Nov 26 21:24 app-backup
 +
drwxr-xr-x  34 root root  35 Aug 18  2013 app-benchmarks
 +
drwxr-xr-x  66 root root  67 Oct 16 06:39 app-cdr(...)
 +
</console>
  
Now is a good time to verify the date and time are correctly set to UTC. Use the <tt>date</tt> command to verify the date and time:
+
Nothing really exciting here; we have files in ''mythirdDS''. Here is a bit more interesting output:
  
 
<console>
 
<console>
# ##i##date
+
###i## zfs list
Fri Jul 15 19:47:18 UTC 2011
+
NAME                    USED  AVAIL  REFER  MOUNTPOINT
 +
myfirstpool            1.81G  6.00G  850M  /myfirstpool
 +
myfirstpool/myfirstDS    30K  6.00G    30K  /myfirstpool/myfirstDS
 +
myfirstpool/mysecondDS    30K  6.00G    30K  /myfirstpool/mysecondDS
 +
myfirstpool/mythirdDS  1002M  6.00G  1002M  /myfirstpool/mythirdDS
 
</console>
 
</console>
 +
<console>
 +
###i## df -h
 +
Filesystem              Size  Used Avail Use% Mounted on
 +
(...)
 +
myfirstpool            6.9G  850M  6.1G  13% /myfirstpool
 +
myfirstpool/myfirstDS  6.1G    0  6.1G  0% /myfirstpool/myfirstDS
 +
myfirstpool/mysecondDS  6.1G    0  6.1G  0% /myfirstpool/mysecondDS
 +
myfirstpool/mythirdDS  7.0G 1002M  6.1G  15% /myfirstpool/mythirdDS
 +
</console>
 +
 +
Noticed the size given in the 'AVAIL' column? At the very beginning of this tutorial we had slightly less than 8 GB of available space; it now shows roughly 6 GB. The datasets are just a subdivision of the zpool: they '''compete with each other''' for the available storage within the zpool, no miracle here. Up to what limit? The pool itself, as we never imposed a ''quota'' on the datasets. Fortunately, '''df''' and '''zfs list''' give a coherent result.
 +
 +
== Second contact with attributes: quota management ==
  
If the date and/or time need to be corrected, do so using <tt>date MMDDhhmmYYYY</tt>, keeping in mind <tt>hhmm</tt> are in 24-hour format. The example below changes the date and time to "July 16th, 2011 @ 8:00PM" UTC:
+
Remember how painful quota management is under Linux? Now you can say goodbye to '''setquota''', '''edquota''' and other '''quotacheck''' commands; ZFS handles this in a snap of the fingers! Guess with what? A ZFS dataset attribute, of course! ;-) Just to make you drool, here is how a 2 GB limit can be set on ''myfirstpool/mythirdDS'':
  
 
<console>
 
<console>
# ##i##date 071620002011
+
###i## zfs set quota=2G myfirstpool/mythirdDS
Fri Jul 16 20:00:00 UTC 2011
+
 
</console>
 
</console>
  
Once you are in your Funtoo Linux root filesystem, use <tt>wget</tt> to download the Stage 3 tarball you have chosen from the [[Download]] page to use as the basis for your new Funtoo Linux system. It should be saved to the <tt>/mnt/funtoo</tt> directory as follows:
+
''Et voila!'' The '''zfs''' command is a bit silent; however, if we check, we can see that ''myfirstpool/mythirdDS'' is now capped at 2 GB (forget about 'REFER' for the moment): around 1 GB of data has been copied into this dataset, thus leaving roughly 1 GB of available space.
  
<console># ##i##cd /mnt/funtoo
+
<console>
# ##i##wget http://ftp.osuosl.org/pub/funtoo/funtoo-current/x86-64bit/generic_64/stage3-latest.tar.xz
+
###i## zfs list
 +
NAME                    USED  AVAIL  REFER  MOUNTPOINT
 +
myfirstpool            1.81G  6.00G  850M  /myfirstpool
 +
myfirstpool/myfirstDS    30K  6.00G    30K  /myfirstpool/myfirstDS
 +
myfirstpool/mysecondDS    30K  6.00G    30K  /myfirstpool/mysecondDS
 +
myfirstpool/mythirdDS  1002M  1.02G  1002M  /myfirstpool/mythirdDS
 
</console>
 
</console>
  
 +
Using the '''df''' command:
  
Note that 64-bit systems can run 32-bit or 64-bit stages, but 32-bit systems can only run 32-bit stages. Make sure that you select a Stage 3 build that is appropriate for your CPU. If you are not certain, it is a safe bet to choose the <tt>generic_64</tt> or <tt>generic_32</tt> stage. Consult the [[Download]] page for more information.
+
<console>
 +
###i## df -h                               
 +
Filesystem              Size  Used Avail Use% Mounted on
 +
(...)
 +
myfirstpool            6.9G  850M  6.1G  13% /myfirstpool
 +
myfirstpool/myfirstDS  6.1G    0  6.1G  0% /myfirstpool/myfirstDS
 +
myfirstpool/mysecondDS  6.1G    0  6.1G  0% /myfirstpool/mysecondDS
 +
myfirstpool/mythirdDS  2.0G 1002M  1.1G  49% /myfirstpool/mythirdDS
 +
</console>
 +
 
 +
Of course you can use this technique for the home directories of your users under /home, this also having the advantage of being much less forgiving than a soft/hard user quota: when the limit is reached, it is reached, period, and no more data can be written to the dataset. The user must do some cleanup and cannot procrastinate anymore :-)
 +
 
 +
To remove the quota:
  
Once the stage is downloaded, extract the contents with the following command, substituting in the actual name of your stage 3 tarball:
 
 
<console>
 
<console>
# ##i##tar xpf stage3-latest.tar.xz
+
###i## zfs set quota=none myfirstpool/mythirdDS
 
</console>
 
</console>
  
{{fancyimportant|It is very important to use <tt>tar</tt>'s "<tt>p</tt>" option when extracting the Stage 3 tarball - it tells <tt>tar</tt> to ''preserve'' any permissions and ownership that exist within the archive. Without this option, your Funtoo Linux filesystem permissions will be incorrect.}}
+
''none'' is simply the original value of the ''quota'' attribute (we did not demonstrate it; you can check by doing a '''zfs get quota myfirstpool/mysecondDS''', for example).
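
For the record, that check looks like this (output shown for illustration; the SOURCE column reads ''default'' because we never touched the quota of ''mysecondDS''):

<console>
###i## zfs get quota myfirstpool/mysecondDS
NAME                    PROPERTY  VALUE  SOURCE
myfirstpool/mysecondDS  quota     none   default
</console>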
 +
 
 +
== Destroying datasets ==
 +
{{Fancyimportant|There is no way to resurrect a destroyed ZFS dataset and the data it contained! Once you destroy a dataset the corresponding metadata is cleared and gone forever so be careful when using ''zfs destroy'' notably with the ''-r'' option ... }}
 +
 
 +
 
 +
We have three datasets, but the third is pretty useless and contains a lot of garbage. Is it possible to remove it with a simple '''rm -rf'''? Let's try:
  
=== Chroot into Funtoo ===
 
Before chrooting into your new system, there's a few things that need to be done first. You will need to mount /proc and /dev inside your new system. Use the following commands:
 
 
<console>
 
<console>
# ##i##cd /mnt/funtoo
+
###i## rm -rf /myfirstpool/mythirdDS
# ##i##mount -t proc none proc
+
rm: cannot remove `/myfirstpool/mythirdDS': Device or resource busy
# ##i##mount --rbind /sys sys
+
# ##i##mount --rbind /dev dev
+
 
</console>
 
</console>
  
You'll also want to copy over <tt>resolv.conf</tt> in order to have proper DNS name resolution from inside the chroot:
+
This is perfectly normal; remember that datasets are indeed something '''mounted''' in your VFS. ZFS may do a lot for you, but it cannot change the nature of a mounted filesystem under Linux/Unix. The "ZFS way" to remove a dataset is to use the ''zfs'' command like this, provided no process holds open files on it (once again, ZFS can do miracles for you, but not that kind of miracle, as it has to unmount the dataset before deleting it):
 +
 
 
<console>
 
<console>
# ##i##cp /etc/resolv.conf etc
+
###i## zfs destroy myfirstpool/mythirdDS
 +
###i## zfs list
 +
NAME                    USED  AVAIL  REFER  MOUNTPOINT
 +
myfirstpool              444M  7.38G  444M  /myfirstpool
 +
myfirstpool/myfirstDS    21K  7.38G    21K  /myfirstpool/myfirstDS
 +
myfirstpool/mysecondDS    21K  7.38G    21K  /myfirstpool/mysecondDS
 
</console>
 
</console>
  
Now you can chroot into your new system. Use <tt>env</tt> before <tt>chroot</tt> to ensure that no environment variables from the installation media are used by your new system:
+
''Et voila''! No more ''myfirstpool/mythirdDS'' dataset. :-)
 +
 
 +
A slightly more subtle case is trying to destroy a ZFS dataset while another ZFS dataset is nested inside it. Before doing that nasty experiment, ''myfirstpool/mythirdDS'' must be created again, this time with another nested dataset (''myfirstpool/mythirdDS/nestedDS1''):
  
 
<console>
 
<console>
# ##i##env -i HOME=/root TERM=$TERM chroot . bash -l
+
###i## zfs create myfirstpool/mythirdDS
 +
###i## zfs create myfirstpool/mythirdDS/nestedDS1
 +
###i## zfs list
 +
NAME                              USED  AVAIL  REFER  MOUNTPOINT
 +
myfirstpool                      851M  6.98G  850M  /myfirstpool
 +
myfirstpool/myfirstDS              30K  6.98G    30K  /myfirstpool/myfirstDS
 +
myfirstpool/mysecondDS            30K  6.98G    30K  /myfirstpool/mysecondDS
 +
myfirstpool/mythirdDS            124K  6.98G    34K  /myfirstpool/mythirdDS
 +
myfirstpool/mythirdDS/nestedDS1    30K  6.98G    30K  /myfirstpool/mythirdDS/nestedDS1
 
</console>
 
</console>
  
{{fancynote|Users of live CDs with 64-bit kernels: Some software may use <tt>uname -r</tt> to check whether the system is 32 or 64-bit. You may want append linux32 to the chroot command as a workaround, but it's generally not needed.}}
+
Now let's try to destroy ''myfirstpool/mythirdDS'' again:
{{fancyimportant|If you receive the error "<tt>chroot: failed to run command `/bin/bash': Exec format error</tt>", it is probably because you are running a 32-bit kernel and trying to execute 64-bit code. SystemRescueCd boots with a 32-bit kernel by default.}}
+
  
It's also a good idea to change the default command prompt while inside the chroot. This will avoid confusion if you have to change terminals. Use this command:
 
 
<console>
 
<console>
# ##i##export PS1="(chroot) $PS1"
+
###i## zfs destroy myfirstpool/mythirdDS
 +
cannot destroy 'myfirstpool/mythirdDS': filesystem has children
 +
use '-r' to destroy the following datasets:
 +
myfirstpool/mythirdDS/nestedDS1
 
</console>
 
</console>
  
Congratulations! You are now chrooted inside a Funtoo Linux system. Now it's time to get Funtoo Linux properly configured so that Funtoo Linux will boot successfully when your system is restarted.
+
The zfs command detected the situation and refused to proceed with the deletion without your consent to a recursive destruction (the -r parameter). Before going any step further, let's create some more nested datasets plus a couple of directories inside ''myfirstpool/mythirdDS'':
  
=== Downloading the Portage tree ===
+
<console>
 +
###i## zfs create myfirstpool/mythirdDS/nestedDS1
 +
###i## zfs create myfirstpool/mythirdDS/nestedDS2
 +
###i## zfs create myfirstpool/mythirdDS/nestedDS3
 +
###i## zfs create myfirstpool/mythirdDS/nestedDS3/nestednestedDS
 +
###i## mkdir /myfirstpool/mythirdDS/dir1
 +
###i## mkdir /myfirstpool/mythirdDS/dir2
 +
###i## mkdir /myfirstpool/mythirdDS/dir3
 +
</console>
 +
<console>
 +
###i## zfs list
 +
NAME                                            USED  AVAIL  REFER  MOUNTPOINT
 +
myfirstpool                                      851M  6.98G  850M  /myfirstpool
 +
myfirstpool/myfirstDS                            30K  6.98G    30K  /myfirstpool/myfirstDS
 +
myfirstpool/mysecondDS                            30K  6.98G    30K  /myfirstpool/mysecondDS
 +
myfirstpool/mythirdDS                            157K  6.98G    37K  /myfirstpool/mythirdDS
 +
myfirstpool/mythirdDS/nestedDS1                  30K  6.98G    30K  /myfirstpool/mythirdDS/nestedDS1
 +
myfirstpool/mythirdDS/nestedDS2                  30K  6.98G    30K  /myfirstpool/mythirdDS/nestedDS2
 +
myfirstpool/mythirdDS/nestedDS3                  60K  6.98G    30K  /myfirstpool/mythirdDS/nestedDS3
 +
myfirstpool/mythirdDS/nestedDS3/nestednestedDS    30K  6.98G    30K  /myfirstpool/mythirdDS/nestedDS3/nestednestedDS
 +
</console>
  
{{fancynote|For an alternative way to do this, see [[Installing Portage From Snapshot]].}}
+
Now what happens if ''myfirstpool/mythirdDS'' is destroyed again with '-r'?
Now it's time to install a copy of the Portage repository, which contains package scripts (ebuilds) that tell portage how to build and install thousands of different software packages. To create the Portage repository, simply run <tt>emerge --sync</tt> from within the chroot. This will automatically clone the portage tree from [http://github.com/ GitHub]:
+
  
 
<console>
 
<console>
(chroot) # ##i##emerge --sync
+
###i## zfs destroy -r myfirstpool/mythirdDS
 +
###i## zfs list                           
 +
NAME                    USED  AVAIL  REFER  MOUNTPOINT
 +
myfirstpool              851M  6.98G  850M  /myfirstpool
 +
myfirstpool/myfirstDS    30K  6.98G    30K  /myfirstpool/myfirstDS
 +
myfirstpool/mysecondDS    30K  6.98G    30K  /myfirstpool/mysecondDS
 
</console>
 
</console>
  
{{fancyimportant|If you receive the error with initial <tt>emerge --sync</tt> due to git protocol restrictions, change <tt>SYNC</tt> variable in <tt>/etc/portage/make.conf</tt>}}
+
''myfirstpool/mythirdDS'' and everything it contained is now gone!
<pre>
+
SYNC="https://github.com/funtoo/ports-2012.git"
+
</pre>

== Snapshotting and rolling back datasets ==

This is, by far, one of the coolest features of ZFS. You can:

# take a photo of a dataset (this photo is called a ''snapshot'')
# do ''whatever'' you want with the data contained in the dataset
# restore (roll back) the dataset to the '''exact''' same state it was in before you made your changes, just as if nothing had ever happened in the middle.

=== Configuring your system ===

As is expected from a Linux distribution, Funtoo Linux has its share of configuration files. The one file you are absolutely required to edit in order to ensure that Funtoo Linux boots successfully is <tt>/etc/fstab</tt>. The others are optional. Here is a list of files that you should consider editing:
  
{| {{table}}
!File
!Do I need to change it?
!Description
|-
|<tt>/etc/fstab</tt>
|'''YES - required'''
|Mount points for all filesystems to be used at boot time. This file must reflect your disk partition setup. We'll guide you through modifying this file below.
|-
|<tt>/etc/localtime</tt>
|''Maybe - recommended''
|Your timezone, which will default to UTC if not set. This should be a symbolic link to something located under /usr/share/zoneinfo (e.g. /usr/share/zoneinfo/America/Montreal)
|-
|<tt>/etc/make.conf<br/>/etc/portage/make.conf&nbsp;(new&nbsp;location)</tt>
|''Maybe - recommended''
|Parameters used by gcc (compiler), portage, and make. It's a good idea to set MAKEOPTS. This is covered later in this document.
|-
|<tt>/etc/conf.d/hostname</tt>
|''Maybe - recommended''
|Used to set system hostname. Set to the fully-qualified (with dots) name. Defaults to <tt>localhost</tt> if not set.
|-
|<tt>/etc/hosts</tt>
|''No''
|You no longer need to manually set the hostname in this file. This file is automatically generated by <tt>/etc/init.d/hostname</tt>.
|-
|<tt>/etc/conf.d/keymaps</tt>
|Optional
|Keyboard mapping configuration file (for console pseudo-terminals). Set if you have a non-US keyboard. See [[Funtoo Linux Localization]].
|-
|<tt>/etc/conf.d/hwclock</tt>
|Optional
|How the time of the battery-backed hardware clock of the system is interpreted (UTC or local time). Linux uses the battery-backed hardware clock to initialize the system clock when the system is booted.
|-
|<tt>/etc/conf.d/modules</tt>
|Optional
|Kernel modules to load automatically at system startup. Typically not required. See [[Additional Kernel Resources]] for more info.
|-
|<tt>profiles</tt>
|Optional
|Some useful portage settings that may help speed up initial configuration.
|}

=== Single snapshot ===
  
If you're installing an English version of Funtoo Linux, you're in luck as most of the configuration files can be used as-is. If you're installing for another locale, don't worry. We will walk you through the necessary configuration steps on the [[Funtoo Linux Localization]] page, and if needed, there's always plenty of friendly, helpful support. (See [[#Community portal|Community]])
{{Fancyimportant|'''Only ZFS datasets''' can be snapshotted and rolled back, not the zpool.}}
  
Let's go ahead and see what we have to do. Use <tt>nano -w <name_of_file></tt> to edit files -- the "<tt>-w</tt>" disables word-wrapping, which is handy when editing configuration files. You can copy and paste from the examples.
 
  
{{fancywarning|It's important to edit your <tt>/etc/fstab</tt> file before you reboot! You will need to modify both the "fs" and "type" columns to match the settings for your partitions and filesystems that you created with <tt>gdisk</tt> or <tt>fdisk</tt>. Skipping this step may prevent Funtoo Linux from booting successfully.}}
To start with, let's copy some files in ''mysecondDS'':
  
<console>
###i## cp -a /usr/portage /myfirstpool/mysecondDS
###i## ls -l /myfirstpool/mysecondDS/portage
total 672
drwxr-xr-x   48 root root   49 Aug 18  2013 app-accessibility
drwxr-xr-x  238 root root  239 Jan 10 06:22 app-admin
drwxr-xr-x    4 root root    5 Dec 28 08:54 app-antivirus
drwxr-xr-x  100 root root  101 Feb 26 07:19 app-arch
drwxr-xr-x   42 root root   43 Nov 26 21:24 app-backup
drwxr-xr-x   34 root root   35 Aug 18  2013 app-benchmarks
(...)
drwxr-xr-x   62 root root   63 Feb 20 06:47 x11-wm
drwxr-xr-x   16 root root   17 Aug 18  2013 xfce-base
drwxr-xr-x   64 root root   65 Dec 14 19:09 xfce-extra
</console>

==== /etc/fstab ====
  
<tt>/etc/fstab</tt> is used by the <tt>mount</tt> command, which is run when your system boots. Statements in this file inform <tt>mount</tt> about partitions to be mounted and how they are mounted. In order for the system to boot properly, you must edit <tt>/etc/fstab</tt> and ensure that it reflects the partition configuration you used earlier:

<console>
(chroot) # ##i##nano -w /etc/fstab
</console>

You can use arrow keys to move around and hit Control-X to exit. If you want to save your changes, type "<tt>Y</tt>" when asked if you want to save the modified buffer, or hit Control-O before closing <tt>nano</tt>. Otherwise your changes will be discarded.

Now, let's take a snapshot of ''mysecondDS''. Which command should be used, '''zpool''' or '''zfs'''? In this case it is '''zfs''', because we are manipulating a ZFS dataset (this time you probably got it right!):

<console>
###i## zfs snapshot myfirstpool/mysecondDS@Charlie
</console>

{{fancynote|The syntax is always ''pool/dataset@snapshot''. The snapshot's name is left to your discretion, however '''you must use an at sign (@)''' to separate the snapshot's name from the rest of the path.}}
  
<pre>
# The root filesystem should have a pass number of either 0 or 1.
# All other filesystems should have a pass number of 0 or greater than 1.
#
# NOTE: If your BOOT partition is ReiserFS, add the notail option to opts.
#
# See the manpage fstab(5) for more information.
#
# <fs>     <mountpoint>  <type>  <opts>        <dump/pass>

/dev/sda1    /boot        ext2    noatime        1 2
/dev/sda3    none          swap    sw            0 0
/dev/sda4    /            ext4    noatime        0 1
#/dev/cdrom  /mnt/cdrom    auto    noauto,ro      0 0
</pre>

==== /etc/localtime ====

<tt>/etc/localtime</tt> is used to specify the timezone that your machine is in, and defaults to UTC. If you would like your Funtoo Linux system to use local time, you should replace <tt>/etc/localtime</tt> with a symbolic link to the timezone that you wish to use.

Let's check what ''/myfirstpool/mysecondDS'' contains after taking the snapshot:

<console>
###i## ls -la /myfirstpool/mysecondDS
total 9
drwxr-xr-x   3 root root   3 Mar  2 18:22 .
drwxr-xr-x   5 root root   6 Mar  2 17:58 ..
drwx------ 170 root root 171 Mar  2 18:36 portage
</console>

Nothing really new: the ''portage'' directory is still there, nothing more ''a priori''. If you have used BTRFS before reading this tutorial, you probably expected to see a ''@Charlie'' entry lying in ''/myfirstpool/mysecondDS''. So where the heck is ''Charlie''? In ZFS, a dataset snapshot is not visible from within the VFS tree (if you are not convinced, you can search for it with the '''find''' command, but it will never be found). Let's check with the '''zfs''' command:

<console>
###i## zfs list
NAME                             USED  AVAIL  REFER  MOUNTPOINT
myfirstpool                     1.81G  6.00G   850M  /myfirstpool
myfirstpool/myfirstDS             30K  6.00G    30K  /myfirstpool/myfirstDS
myfirstpool/mysecondDS          1001M  6.00G  1001M  /myfirstpool/mysecondDS
</console>

Wow... No sign of the snapshot. What you must know is that '''zfs list''' shows only datasets by default and omits snapshots. If the command is invoked with the parameter ''-t'' set to ''all'', it will list everything:
  
 
<console>
###i## zfs list -t all
NAME                             USED  AVAIL  REFER  MOUNTPOINT
myfirstpool                     1.81G  6.00G   850M  /myfirstpool
myfirstpool/myfirstDS             30K  6.00G    30K  /myfirstpool/myfirstDS
myfirstpool/mysecondDS          1001M  6.00G  1001M  /myfirstpool/mysecondDS
myfirstpool/mysecondDS@Charlie      0      -  1001M  -
</console>

<console>
(chroot) # ##i##ln -sf /usr/share/zoneinfo/America/Montreal /etc/localtime
</console>
  
The above sets the timezone to Eastern Time Canada. Go to <tt>/usr/share/zoneinfo</tt> to see which values to use.

So yes, ''@Charlie'' is here! Also notice the power of copy-on-write filesystems: ''@Charlie'' occupies only a couple of kilobytes (some ZFS metadata), just like any ZFS snapshot at the time it is taken. The reason snapshots occupy very little space at first is that their data and metadata blocks are the very same blocks used by the dataset; no physical copy of them is made. As time goes on and more and more changes happen in the original dataset (''myfirstpool/mysecondDS'' here), ZFS allocates new data and metadata blocks to accommodate the changes but leaves the blocks referenced by the snapshot untouched, so the snapshot tends to eat more and more pool space. It seems odd at first glance because a snapshot is a frozen-in-time copy of a ZFS dataset, but this is the way ZFS manages them. So caveat emptor: remove any unused snapshots so they do not fill up your zpool...
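If you want to keep an eye on how much pool space your snapshots are eating, the per-dataset space accounting properties ZFS maintains can help. A minimal sketch, using the dataset from the examples above (the exact figures will differ on your system):

<console>
###i## zfs get usedbysnapshots,usedbydataset myfirstpool/mysecondDS
NAME                    PROPERTY         VALUE  SOURCE
myfirstpool/mysecondDS  usedbysnapshots  0      -
myfirstpool/mysecondDS  usedbydataset    1001M  -
</console>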
  
==== /etc/make.conf ====

{{fancynote|This file is a symlink to /etc/portage/make.conf, the new default location; edit /etc/portage/make.conf.}}

Now that we have found Charlie, let's make some changes in ''mysecondDS'':

<console>
 +
###i## rm -rf /myfirstpool/mysecondDS/portage/[a-h]*
 +
###i## echo "Hello, world" >  /myfirstpool/mysecondDS/hello.txt
 +
###i## cp /lib/firmware/radeon/* /myfirstpool/mysecondDS
 +
###i## ls -l  /myfirstpool/mysecondDS
 +
/myfirstpool/mysecondDS:
 +
total 3043
 +
-rw-r--r--  1 root root  8704 Mar  2 19:29 ARUBA_me.bin
 +
-rw-r--r--  1 root root  8704 Mar  2 19:29 ARUBA_pfp.bin
 +
-rw-r--r--  1 root root  6144 Mar  2 19:29 ARUBA_rlc.bin
 +
-rw-r--r--  1 root root  24096 Mar  2 19:29 BARTS_mc.bin
 +
-rw-r--r--  1 root root  5504 Mar  2 19:29 BARTS_me.bin
 +
(...)
 +
-rw-r--r--  1 root root  60388 Mar  2 19:29 VERDE_smc.bin
 +
-rw-r--r--  1 root root    13 Mar  2 19:28 hello.txt
 +
drwx------ 94 root root    95 Mar  2 19:28 portage
  
MAKEOPTS can be used to define how many parallel compilations should occur when you compile a package, which can speed up compilation significantly. A rule of thumb is the number of CPUs (or CPU threads) in your system plus one. If for example you have a dual core processor without [[wikipedia:Hyper-threading|hyper-threading]], then you would set MAKEOPTS to 3:
+
/myfirstpool/mysecondDS/portage:
 +
total 324
 +
drwxr-xr-x  16 root root  17 Oct 26 07:30 java-virtuals
 +
drwxr-xr-x 303 root root  304 Jan 21 06:53 kde-base
 +
drwxr-xr-x 117 root root  118 Feb 21 06:24 kde-misc
 +
drwxr-xr-x  2 root root  756 Feb 23 08:44 licenses
 +
drwxr-xr-x  20 root root  21 Jan  7 06:56 lxde-base
 +
(...)
 +
</console>
  
<pre>
+
Now let's check again what the '''zfs''' command gives:
MAKEOPTS="-j3"
+
</pre>
+
  
If you are unsure about how many processors/threads you have then use /proc/cpuinfo to help you.
 
 
<console>
 
<console>
(chroot) # ##i##grep "processor" /proc/cpuinfo | wc -l
+
###i## zfs list -t all                     
16
+
NAME                            USED  AVAIL  REFER  MOUNTPOINT
 +
myfirstpool                    1.82G  6.00G  850M  /myfirstpool
 +
myfirstpool/myfirstDS            30K  6.00G    30K  /myfirstpool/myfirstDS
 +
myfirstpool/mysecondDS          1005M  6.00G  903M  /myfirstpool/mysecondDS
 +
myfirstpool/mysecondDS@Charlie  102M      -  1001M  -
 
</console>
 
</console>
  
Set MAKEOPTS to this number plus one:

Noticed the size increase of ''myfirstpool/mysecondDS@Charlie''? This is mainly due to the changes made in the live dataset (files deleted and overwritten): ZFS had to retain the original blocks of data so the snapshot still references them. Now it is time to roll this ZFS dataset back to its original state (if some processes had open files in the dataset to be rolled back, you should terminate them first):
  
<pre>
+
<console>
MAKEOPTS="-j17"
+
###i## zfs rollback myfirstpool/mysecondDS@Charlie
</pre>
+
###i## ls -l /myfirstpool/mysecondDS
 +
total 6
 +
drwxr-xr-x 164 root root 169 Aug 18 18:25 portage
 +
</console>
  
USE flags define what functionality is enabled when packages are built. It is not recommended to add a lot of them during installation; you should wait until you have a working, bootable system before changing your USE flags. A USE flag prefixed with a minus ("<tt>-</tt>") sign tells Portage not to use the flag when compiling.  A Funtoo guide to USE flags will be available in the future. For now, you can find out more information about USE flags in the [http://www.gentoo.org/doc/en/handbook/handbook-amd64.xml?part=2&chap=2 Gentoo Handbook].
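As a purely illustrative sketch (the flags named here are hypothetical choices, not recommendations), USE flags are set as a space-separated list in <tt>/etc/portage/make.conf</tt>, and a leading minus sign disables a flag:

<pre>
USE="dbus -gnome -kde"
</pre>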

Again, ZFS handled everything for you, and you now have the contents of ''mysecondDS'' exactly as they were at the time the snapshot ''Charlie'' was taken. Not more complicated than that. Although it is not illustrated here, if you look at the output given by '''zfs list -t all''' at this point, you will notice that the ''Charlie'' snapshot again occupies very little space. This is normal: the modified blocks have been dropped, so ''myfirstpool/mysecondDS'' and its ''myfirstpool/mysecondDS@Charlie'' snapshot are the same modulo some metadata (hence the few kilobytes of space taken).
  
LINGUAS tells Portage which local language to compile the system and applications in (those who use LINGUAS variable like OpenOffice). It is not usually necessary to set this if you use English. If you want another language such as French (fr) or German (de), set LINGUAS appropriately:

<pre>
LINGUAS="fr"
</pre>

=== The .zfs pseudo-directory, or the secret passage to your snapshots ===

Any directory where a ZFS dataset is mounted (whether it holds snapshots or not) secretly contains a pseudo-directory named '''.zfs''' (dot-ZFS), and you will not see it even with the option ''-a'' given to the '''ls''' command unless you name it explicitly. This contradicts the Unix and Unix-like systems' philosophy of not hiding anything from the system administrator, but it is not a bug in the ZFS on Linux implementation: the Solaris implementation of ZFS exposes the exact same behavior. So what is inside this little magic box?
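Whether ''.zfs'' is hidden or shown is actually governed by the ''snapdir'' property of the dataset (its default value is ''hidden''). A minimal sketch, using the dataset from the examples above:

<console>
###i## zfs get snapdir myfirstpool/mysecondDS
NAME                    PROPERTY  VALUE   SOURCE
myfirstpool/mysecondDS  snapdir   hidden  default
###i## zfs set snapdir=visible myfirstpool/mysecondDS
</console>

With ''snapdir=visible'', the ''.zfs'' entry shows up in a regular directory listing; setting it back to ''hidden'' restores the default behavior.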
  
==== /etc/conf.d/hwclock ====
 
If you dual-boot with Windows, you'll need to edit this file and change '''clock''' to '''local''', because Windows will set your hardware clock to local time every time you boot Windows. Otherwise you normally wouldn't need to edit this file.
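For illustration only, the relevant setting is a single variable in this file; a minimal sketch of what the edited line would look like on a dual-boot machine (assuming the stock file layout):

<pre>
clock="local"
</pre>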
 
 
<console>
 
<console>
(chroot) # ##i##nano -w /etc/conf.d/hwclock
+
###i## cd /myfirstpool/mysecondDS
 +
###i## ls -la | grep .zfs       
 +
###i## ls -lad .zfs             
 +
dr-xr-xr-x 1 root root 0 Mar  2 15:26 .zfs
 +
</console>
 +
<console>
 +
###i## cd .zfs
 +
###i## pwd
 +
/myfirstpool/mysecondDS/.zfs
 +
###i## ls -la
 +
total 4
 +
dr-xr-xr-x 1 root root  0 Mar  2 15:26 .
 +
drwxr-xr-x 3 root root 145 Mar  2 19:29 ..
 +
dr-xr-xr-x 2 root root  2 Mar  2 19:47 shares
 +
dr-xr-xr-x 2 root root  2 Mar  2 18:46 snapshot
 
</console>
 
</console>
  
==== Localization ====
+
We will focus on the ''snapshot'' directory and, since we have not dropped the ''Charlie'' snapshot (yet), let's see what lies there:
  
By default, Funtoo Linux is configured with Unicode (UTF-8) enabled, and for the US English locale and keyboard. If you would like to configure your system to use a non-English locale or keyboard, see [[Funtoo Linux Localization]].
+
<console>
 +
###i## cd snapshot
 +
###i## ls -l
 +
total 0
 +
dr-xr-xr-x 1 root root 0 Mar  2 20:16 Charlie
 +
</console>
  
==== Profiles ====
+
Yes, we found Charlie here (also!). The snapshot is seen as a regular directory, but pay attention to its permissions:

* owning user (root) has read+execute
* owning group (root) has read+execute
* rest of the world has read+execute
  
[[Funtoo 1.0 Profile|Funtoo profiles]] are used to define defaults for Portage specific to your needs. There are 4 basic profile types: arch, build, [[Flavors and Mix-ins|flavor, and mix-ins]]:
+
Did you notice? There is not a single ''write'' permission on this directory; the only actions any user can take are to enter the directory and list its contents. This is not a bug but the nature of ZFS snapshots: they are read-only by design. The next question is naturally: can we change something inside it? To find out, we have to enter the ''Charlie'' directory:
  
;arch: typically <tt>x86-32bit</tt> or <tt>x86-64bit</tt>, this defines the processor type and support of your system. This is defined when your stage was built and should not be changed.
+
<console>
;build: defines whether your system is a <tt>current</tt>, <tt>stable</tt> or <tt>experimental</tt> build. <tt>current</tt> systems will have newer packages unmasked than <tt>stable</tt> systems.
+
###i## cd Charlie
;flavor: defines the general type of system, such as <tt>server</tt> or <tt>desktop</tt>, and will set default USE flags appropriate for your needs.
+
###i## ls -la
;mix-ins: define various optional settings that you may be interested in enabling.
+
total 7
 +
drwxr-xr-x  3 root root  3 Mar  2 18:22 .
 +
dr-xr-xr-x  3 root root  3 Mar  2 18:46 ..
 +
drwx------ 170 root root 171 Mar  2 18:36 portage
 +
</console>
  
One arch, build and flavor must be set for each Funtoo Linux system, while mix-ins are optional and you can enable more than one if desired.
+
No surprise here: at the time we took the snapshot, ''myfirstpool/mysecondDS'' held a copy of the Portage tree stored in a directory named ''portage''. At first glance this directory ''seems'' to be writable for the root user, so let's try to create a file in it:
  
Remember that profiles can often be inherited. For example, the <tt>desktop</tt> flavor inherits the <tt>workstation</tt> flavor settings, which in turn inherits the <tt>X</tt> and <tt>audio</tt> mix-ins. You can view this by using eselect:
+
<console>
 +
###i## cd portage
 +
###i## touch test
 +
touch: cannot touch ‘test’: Read-only file system
 +
</console>
  

Things are a bit tricky here: indeed, nothing has been mounted (check with the '''mount''' command!), we are walking through a pseudo-directory exposed by ZFS that holds the ''Charlie'' snapshot. ''Pseudo-directory'' because ''.zfs'' has no physical existence in the ZFS metadata as it exists in the zpool; it is just a convenient way provided by the ZFS kernel modules to walk inside the various snapshots' contents. You can see but you cannot touch :-)
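Being read-only does not make the pseudo-directory useless, quite the contrary: you can copy an individual file out of a snapshot without rolling the whole dataset back. A minimal sketch (the destination path is just an example):

<console>
###i## cp -a /myfirstpool/mysecondDS/.zfs/snapshot/Charlie/portage/sys-devel/gcc/Manifest /tmp/Manifest.restored
</console>

Copying in the other direction (into the snapshot) will of course fail with the same "Read-only file system" error seen above.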

=== Backtracking changes between a dataset and its snapshot ===

Is it possible to know the difference between a live dataset and its snapshot? The answer is '''yes''', and the '''zfs''' command will help us in this task. Now that we have rolled the ''myfirstpool/mysecondDS'' ZFS dataset back to its original state, we have to botch it again:
 
<console>
 
<console>
(chroot) # ##i##eselect profile show
+
###i## cp -a /lib/firmware/radeon/C* /myfirstpool/mysecondDS
Currently set profiles:
+
</console>
    arch: gentoo:funtoo/1.0/linux-gnu/arch/x86-64bit
+
  build: gentoo:funtoo/1.0/linux-gnu/build/current
+
  flavor: gentoo:funtoo/1.0/linux-gnu/flavor/desktop
+
mix-ins: gentoo:funtoo/1.0/linux-gnu/mix-ins/kde
+
  
Automatically enabled profiles:
+
Now inspect the difference between the live ZFS dataset ''myfirstpool/mysecondDS'' and its snasphot Charlie, this is done via '''zfs diff''' and by giving only the snapshot's name (you can inspect the difference between snasphot with that command with a slightly change in parameters):
mix-ins: gentoo:funtoo/1.0/linux-gnu/mix-ins/print
+
mix-ins: gentoo:funtoo/1.0/linux-gnu/mix-ins/X
+
mix-ins: gentoo:funtoo/1.0/linux-gnu/mix-ins/audio
+
mix-ins: gentoo:funtoo/1.0/linux-gnu/mix-ins/dvd
+
mix-ins: gentoo:funtoo/1.0/linux-gnu/mix-ins/media
+
mix-ins: gentoo:funtoo/1.0/linux-gnu/mix-ins/console-extras
+
  
 +
<console>
 +
###i## # zfs diff myfirstpool/mysecondDS@Charlie
 +
M      /myfirstpool/mysecondDS/
 +
+      /myfirstpool/mysecondDS/CAICOS_mc.bin
 +
+      /myfirstpool/mysecondDS/CAICOS_me.bin
 +
+      /myfirstpool/mysecondDS/CAICOS_pfp.bin
 +
+      /myfirstpool/mysecondDS/CAICOS_smc.bin
 +
+      /myfirstpool/mysecondDS/CAYMAN_mc.bin
 +
+      /myfirstpool/mysecondDS/CAYMAN_me.bin
 +
(...)
 +
</console>
  

So what do we have here? Two things: first, it shows we have changed something in ''/myfirstpool/mysecondDS'' (notice the 'M' for Modified); second, it shows the addition of several files (CAICOS_mc.bin, CAICOS_me.bin, CAICOS_pfp.bin...) by putting a plus sign ('+') on their left.
 +
 +
If we botch a bit more ''myfirstpool/mysecondDS'' by removing the file ''/myfirstpool/mysecondDS/portage/sys-libs/glibc/Manifest'' :
 +
 +
<console>
 +
###i## rm /myfirstpool/mysecondDS/portage/sys-libs/glibc/Manifest
 +
###i## zfs diff myfirstpool/mysecondDS@Charlie
 +
M      /myfirstpool/mysecondDS/
 +
M      /myfirstpool/mysecondDS/portage/sys-libs/glibc
 +
-      /myfirstpool/mysecondDS/portage/sys-libs/glibc/Manifest
 +
+      /myfirstpool/mysecondDS/CAICOS_mc.bin
 +
+      /myfirstpool/mysecondDS/CAICOS_me.bin
 +
+      /myfirstpool/mysecondDS/CAICOS_pfp.bin
 +
+      /myfirstpool/mysecondDS/CAICOS_smc.bin
 +
+      /myfirstpool/mysecondDS/CAYMAN_mc.bin
 +
+      /myfirstpool/mysecondDS/CAYMAN_me.bin
 +
(...)
 
</console>
 
</console>
  
To view installed profiles:
+
Obviously deleted content is marked by a minus sign ('-').
 +
 
 +
Now a real butchery:
 
<console>
 
<console>
(chroot) # ##i##eselect profile list
+
###i## rm -rf /myfirstpool/mysecondDS/portage/sys-devel/gcc
 +
###i## zfs diff myfirstpool/mysecondDS@Charlie
 +
# zfs diff myfirstpool/mysecondDS@Charlie           
 +
M      /myfirstpool/mysecondDS/
 +
M      /myfirstpool/mysecondDS/portage/sys-devel
 +
-      /myfirstpool/mysecondDS/portage/sys-devel/gcc
 +
-      /myfirstpool/mysecondDS/portage/sys-devel/gcc/files
 +
-      /myfirstpool/mysecondDS/portage/sys-devel/gcc/files/awk
 +
-      /myfirstpool/mysecondDS/portage/sys-devel/gcc/files/awk/fixlafiles.awk
 +
-      /myfirstpool/mysecondDS/portage/sys-devel/gcc/files/awk/fixlafiles.awk-no_gcc_la
 +
-      /myfirstpool/mysecondDS/portage/sys-devel/gcc/files/c89
 +
-      /myfirstpool/mysecondDS/portage/sys-devel/gcc/files/c99
 +
-      /myfirstpool/mysecondDS/portage/sys-devel/gcc/files/gcc-4.6.4-fix-libgcc-s-path-with-vsrl.patch
 +
-      /myfirstpool/mysecondDS/portage/sys-devel/gcc/files/gcc-spec-env.patch
 +
-      /myfirstpool/mysecondDS/portage/sys-devel/gcc/files/gcc-spec-env-r1.patch
 +
-      /myfirstpool/mysecondDS/portage/sys-devel/gcc/files/gcc-4.8.2-fix-cache-detection.patch
 +
-      /myfirstpool/mysecondDS/portage/sys-devel/gcc/files/fix_libtool_files.sh
 +
-      /myfirstpool/mysecondDS/portage/sys-devel/gcc/files/gcc-configure-texinfo.patch
 +
-      /myfirstpool/mysecondDS/portage/sys-devel/gcc/files/gcc-4.8.1-bogus-error-with-int.patch
 +
-      /myfirstpool/mysecondDS/portage/sys-devel/gcc/gcc-4.3.3-r2.ebuild
 +
-      /myfirstpool/mysecondDS/portage/sys-devel/gcc/metadata.xml
 +
-      /myfirstpool/mysecondDS/portage/sys-devel/gcc/gcc-4.6.4-r2.ebuild
 +
-      /myfirstpool/mysecondDS/portage/sys-devel/gcc/gcc-4.6.4.ebuild
 +
-      /myfirstpool/mysecondDS/portage/sys-devel/gcc/gcc-4.8.1-r1.ebuild
 +
-      /myfirstpool/mysecondDS/portage/sys-devel/gcc/gcc-4.8.1-r2.ebuild
 +
-      /myfirstpool/mysecondDS/portage/sys-devel/gcc/gcc-4.6.2-r1.ebuild
 +
-      /myfirstpool/mysecondDS/portage/sys-devel/gcc/gcc-4.8.1-r3.ebuild
 +
-      /myfirstpool/mysecondDS/portage/sys-devel/gcc/gcc-4.8.2.ebuild
 +
-      /myfirstpool/mysecondDS/portage/sys-devel/gcc/gcc-4.8.1-r4.ebuild
 +
-      /myfirstpool/mysecondDS/portage/sys-devel/gcc/Manifest
 +
-      /myfirstpool/mysecondDS/portage/sys-devel/gcc/gcc-4.7.3-r1.ebuild
 +
-      /myfirstpool/mysecondDS/portage/sys-devel/gcc/gcc-4.8.2-r1.ebuild
 +
M      /myfirstpool/mysecondDS/portage/sys-libs/glibc
 +
-      /myfirstpool/mysecondDS/portage/sys-libs/glibc/Manifest
 +
+      /myfirstpool/mysecondDS/CAICOS_mc.bin
 +
+      /myfirstpool/mysecondDS/CAICOS_me.bin
 +
+      /myfirstpool/mysecondDS/CAICOS_pfp.bin
 +
+      /myfirstpool/mysecondDS/CAICOS_smc.bin
 +
+      /myfirstpool/mysecondDS/CAYMAN_mc.bin
 +
+      /myfirstpool/mysecondDS/CAYMAN_me.bin
 +
(...)
 
</console>
 
</console>
  
To change the profile flavor:
+
No need to explain that digital mayhem! What happens if, in addition, we change the contents of the file ''/myfirstpool/mysecondDS/portage/sys-devel/autoconf/Manifest''?
 
<console>
 
<console>
(chroot) # ##i##eselect profile set-flavor 7
+
###i## zfs diff myfirstpool/mysecondDS@Charlie
 +
M      /myfirstpool/mysecondDS/
 +
M      /myfirstpool/mysecondDS/portage/sys-devel
 +
M      /myfirstpool/mysecondDS/portage/sys-devel/autoconf/Manifest
 +
-      /myfirstpool/mysecondDS/portage/sys-devel/gcc
 +
-      /myfirstpool/mysecondDS/portage/sys-devel/gcc/files
 +
-      /myfirstpool/mysecondDS/portage/sys-devel/gcc/files/awk
 +
-      /myfirstpool/mysecondDS/portage/sys-devel/gcc/files/awk/fixlafiles.awk
 +
-      /myfirstpool/mysecondDS/portage/sys-devel/gcc/files/awk/fixlafiles.awk-no_gcc_la
 +
(...)
 
</console>
 
</console>
 +
ZFS shows that the file ''/myfirstpool/mysecondDS/portage/sys-devel/autoconf/Manifest'' has changed. So ZFS can help track file deletions, creations and modifications. What it does not show is how a file's content differs between the live dataset and the dataset's snapshot. Not a big issue! You can explore a snapshot's contents via the ''.zfs'' pseudo-directory and use a command like '''/usr/bin/diff''' to examine the difference with the file as it exists on the corresponding live dataset.
  
To add a mix-in:
+
<console>
 +
###i## diff -u /myfirstpool/mysecondDS/.zfs/snapshot/Charlie/portage/sys-devel/autoconf/Manifest /myfirstpool/mysecondDS/portage/sys-devel/autoconf/Manifest
 +
--- /myfirstpool/mysecondDS/.zfs/snapshot/Charlie/portage/sys-devel/autoconf/Manifest  2013-08-18 08:52:01.742411902 -0400
 +
+++ /myfirstpool/mysecondDS/portage/sys-devel/autoconf/Manifest 2014-03-02 21:36:50.582258990 -0500
 +
@@ -4,7 +4,4 @@
 +
DIST autoconf-2.62.tar.gz 1518427 SHA256 83aa747e6443def0ebd1882509c53f5a2133f50...
 +
DIST autoconf-2.63.tar.gz 1562665 SHA256 b05a6cee81657dd2db86194a6232b895b8b2606a...
 +
DIST autoconf-2.64.tar.bz2 1313833 SHA256 872f4cadf12e7e7c8a2414e047fdff26b517c7...
 +
-DIST autoconf-2.65.tar.bz2 1332522 SHA256 db11944057f3faf229ff5d6ce3fcd819f56545...
 +
-DIST autoconf-2.67.tar.bz2 1369605 SHA256 00ded92074999d26a7137d15bd1d51b8a8ae23...
 +
-DIST autoconf-2.68.tar.bz2 1381988 SHA256 c491fb273fd6d4ca925e26ceed3d177920233c...
 +
DIST autoconf-2.69.tar.xz 1214744 SHA256 64ebcec9f8ac5b2487125a86a7760d2591ac9e1d3...
 +
(...)
 +
</console>
  
 +
=== Dropping a snapshot ===

A snapshot is no more than a dataset frozen in time and can thus be destroyed in the exact same way seen in the paragraphs before. Now that we no longer need the ''Charlie'' snapshot, we can remove it. Simple:
 
<console>
 
<console>
(chroot) # ##i##eselect profile add 10
+
###i## zfs destroy myfirstpool/mysecondDS@Charlie
 +
###i## zfs list -t all
 +
NAME                    USED  AVAIL  REFER  MOUNTPOINT
 +
myfirstpool            1.71G  6.10G  850M  /myfirstpool
 +
myfirstpool/myfirstDS    30K  6.10G    30K  /myfirstpool/myfirstDS
 +
myfirstpool/mysecondDS  903M  6.10G  903M  /myfirstpool/mysecondDS
 
</console>
 
</console>
  
=== Configuring and installing the Linux kernel ===
+
And Charlie is gone forever ;-)
  
Now it's time to build and install a Linux kernel, which is the heart of any Funtoo Linux system. In the past, the process of creating a kernel that actually booted your system could be time-consuming and require a great deal of trial and error. Fortunately, Funtoo Linux offers an option to automatically build a kernel for you that will boot nearly all systems.
+
=== The time travelling machine part 1 : examining differences between snapshots ===
 +
So far we only used a single snapshot just to keep things simple. However a dataset can hold several snapshots and you can do everything seen so far with them like rolling back, destroying them or examining the difference not only between a snapshot and its corresponding live dataset but also between two snapshots. For this part we will consider the ''myfirstpool/myfirstDS'' dataset which should be empty at this point.
  
If you are unfamiliar with how to manually configure your own kernel, or you simply want to get your system up and running quickly, you can emerge <tt>debian-sources</tt> with the <tt>binary</tt> USE flag set, which will automatically build the kernel and an initrd that will boot nearly all Funtoo Linux systems. This kernel is based on a linux-3.2 LTS official debian kernel package and is an easy way to get your system up and running relatively quickly.
+
<console>
 +
# ls -la /myfirstpool/myfirstDS
 +
total 3
 +
drwxr-xr-x 2 root root 2 Mar 2 21:14 .
 +
drwxr-xr-x 5 root root 6 Mar 2 17:58 ..
 +
</console>
  
Click [http://wiki.debian.org/DebianKernel here] for a list of all architectures the Debian kernel supports.  
+
Now let's generate some contents, take a snapshot (snapshot-1), add more content, take a snapshot again (snapshot-2), do some modifications again and take a third snapshot (snapshot-3):
 +
 
 +
<console>
 +
###i## echo "Hello, world" > /myfirstpool/myfirstDS/hello.txt
 +
###i## cp -R /lib/firmware/radeon /myfirstpool/myfirstDS
 +
###i## ls -l /myfirstpool/myfirstDS
 +
total 5
 +
-rw-r--r-- 1 root root 13 Mar 3 06:41 hello.txt
 +
drwxr-xr-x 2 root root 143 Mar 3 06:42 radeon
 +
###i## zfs snapshot myfirstpool/myfirstDS@snapshot-1
 +
</console>
 +
<console>
 +
###i## echo "Goodbye, world" > /myfirstpool/myfirstDS/goodbye.txt
 +
###i## echo "Are you there?" >> /myfirstpool/myfirstDS/hello.txt
 +
###i## cp /proc/config.gz /myfirstpool/myfirstDS
 +
###i## rm /myfirstpool/myfirstDS/radeon/CAYMAN_me.bin
 +
###i## zfs snapshot myfirstpool/myfirstDS@snapshot-2
 +
</console>
 +
<console>
 +
###i## echo "Still there?" >> /myfirstpool/myfirstDS/goodbye.txt
 +
###i## mv /myfirstpool/myfirstDS/hello.txt /myfirstpool/myfirstDS/hello_new.txt
 +
###i## cat /proc/version > /myfirstpool/myfirstDS/version.txt
 +
###i## zfs snapshot myfirstpool/myfirstDS@snapshot-3
 +
</console>
 +
<console>
 +
###i## zfs list -t all
 +
NAME                              USED  AVAIL  REFER  MOUNTPOINT
 +
myfirstpool                      1.81G  6.00G  850M  /myfirstpool
 +
myfirstpool/myfirstDS            3.04M  6.00G  2.97M  /myfirstpool/myfirstDS
 +
myfirstpool/myfirstDS@snapshot-1    47K      -  2.96M  -
 +
myfirstpool/myfirstDS@snapshot-2    30K      -  2.97M  -
 +
myfirstpool/myfirstDS@snapshot-3      0      -  2.97M  -
 +
myfirstpool/mysecondDS            1003M  6.00G  1003M  /myfirstpool/mysecondDS
 +
</console>
  
{{fancyimportant|<tt>debian-sources</tt> with <tt>binary</tt> USE flag requires at least 12GB in /var/tmp}}

You saw how to use '''zfs diff''' to compare a snapshot with its corresponding "live" dataset in the paragraphs above. Doing the same exercise with two snapshots is not that much different: you just have to explicitly tell the command which snapshots are to be compared, and it will output the result in the exact same manner. So what are the differences between snapshots ''myfirstpool/myfirstDS@snapshot-1'' and ''myfirstpool/myfirstDS@snapshot-2''? Let's make the '''zfs''' command work for us:
  
 
<console>
 
<console>
(chroot) # ##i##echo "sys-kernel/debian-sources binary" >> /etc/portage/package.use
+
###i## zfs diff myfirstpool/myfirstDS@snapshot-1 myfirstpool/myfirstDS@snapshot-2
(chroot) # ##i##emerge debian-sources</console>
+
M      /myfirstpool/myfirstDS/
 +
M      /myfirstpool/myfirstDS/hello.txt
 +
M      /myfirstpool/myfirstDS/radeon
 +
-       /myfirstpool/myfirstDS/radeon/CAYMAN_me.bin
 +
+      /myfirstpool/myfirstDS/goodbye.txt
 +
+      /myfirstpool/myfirstDS/config.gz
 +
</console>
  
All done!

Before digging further, let's think about what we did between the time we created the first snapshot and the second snapshot:
* We modified the file ''/myfirstpool/myfirstDS/hello.txt'', hence the 'M' shown on the left of the second line (and since this changed something under ''/myfirstpool/myfirstDS'', an 'M' is also shown on the left of the first line)
* We deleted the file ''/myfirstpool/myfirstDS/radeon/CAYMAN_me.bin'', hence the minus sign ('-') shown on the left of the fourth line (and the 'M' shown on the left of the third line)
* We added two files, ''/myfirstpool/myfirstDS/goodbye.txt'' and ''/myfirstpool/myfirstDS/config.gz'', hence the plus sign ('+') shown on the left of the fifth and sixth lines (this is also a change happening in ''/myfirstpool/myfirstDS'', another reason to show an 'M' on the left of the first line)
  
{{fancynote|NVIDIA card users: the <tt>binary</tt> USE flag installs the Nouveau drivers which cannot be loaded at the same time as the proprietary drivers, and cannot be unloaded at runtime because of KMS. You need to blacklist it under <tt>/etc/modprobe.d/</tt>.}}
+
Now the same exercise, this time with snapshots ''myfirstpool/myfirstDS@snapshot-2'' and ''myfirstpool/myfirstDS@snapshot-3'':
{{fancynote|For an overview of other kernel options for Funtoo Linux, see [[Funtoo Linux Kernels]]. There maybe modules that the Debian kernel doesn't include, a situation where [http://www.funtoo.org/wiki/Funtoo_Linux_Kernels#Using_Debian-Sources_with_Genkernel genkernel] would be useful. Also be sure to see [[:Category:Hardware Compatibility|hardware compatibility]] information. We have compiled a very good reference for [[Dell PowerEdge 11G Servers]] that includes kernel compatibility information as well..}}
+
  
 +
<console>
 +
###i## zfs diff myfirstpool/myfirstDS@snapshot-2 myfirstpool/myfirstDS@snapshot-3
 +
M      /myfirstpool/myfirstDS/
 +
R      /myfirstpool/myfirstDS/hello.txt -> /myfirstpool/myfirstDS/hello_new.txt
 +
M      /myfirstpool/myfirstDS/goodbye.txt
 +
+      /myfirstpool/myfirstDS/version.txt
 +
</console>
  
The next step is to configure your boot loader so that your new kernel loads when the system boots.

Try to interpret what you see; note the second line, where an "R" (standing for "Rename") is shown. ZFS is smart enough to show both the old and the new names!
  
=== Installing a Bootloader ===

Why not push the limits and try a few fancy things? First things first: what happens if we ask to compare two snapshots in reverse order?
  
==== Installing Grub ====
+
<console>
 +
###i## zfs diff myfirstpool/myfirstDS@snapshot-3 myfirstpool/myfirstDS@snapshot-2
 +
Unable to obtain diffs:
 +
  Not an earlier snapshot from the same fs
 +
</console>
  
The boot loader is responsible for loading the kernel from disk when your computer boots. For new installations, GRUB 2 and Funtoo's boot-update tool should be used as a boot loader. GRUB supports both GPT/GUID and legacy MBR partitioning schemes.

Would ZFS be a bit happier if we asked for the difference between two snapshots with a gap in between (so snapshot 1 with snapshot 3)?
  
To use this recommended boot method, first emerge <tt>boot-update</tt>. This will also cause <tt>grub-2</tt> to be merged, since it is a dependency of <tt>boot-update</tt>.
+
<console>
 +
###i## zfs diff myfirstpool/myfirstDS@snapshot-1 myfirstpool/myfirstDS@snapshot-3
 +
M      /myfirstpool/myfirstDS/
 +
R      /myfirstpool/myfirstDS/hello.txt -> /myfirstpool/myfirstDS/hello_new.txt
 +
M      /myfirstpool/myfirstDS/radeon
 +
-       /myfirstpool/myfirstDS/radeon/CAYMAN_me.bin
 +
+      /myfirstpool/myfirstDS/goodbye.txt
 +
+      /myfirstpool/myfirstDS/config.gz
 +
+      /myfirstpool/myfirstDS/version.txt
 +
</console>
 +
 

Amazing! Here again, take a couple of minutes to think about all the operations you performed on the dataset between the time you took the first snapshot and the time you took the last snapshot: this summary is the exact reflection of all your previous operations.
 +
 

To conclude on this subject, let's see the differences between the ''myfirstpool/myfirstDS'' dataset and its various snapshots:
  
 
<console>
 
<console>
(chroot) # ##i##emerge boot-update
+
###i## zfs diff myfirstpool/myfirstDS@snapshot-1                               
 +
M      /myfirstpool/myfirstDS/
 +
R      /myfirstpool/myfirstDS/hello.txt -> /myfirstpool/myfirstDS/hello_new.txt
 +
M      /myfirstpool/myfirstDS/radeon
 +
-      /myfirstpool/myfirstDS/radeon/CAYMAN_me.bin
 +
+      /myfirstpool/myfirstDS/goodbye.txt
 +
+      /myfirstpool/myfirstDS/config.gz
 +
+      /myfirstpool/myfirstDS/version.txt
 +
</console>
 +
<console>
 +
###i## zfs diff myfirstpool/myfirstDS@snapshot-2
 +
M      /myfirstpool/myfirstDS/
 +
R      /myfirstpool/myfirstDS/hello.txt -> /myfirstpool/myfirstDS/hello_new.txt
 +
M      /myfirstpool/myfirstDS/goodbye.txt
 +
+      /myfirstpool/myfirstDS/version.txt
 +
</console>
 +
<console>
 +
###i##  zfs diff myfirstpool/myfirstDS@snapshot-3
 
</console>
 
</console>
  
Then, edit <tt>/etc/boot.conf</tt> and specify "<tt>Funtoo Linux genkernel</tt>" as the <tt>default</tt> setting at the top of the file, replacing <tt>"Funtoo Linux"</tt>.  

Having nothing reported for the last '''zfs diff''' is normal, as nothing has changed in the dataset since that snapshot was taken.
  
<tt>/etc/boot.conf</tt> should now look like this:
+
=== The time travelling machine part 2: rolling back with multiple snapshots ===

Examining the differences between the various snapshots of a dataset, or between a snapshot and the dataset itself, would be quite useless if we could not roll the dataset back to one of its previous states. Since we have mangled ''myfirstpool/myfirstDS'' a bit, it is time to restore it to the state it was in when the first snapshot was taken:
 +
 
 +
<console>
 +
###i## zfs rollback myfirstpool/myfirstDS@snapshot-1
 +
cannot rollback to 'myfirstpool/myfirstDS@snapshot-1': more recent snapshots exist
 +
use '-r' to force deletion of the following snapshots:
 +
myfirstpool/myfirstDS@snapshot-3
 +
myfirstpool/myfirstDS@snapshot-2
 +
</console>
 +
 

Err... Well, ZFS just told us that several more recent snapshots exist and it refuses to proceed without dropping them. Unfortunately there is no way to circumvent that: once you jump backward, you have no way to move forward again. We could demonstrate rolling back to ''myfirstpool/myfirstDS@snapshot-3'', then ''myfirstpool/myfirstDS@snapshot-2'', then ''myfirstpool/myfirstDS@snapshot-1'', but it would be of very little interest since previous sections of this tutorial did that already. So, second attempt:
 +
 
 +
<console>
 +
###i## zfs rollback -r myfirstpool/myfirstDS@snapshot-1
 +
###i## zfs list -t all                                                         
 +
NAME                              USED  AVAIL  REFER  MOUNTPOINT
 +
myfirstpool                      1.81G  6.00G  850M  /myfirstpool
 +
myfirstpool/myfirstDS            2.96M  6.00G  2.96M  /myfirstpool/myfirstDS
 +
myfirstpool/myfirstDS@snapshot-1    1K      -  2.96M  -
 +
myfirstpool/mysecondDS            1003M  6.00G  1003M  /myfirstpool/mysecondDS
 +
</console>
 +
 

''myfirstpool/myfirstDS'' effectively returned to the desired state (notice the size of ''myfirstpool/myfirstDS@snapshot-1''), and the snapshots ''snapshot-2'' and ''snapshot-3'' vanished. Just to convince you:
 +
<console>
 +
###i## zfs diff myfirstpool/myfirstDS@snapshot-1
 +
###i##
 +
</console>
 +
 
 +
No differences at all!
 +
 
 +
=== Snapshots and clones ===
 +
 
 +
=== Streaming ZFS datasets over the network ===
 +
 

You find ZFS snapshots useful? Well, you have seen just a small part of their potential. As a snapshot is a photograph of what a dataset contained at a frozen point in time, snapshots can be seen as nothing more than a data backup. Like any backup, they should not stay on the local machine: they should be put elsewhere, and common sense says to keep backups in a safe place, making them travel through a secure channel. By "secure channel" we mean something like a trusted person in your organization whose job is to bring a box of tapes off-site to a secure location, but we also mean a secure communication channel such as an SSH tunnel between two hosts, without any human intervention.
 +
 

ZFS designers had the same vision and made it possible for a dataset to be sent over a network. How is that possible? Simple: the process involves two peers communicating through a channel like the one established by '''netcat''' (OpenSSH supports similar functionality, but with an encrypted communication channel). For the sake of the demonstration, we will use two Solaris boxes, one at each end-point.
 +
 

How do we stream some ZFS bits over the network? Here again, '''zfs''' is the answer. A nifty move from the designers was to use ''stdin'' and ''stdout'' as transmission/reception channels, thus allowing great flexibility in processing the ZFS stream. You can envisage, for instance, compressing your stream, then encrypting it, then encoding it in base64, then signing it, and so on. It sounds a bit overkill, but it is possible, and in the general case you can use any tool that swallows data from ''stdin'' and spits it out through ''stdout'' in your plumbing.
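As an illustration of that flexibility, a commonly used plumbing replaces '''netcat''' with an SSH tunnel so the stream is encrypted in transit. A minimal sketch, assuming a dataset ''mypool/mydataset'' with a snapshot ''backup1'', a reachable host ''backupbox'' and a pool ''backuppool'' on it (all hypothetical names), run from the sender:

<pre>
# zfs send mypool/mydataset@backup1 | ssh root@backupbox "zfs receive backuppool/mydataset"
</pre>

The receiving '''zfs receive''' reads the stream from its standard input exactly as it would from a pipe fed by '''netcat'''.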
 +
 
 +
{{fancynote|The rest of this section has been done entirely on two Solaris 11 machines.}}
 +
 
 +
1. Sender side:
 +
 
 +
<pre>
# zfs create testpool2/zfsstreamtest
# echo 'Hello, world!' > /testpool2/zfsstreamtest/hello.txt
# echo 'Goodbye, world' > /testpool2/zfsstreamtest/goodbye.txt
# zfs snapshot testpool2/zfsstreamtest@s1
# zfs list -t snapshot
NAME                         USED  AVAIL  REFER  MOUNTPOINT
testpool2/zfsstreamtest@s1      0      -    32K  -
</pre>
  
 +
2. Receiver side (the dataset ''zfs-stream-test'' will be created and should not already be present):

<pre>
# nc -l -p 7000 | zfs receive testpool/zfs-stream-test
</pre>

At this point the receiver is waiting for data.

3. Sender side:

<pre>
# zfs send testpool2/zfsstreamtest@s1 | nc 192.168.aaa.bbb.ccc 7000
</pre>

4. Receiver side:

<pre>
# zfs list -t snapshot
NAME                          USED  AVAIL  REFER
...
testpool/zfs-stream-test@s1      0      -  46.4K  -
</pre>

<pre>
boot {
        generate grub
        default "Funtoo Linux genkernel"
        timeout 3
}

"Funtoo Linux" {
        kernel bzImage[-v]
        # params += nomodeset
}

"Funtoo Linux genkernel" {
        kernel kernel[-v]
        initrd initramfs[-v]
        params += real_root=auto
        # params += nomodeset
}
</pre>
 +
 
 +
Note that we did not set an explicit snapshot name in the second step but it could have been possible to choose anything else but the default which is the name of the snapshot sent over the network. In that case the dataset which will contain the snapshot needs to be created first:
 +
<pre>
 +
# nc -l -p 7000 | zfs receive testpool/zfs-stream-test@mysnapshot01
 +
</pre>
 +
 
 +
Once received you would get:
 +
 
 +
<pre>
# zfs list -t snapshot
NAME                                    USED  AVAIL  REFER
...
testpool/zfs-stream-test@mysnapshot01      0      -  46.4K  -
</pre>
 +
 

5. Just for the sake of curiosity, let's do a rollback on the receiver side:
 +
 
 +
<pre>
# zfs rollback testpool/zfs-stream-test@s1
# ls -l /testpool/zfs-stream-test
total 2
-rw-r--r-- 1 root root 15 2011-09-06 23:54 goodbye.txt
-rw-r--r-- 1 root root 13 2011-09-06 23:53 hello.txt
# cat /testpool/zfs-stream-test/hello.txt
Hello, world
</pre>
 +
 

Because ZFS streaming operates using the standard input and output (''stdin'' / ''stdout''), you can build more complex pipelines, like:
 +
 
 +
<pre>
 +
# zfs send testpool2/zfsstreamtest@s1 | gzip | nc 192.168.aaa.bbb.ccc 7000
 
</pre>
 
</pre>
 
   
 
   
Please read <tt>man boot.conf</tt> for further details.

The above example used two hosts, but a simpler setup is also possible: you are not required to send your data over the network with '''netcat'''; you can store the stream in a regular file, then mail it or put it on a USB key. By the way, we are not finished! We only took a simple case here: it is absolutely possible to do the exact same operation with the difference between two snapshots (an incremental stream). Just like an incremental backup takes only what has changed, ZFS can determine the difference between two snapshots and stream that instead of streaming a whole snapshot. Although ZFS can detect and act on differentials, it does not operate (yet) at the block level: if only a few bytes of a very big file have changed, the whole file will be taken into consideration (operating at the data block level is possible with tools like the well-known '''rsync''').
  
===== Running grub-install and boot-update =====
+
Consider the following:
  
Finally, we will need to actually install the GRUB boot loader to your disk, and also run <tt>boot-update</tt> which will generate your boot loader configuration file:
+
* A dataset snapshot (S1) contains two files:
 +
** A -> 10 MB
 +
** B -> 4 GB
 +
* A bit later some files (named C, D and E) are added to the dataset and another snapshot is (S2) taken. S2 contains:
 +
** A -> 10 MB
 +
** B -> 4 GB
 +
** C -> 3 MB
 +
** D -> 500 KB
 +
** E -> 1GB
  
<console>
(chroot) # ##i##grub-install --no-floppy /dev/sda
(chroot) # ##i##boot-update
</console>

With a full transfer of S2, the files A, B, C, D and E would be streamed, whereas with an incremental transfer (S2-S1), zfs would only process C, D and E. The next $100 question: ''"How can we stream a snapshot difference? '''zfs''' again?"'' Yes! This time with a subtle difference: a special option on the command line tells it to use a difference rather than a full snapshot. Assuming a few more files have been added to the ''testpool2/zfsstreamtest'' dataset and a snapshot (s2) has been taken, the delta between s2 and s1 (s2-s1) can be sent like this (on the receiver side the same command as shown above is used, nothing special is required; also notice the presence of the -i option):
  
You only need to run <tt>grub-install</tt> when you first install Funtoo Linux, but you need to re-run <tt>boot-update</tt> every time you modify your <tt>/etc/boot.conf</tt> file, so your changes are applied on next boot.
+
* Sender:
 +
<pre>
 +
# zfs send -i testpool2/zfsstreamtest@s1 testpool2/zfsstreamtest@s2 | nc 192.168.aaa.bbb.ccc 7000
 +
</pre>
  
OK - your system should be ready to boot! Well, there are a few more loose ends...
+
* Receiver:
 +
<pre>
 +
# nc -l -p 7000 | zfs receive testpool/zfs-stream-test
 +
# zfs list -t snapshot
 +
testpool/zfs-stream-test@s1      28.4K      -  46.4K  -
 +
testpool/zfs-stream-test@s2          0      -  47.1K  -
 +
</pre>
  
==== Installing Syslinux/Extlinux ====

Note that although we did not specify any snapshot name to use on the receiver side, ZFS used by default the name of the second snapshot involved in the delta (''s2'' here).
  
An alternate boot loader called extlinux can be used instead of GRUB if you desire. See the [[Extlinux|extlinux Guide]] for information on how to do this.
 
  
=== Configuring your network ===
+
$200 question: suppose we delete all of the received snapshots so far on the receiver side and we try to send the difference between s2 and s1, what would happen? ZFS will protest on the receiver side although no error message will be visible on the sender side:
 +
<pre>
 +
cannot receive incremental stream: destination testpool/zfs-stream-test has been modified
 +
since most recent snapshot
 +
</pre>
  
It's important to ensure that you will be able to connect to your local-area network after you reboot into Funtoo Linux. There are three approaches you can use for configuring your network: NetworkManager, dhcpcd, and the [[Funtoo Linux Networking]] scripts. Here's how to choose which one to use based on the type of network you want to set up.
+
It is even worse if we remove the dataset used to receive the data:
  
==== Wi-Fi ====
+
<pre>
 +
cannot receive incremental stream: destination 'testpool/zfs-stream-test' does not exist
 +
</pre>
  
For laptop/mobile systems where you will be using Wi-Fi and connecting to various networks, NetworkManager is strongly recommended. The Funtoo version of NetworkManager is fully functional even from the command-line, so you can use it even without X or without the Network Manager applet. Here are the steps involved in setting up NetworkManager:
+
{{fancyimportant|ZFS streaming over a network has '''no underlying protocol''', therefore the sender just assumes the data has been successfully received and processed. It '''does not care''' whether a processing error occurs.}}
  
<console>
+
== Govern a dataset by attributes ==
# ##i##emerge linux-firmware
+
# ##i##emerge networkmanager
+
# ##i##rc-update add NetworkManager default
+
</console>
+
  
Above, we installed linux-firmware which contains a complete collection of available firmware for many hardware devices including Wi-Fi adapters, plus NetworkManager to manage our network connection. Then we added NetworkManager to the <tt>default</tt> runlevel so it will start when Funtoo Linux boots.

So far, most filesystem capabilities have been driven by separate and scattered command line tools (e.g. tune2fs, edquota, rquota, quotacheck...), which all have their own ways of handling tasks and can be quite tricky to use, especially the quota-related management utilities. Moreover, there was no easy way to put a limit on a directory other than moving it to a dedicated partition or logical volume, implying downtime when additional space had to be added. Quota management is, however, only one of the many facets of disk space management.
  
After you reboot into Funtoo Linux, you will be able to add a Wi-Fi connection this way:

In the ZFS world, many aspects are now managed by simply setting or clearing a property attached to a ZFS dataset through the now well-known command '''zfs'''. You can, for example:
  
<console>
+
* put a size limit on a dataset
# ##i##addwifi -S wpa -K 'wifipassword' mywifinetwork
+
* reserve a space for dataset (that space is ''guaranteed'' to be available in the future although not being allocated at the time the reservation is made)
</console>
+
* control if new files are encrypted and/or compressed
 +
* define a quota per user or group of users
 +
* control checksum usage  => '''never turn that property off unless having very good reasons you are likely to never have''' (no checksums = no silent data corruption detection)
 +
* share a dataset by NFS/CIFS
 +
* control automatic data deduplication
  
The <tt>addwifi</tt> command is used to configure and connect to a WPA/WPA2 Wi-Fi network named <tt>mywifinetwork</tt> with the password <tt>wifipassword</tt>. This network configuration entry is stored in <tt>/etc/NetworkManager/system-connections</tt> so that it will be remembered in the future. You should only need to enter this command once for each Wi-Fi network you connect to.

Not all dataset properties are settable; some of them are set and managed by the operating system in the background for you and thus cannot be modified.
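A minimal sketch of setting a couple of the properties mentioned in the list above (the size and the NFS sharing choice are arbitrary examples):

<console>
###i## zfs set reservation=500M myfirstpool/myfirstDS
###i## zfs set sharenfs=on myfirstpool/myfirstDS
###i## zfs get reservation,sharenfs myfirstpool/myfirstDS
NAME                   PROPERTY     VALUE  SOURCE
myfirstpool/myfirstDS  reservation  500M   local
myfirstpool/myfirstDS  sharenfs     on     local
</console>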
  
==== Desktop (Wired Ethernet) ====
+
{{fancynote|Solaris/OpenIndiana users: ZFS has a tight integration with the NFS/CIFS server, thus it is possible to share a zfs dataset by setting adequate attributes. ZFS on Linux (native kernel mode port) also has a tight integration with the built-in Linux NFS server, the same for ZFS fuse although still experimental. Under FreeBSD ZFS integration has been done both with NFS and Samba (CIFS).}}
  
For a home desktop or workstation with wired Ethernet that will use DHCP, the simplest and most effective option to enable network connectivity is to simply add <tt>dhcpcd</tt> to the default runlevel:

Like any other action concerning datasets, properties are set and unset via the zfs command. On our Funtoo box running ZFS Fuse we can, for example, start by viewing the value of all properties of the dataset ''myfirstpool/myfirstDS'':
  
<console>
+
<pre>
# ##i##rc-update add dhcpcd default
+
# zfs get all myfirstpool/myfirstDS
</console>
+
zfs get all myfirstpool/myfirstDS
 +
NAME                  PROPERTY              VALUE                  SOURCE
 +
myfirstpool/myfirstDS  type                  filesystem              -
 +
myfirstpool/myfirstDS  creation              Sun Sep  4 23:34 2011  -
 +
myfirstpool/myfirstDS  used                  73.8M                  -
 +
myfirstpool/myfirstDS  available            5.47G                  -
 +
myfirstpool/myfirstDS  referenced            73.8M                  -
 +
myfirstpool/myfirstDS  compressratio        1.00x                  -
 +
myfirstpool/myfirstDS  mounted              yes                    -
 +
myfirstpool/myfirstDS  quota                none                    default
 +
myfirstpool/myfirstDS  reservation          none                    default
 +
myfirstpool/myfirstDS  recordsize            128K                    default
 +
myfirstpool/myfirstDS  mountpoint            /myfirstpool/myfirstDS  default
 +
myfirstpool/myfirstDS  sharenfs              off                    default
 +
myfirstpool/myfirstDS  checksum              on                      default
 +
myfirstpool/myfirstDS  compression          off                    default
 +
myfirstpool/myfirstDS  atime                on                      default
 +
myfirstpool/myfirstDS  devices              on                      default
 +
myfirstpool/myfirstDS  exec                  on                      default
 +
myfirstpool/myfirstDS  setuid                on                      default
 +
myfirstpool/myfirstDS  readonly              off                    default
 +
myfirstpool/myfirstDS  zoned                off                    default
 +
myfirstpool/myfirstDS  snapdir              hidden                  default
 +
myfirstpool/myfirstDS  aclmode              groupmask              default
 +
myfirstpool/myfirstDS  aclinherit            restricted              default
 +
myfirstpool/myfirstDS  canmount              on                      default
 +
myfirstpool/myfirstDS  xattr                on                      default
 +
myfirstpool/myfirstDS  copies                1                      default
 +
myfirstpool/myfirstDS  version              4                      -
 +
myfirstpool/myfirstDS  utf8only              off                    -
 +
myfirstpool/myfirstDS  normalization        none                    -
 +
myfirstpool/myfirstDS  casesensitivity      sensitive              -
 +
myfirstpool/myfirstDS  vscan                off                    default
 +
myfirstpool/myfirstDS  nbmand                off                    default
 +
myfirstpool/myfirstDS  sharesmb              off                    default
 +
myfirstpool/myfirstDS  refquota              none                    default
 +
myfirstpool/myfirstDS  refreservation        none                    default
 +
myfirstpool/myfirstDS  primarycache          all                    default
 +
myfirstpool/myfirstDS  secondarycache        all                    default
 +
myfirstpool/myfirstDS  usedbysnapshots      18K                    -
 +
myfirstpool/myfirstDS  usedbydataset        73.8M                  -
 +
myfirstpool/myfirstDS  usedbychildren        0                      -
 +
myfirstpool/myfirstDS  usedbyrefreservation  0                      -
 +
myfirstpool/myfirstDS  logbias              latency                default
 +
myfirstpool/myfirstDS  dedup                off                    default
 +
myfirstpool/myfirstDS  mlslabel              off                    -
 +
</pre>
  
When you reboot, <tt>dhcpcd</tt> will run in the background and manage all network interfaces and use DHCP to acquire network addresses from a DHCP server.

How can we set a limit that prevents ''myfirstpool/myfirstDS'' from using more than 1 GB of space in the pool? Simple, just set the ''quota'' property:
  
==== Server (Static IP) ====
+
<pre>
 +
# zfs set quota=1G myfirstpool/myfirstDS
 +
# zfs get quota myfirstpool/myfirstDS
 +
NAME                  PROPERTY  VALUE  SOURCE
 +
myfirstpool/myfirstDS  quota    1G    local
 +
</pre>
  
For servers, the [[Funtoo Linux Networking]] scripts are recommended. They are optimized for static configurations and things like virtual ethernet bridging for virtualization setups. See [[Funtoo Linux Networking]] for information on how to use Funtoo Linux's template-based network configuration system.

Maybe something piqued your curiosity: ''what does "SOURCE" mean?'' "SOURCE" describes how the property has been determined for the dataset and can have several values:
* '''local''': the property has been explicitly set for this dataset
* '''default''': a default value has been assigned by the operating system because the property was not explicitly set by the system administrator (e.g. whether SUID is allowed or not in the above example)
* '''dash (-)''': a non-modifiable intrinsic property (e.g. dataset creation time, whether the dataset is currently mounted or not, dataset space usage in the pool, average compression ratio...)
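A property that was set locally can also be returned to its inherited or default value with '''zfs inherit'''. A minimal sketch, reusing the ''quota'' property set above:

<pre>
# zfs inherit quota myfirstpool/myfirstDS
# zfs get quota myfirstpool/myfirstDS
NAME                   PROPERTY  VALUE  SOURCE
myfirstpool/myfirstDS  quota     none   default
</pre>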
  
=== Finishing Steps ===

Before copying some files into the dataset, let's set a binary (on/off) property:
 +
<pre>
 +
# zfs set compression=on myfirstpool/myfirstDS
 +
</pre>
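
Note that compression only applies to data written after the property has been enabled. As a small illustration, its effect can later be observed through the read-only ''compressratio'' property:

<pre>
# zfs get compressratio myfirstpool/myfirstDS
</pre>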
  
==== Set your root password ====
It's imperative that you set your root password before rebooting so that you can log in.

<console>
(chroot) # ##i##passwd
</console>
  
=== Restart your system ===

Now try to put more than 1 GB of data in the dataset:

<pre>
# dd if=/dev/zero of=/myfirstpool/myfirstDS/one-GB-test bs=2G count=1
dd: writing `/myfirstpool/myfirstDS/one-GB-test': Disk quota exceeded
</pre>
  
Now it is time to leave the chroot, unmount the Funtoo Linux partitions and files, and restart your computer. When you restart, the GRUB boot loader will start, load the Linux kernel and initramfs, and your system will begin booting.
== Permission delegation ==

Leave the chroot, change directory to /, unmount your Funtoo partitions, and reboot.

ZFS brings a feature known as delegated administration. Delegated administration enables ordinary users to handle administrative tasks on a dataset without being administrators. '''It is, however, not a sudo replacement, as it only covers ZFS-related tasks''' such as sharing/unsharing, disk quota management and so on. Permission delegation shines in flexibility because such delegations can be inherited through nested datasets. Permission delegation is handled via '''zfs''' through its '''allow''' and '''unallow''' subcommands.
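
As a small, hedged sketch of the syntax (as documented for the Solaris implementation; support and available permission sets may vary in ZFS on Linux), a hypothetical user ''alice'' could be allowed to create, mount and snapshot a dataset like this:

<pre>
# zfs allow alice create,mount,snapshot myfirstpool/myfirstDS
# zfs allow myfirstpool/myfirstDS
# zfs unallow alice create,mount,snapshot myfirstpool/myfirstDS
</pre>

The first command grants the permissions, '''zfs allow''' invoked with only a dataset name displays the current delegations, and '''zfs unallow''' revokes them.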
<console>
(chroot) # ##i##exit
# ##i##cd /
# ##i##umount -l /mnt/funtoo/boot /mnt/funtoo/dev /mnt/funtoo/proc /mnt/funtoo/sys /mnt/funtoo
# ##i##reboot
</console>
  
You should now see your system reboot, the GRUB boot loader appear for a few seconds, and then see the Linux kernel and initramfs loading. After this, you should see Funtoo Linux itself start to boot, and you should be greeted with a <tt>login:</tt> prompt. Funtoo Linux has been successfully installed!
= Data redundancy with ZFS =

Nothing is perfect: storage media (even in datacenter-class equipment) are prone to failure and fail on a regular basis. Having data redundancy is mandatory to help prevent single points of failure (SPoF). Over the past decades RAID technologies have served us well, but their power is precisely their weakness: because they operate at the block level, they do not care about what is stored in the data blocks and have no way to interact with the filesystems stored on them to ensure data integrity is properly handled.

=== Next Steps ===
  
If you are brand new to Funtoo Linux and Gentoo Linux, please check out [[Funtoo Linux First Steps]], which will help get you acquainted with your new system.

== Some statistics ==

It is no secret that a general trend in the IT industry is the exponential growth of data quantities. Just think about the amount of data YouTube, Google or Facebook generate every day. Taking the first of them as an example, [http://www.website-monitoring.com/blog/2010/05/17/youtube-facts-and-figures-history-statistics some statistics] give:

* 24 hours of video generated every ''minute'' as of March 2010 (May 2009 - 20h / October 2008 - 15h / May 2008 - 13h)
* More than 2 ''billion'' views a day
* More video produced on YouTube every 60 days than the 3 major US broadcasting networks did in the last 60 years

Facebook is also impressive (Facebook's own stats):

* Over 900 million objects that people interact with (pages, groups, events and community pages)
* The average user creates 90 pieces of content each month (750 million active users)
* More than 2.5 million websites have integrated with Facebook

What is true for Facebook and YouTube is also true in many other cases (think for a minute about the amount of data stored in iTunes), especially with the growing popularity of cloud computing infrastructures. Despite technological progress, a "bottleneck" remains: storage reliability has stayed roughly the same over the years. If only one organization in the world generated huge quantities of data, it would be [http://public.web.cern.ch CERN] (''Conseil Européen pour la Recherche Nucléaire'', now officially known as the ''European Organization for Nuclear Research''), as its experiments can generate spikes of many terabytes of data within a few seconds. A study done in 2007 and quoted by a [http://www.zdnet.com/blog/storage/data-corruption-is-worse-than-you-know/191 ZDNet article] reveals that:

* Even ECC memory cannot always help: 3 double-bit (uncorrectable) errors occurred in 3 months on 1300 nodes. Bad news: that number should be '''zero'''.
* RAID systems cannot protect in all cases: monitoring 492 RAID controllers for 4 weeks showed an average error rate of 1 per ~10^14 bits, giving roughly 300 errors for every 2.4 petabytes.
* Magnetic storage is still not reliable, even on high-end datacenter-class drives: 500 errors were found over 100 nodes while writing a 2 GB file to 3000+ nodes every 2 hours, then reading it again and again for 5 weeks.

Overall this means: 22 corrupted files (1 in every 1500 files) out of a grand total of 33700 files holding 8.7 TB of data. And this study is more than 5 years old....

== Source of silent data corruption ==

See also this ZDNet piece: [http://www.zdnet.com/blog/storage/50-ways-to-lose-your-data/168 50 ways to lose your data].

This is not an exhaustive list, but we can quote:

* A cheap controller or a buggy driver that does not report errors or pre-failure conditions to the operating system
* "Bit-leaking": a hard drive consists of many concentric magnetic tracks. When the hard drive's magnetic head writes bits on the magnetic surface, it generates a very weak magnetic field which is nevertheless sufficient to "leak" onto the next track and flip some bits. Drives can generally compensate for those situations because they also record some error-correction data on the magnetic surface
* Magnetic surface defects (weak sectors)
* Hard drive firmware bugs
* Cosmic rays hitting your RAM chips or the hard drive's cache memory/electronics

== Building a mirrored pool ==
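
As a minimal sketch (the backing file names, the loopback devices ''/dev/loop4''/''/dev/loop5'' and the pool name below are arbitrary choices for this illustration, assuming those loop devices are free), a two-way mirror could be created like this:

<pre>
# dd if=/dev/zero of=/tmp/zfs-mirror-disk00.img bs=2G count=1
# dd if=/dev/zero of=/tmp/zfs-mirror-disk01.img bs=2G count=1
# losetup /dev/loop4 /tmp/zfs-mirror-disk00.img
# losetup /dev/loop5 /tmp/zfs-mirror-disk01.img
# zpool create mymirrorpool mirror /dev/loop4 /dev/loop5
# zpool status mymirrorpool
</pre>

With such a pool, every block is written to both devices, so losing one of them does not lose any data, and a corrupted block detected on one side can be repaired from the other copy.
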
== ZFS RAID-Z ==

=== ZFS/RAID-Z vs RAID-5 ===

RAID-5 is very commonly used nowadays because of its simplicity, efficiency and fault tolerance. Although the technology has proven itself over decades, it has a major drawback known as "the RAID-5 write hole". If you are familiar with RAID-5, you already know that it consists of spreading stripes across all of the disks within the array and interleaving them with a special stripe called the parity. Several schemes for spreading stripes/parity between disks exist in the wild, each one with its own pros and cons; the "standard" one (also known as ''left-asynchronous'') is:

<pre>
Disk_0  | Disk_1  | Disk_2  | Disk_3
[D0_S0] | [D0_S1] | [D0_S2] | [D0_P]
[D1_S0] | [D1_S1] | [D1_P]  | [D1_S2]
[D2_S0] | [D2_P]  | [D2_S1] | [D2_S2]
[D3_P]  | [D3_S0] | [D3_S1] | [D3_S2]
</pre>

The parity is simply computed by XORing the stripes of the same "row", thus giving the general equation:
* [Dn_S0] XOR [Dn_S1] XOR ... XOR [Dn_Sm] XOR [Dn_P] = 0
This equation can be rewritten in several ways:
* [Dn_S0] XOR [Dn_S1] XOR ... XOR [Dn_Sm] = [Dn_P]
* [Dn_S1] XOR [Dn_S2] XOR ... XOR [Dn_Sm] XOR [Dn_P] = [Dn_S0]
* [Dn_S0] XOR [Dn_S2] XOR ... XOR [Dn_Sm] XOR [Dn_P] = [Dn_S1]
* ...and so on!

Because the equations are a combination of exclusive-ors, it is easy to compute one term if it is missing. Let's say we have 3 stripes plus one parity, composed of 4 bits each, but one of them is missing due to a disk failure:

* D0_S0 = 1011
* D0_S1 = 0010
* D0_S2 = <missing>
* D0_P  = 0110

However, we know that:
* D0_S0 XOR D0_S1 XOR D0_S2 XOR D0_P = 0000, which can be rewritten as:
* D0_S2 = D0_S0 XOR D0_S1 XOR D0_P

Applying boolean algebra, this gives: '''D0_S2 = 1011 XOR 0010 XOR 0110 = 1111'''.
Proof: '''1011 XOR 0010 XOR 1111 = 0110''', which is indeed '''D0_P'''.
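
If you want to double-check the arithmetic yourself, a quick way is to let a bash shell do the XOR (this is just a verification aid, not ZFS-related syntax):

<pre>
# echo $(( 2#1011 ^ 2#0010 ^ 2#0110 ))
15
</pre>

15 in decimal is 1111 in binary, which is the reconstructed stripe D0_S2.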

'''So what's the deal?'''
Okay, now the funny part: forget the above hypothesis and imagine we have this:

* D0_S0 = 1011
* D0_S1 = 0010
* D0_S2 = 1101
* D0_P  = 0110

Applying boolean algebra magic gives 1011 XOR 0010 XOR 1101 => 0100. Problem: this is different from D0_P (0110). Can you tell which one (or which ONES) of the four terms lies? If you can find a mathematically sound answer, go found a company, because you have just solved a big computer science problem. If humans can't settle the question, imagine how hard it is for the poor little RAID-5 controller to determine which stripe is right and which one lies, and the resulting "datageddon" (i.e. massive data corruption on the RAID-5 array) when the RAID-5 controller detects the error and starts to rebuild the array.

This is not science fiction, this is pure reality, and the weakness lies in RAID-5's simplicity. Here is how it can happen: an urban legend about RAID-5 arrays is that they update stripes in an atomic transaction (all of the stripes plus parity are written, or none of them). Too bad, this is just not true: the data is written on the fly, and if for one reason or another the machine hosting the RAID-5 array suffers a power outage or crash, the RAID-5 controller will simply have no idea what it was doing, which stripes are up to date and which ones are not. Of course, RAID controllers in servers do have a replaceable on-board battery, and most of the time the server they reside in is connected to an auxiliary source like a battery-based UPS or a diesel/gas electricity generator. However, Murphy's law or unpredictable hazards can, sometimes, strike....

Another funny scenario: imagine a machine with a RAID-5 array (on a UPS this time) but with non-ECC memory. The RAID-5 controller splits the data buffer into stripes, computes the parity stripe and starts to write them to the different disks of the array. But...but...but... for some odd reason, a single bit in one of the stripes flips (cosmic rays, RFI...) after the parity calculation. Too bad, too sad: one of the written stripes contains corrupted data and is silently written to the array. Datageddon in sight!

Not to freak you out: storage units have sophisticated error-correction capabilities (a magnetic surface or an optical recording surface is not perfect, and reading/writing errors do occur) masking most of those cases. However, established statistics estimate that even with error-correction mechanisms, one bit out of every 10^16 bits transferred is incorrect. 10^16 is a really huge number, but unfortunately, at this beginning of the XXIst century, with datacenters brewing massive amounts of data over several hundreds if not thousands of servers, this number starts to give headaches: '''a big datacenter can face silent data corruption every 15 minutes''' (Wikipedia). No typo here: a potential disaster may silently appear four times an hour, every single day of the year. Detection techniques exist, but traditional RAID-5 arrays in themselves can be a problem. Ironic for such a popular and widely used solution :)

If RAID-5 was an acceptable trade-off in past decades, it has simply had its time. RAID-5 is dead? '''*Hooray!*'''
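
The ZFS counterpart of RAID-5 is RAID-Z, and it closes the write hole because stripe updates go through the copy-on-write transactional engine of the filesystem itself. As a hedged sketch only (the pool name is arbitrary and the four devices shown are hypothetical spare loopback devices, not the ones already used by ''myfirstpool''), a single-parity RAID-Z pool would be created like this:

<pre>
# zpool create myraidzpool raidz /dev/loop4 /dev/loop5 /dev/loop6 /dev/loop7
# zpool status myraidzpool
</pre>

A double-parity variant uses the keyword '''raidz2''' (and '''raidz3''' for triple parity) instead of '''raidz'''.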

= More advanced topics =

== ZFS Intent Log (ZIL) ==

= Final words and lessons learned =

ZFS surpasses by far (as of September 2011) every well-known filesystem out there: none of them offers such an integration of features, and certainly not with this simplicity of management and robustness. However, in the Linux world it is definitely a no-go in the short term, especially for production systems. The two known implementations are not ready for production environments and lack some important features or behave in a clunky manner; this is perfectly understandable, as neither of them pretends to be at that level of maturity, and the licensing incompatibility between the code opened by Sun Microsystems some years ago and the GNU GPL does not help the cause. However, both look '''very promising''' once their rough corners get rounded off.

For a Linux system, the nearest plan B if you seek a filesystem covering some of the functionality offered by ZFS is BTRFS (still considered experimental, so be prepared for a disaster sooner or later, although BTRFS has been used by some Funtoo core team members for 2 years and has proved quite stable in practice). BTRFS, however, does not push the limits as far as ZFS does: it has no built-in snapshot differentiation tool, it does not implement built-in filesystem streaming capabilities, and rolling back a BTRFS subvolume is a bit more manual than in ''"the ZFS way of life"''.

We also have a number of pages dedicated to setting up your system, which you can find below. If you are interested in adding a page to this list, add it to the "First Steps" MediaWiki category.
  
{{#ask: [[Category:First Steps]] | format=category }}
 
  
If your system did not boot correctly, see [[Installation Troubleshooting]] for steps you can take to resolve the problem.

= Footnotes & references =

<references/>

Source: [http://docs.huihoo.com/opensolaris/solaris-zfs-administration-guide/html/index.html solaris-zfs-administration-guide]

[[Category:Labs]]
[[Category:Articles]]
[[Category:Filesystems]]
[[Category:HOWTO]]
[[Category:Install]]

Revision as of 03:38, 4 March 2014

Important: This tutorial is under a heavy revision to be switched from ZFS Fuse to ZFS on Linux.


Introduction

ZFS features and limitations

ZFS offers an impressive amount of features even putting aside its hybrid nature (both a filesystem and a volume manager -- zvol) covered in detail on Wikipedia. One of the most fundamental points to keep in mind about ZFS is it targets a legendary reliability in terms of preserving data integrity. ZFS uses several techniques to detect and repair (self-healing) corrupted data. Simply speaking it makes an aggressive use of checksums and relies on data redundancy, the price to pay is a bit more CPU processing power. However, the Wikipedia article about ZFS also mention it is strongly discouraged to use ZFS over classic RAID arrays as it can not control the data redundancy, thus ruining most of its benefits.

In short, ZFS has the following features (not exhaustive):

  • Storage pool dividable in one or more logical storage entities.
  • Plenty of space:
    • 256 zettabytes per storage pool (2^64 storages pools max in a system).
    • 16 exabytes max for a single file
    • 2^48 entries max per directory
  • Virtual block-devices support support over a ZFS pool (zvol) - (extremely cool when jointly used over a RAID-Z volume)
  • Read-only Snapshot support (it is possible to get a read-write copy of them, those are named clones)
  • Encryption support (supported only at ZFS version 30 and upper, ZFS version 31 is shipped with Oracle Solaris 11 so that version is mandatory if you plan to encrypt your ZFS datasets/pools)
  • Built-in RAID-5-like-over-steroid capabilities known as RAID-Z and RAID-6-like-over-steroid capabilities known as RAID-Z2. RAID-Z3 (triple parity) also exists.
  • Copy-on-Write transactional filesystem
  • Meta-attributes support (properties) allowing you to easily drive the show, like "That directory is encrypted", "that directory is limited to 5GiB", "That directory is exported via NFS" and so on. Depending on what you define, ZFS takes the appropriate actions!
  • Dynamic striping to optimize data throughput
  • Variable block length
  • Data deduplication
  • Automatic pool re-silvering
  • Transparent data compression
  • Transparent encryption (Solaris 11 and later only)

Most notable limitations are:

  • Lack of a feature ZFS developers know as "block pointer rewrite functionality" (planned to be developed); without it, ZFS currently cannot do:
    • Pool defragmentation (the COW techniques used in ZFS mitigate the problem)
    • Pool resizing
    • Re-applying data compression to existing data
    • Adding an additional device to a RAID-Z/Z2/Z3 pool to increase its size (it is, however, possible to replace each of the disks composing a RAID-Z/Z2/Z3 in sequence)
  • NOT A CLUSTERED FILESYSTEM like Lustre, GFS or OCFS2
  • No data healing if used on a single device (corruption can still be detected); the workaround is to force data duplication on the drive
  • No TRIM support (SSD devices)
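
As a hedged illustration of that single-device workaround, the number of copies ZFS keeps of each block can be raised through the copies property (the dataset name below is just a placeholder):

# zfs set copies=2 mypool/mydataset

With copies=2, every block is stored twice on the same device, so a corrupted copy can be repaired from the other one, at the cost of doubling the space consumed by that dataset.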

ZFS on well known operating systems

Linux

Although the source code of ZFS is open, its license (Sun CDDL) is incompatible with the license governing the Linux kernel (GNU GPL v2), thus preventing its direct integration. However, a couple of ports exist, although they suffer from maturity issues and a lack of features. As of writing (February 2014) two known implementations exist:

  • ZFS-fuse: a totally userland implementation relying on FUSE. This implementation can now be considered defunct (as of February 2014). The original site of ZFS FUSE seems to have disappeared; nevertheless the source code is still available on http://freecode.com/projects/zfs-fuse. ZFS FUSE stalled at version 0.7.0 in 2011 and never really evolved since then.
  • ZFS on Linux: a kernel-mode implementation of ZFS which supports a lot of ZFS features. The implementation is not as complete as it is under Solaris and its siblings like OpenIndiana (e.g. SMB integration is still missing, no encryption support...) but a lot of functionality is there. This is the implementation used for this article. As ZFS on Linux is an out-of-tree implementation, one must wait for patches after each Linux kernel release. ZFS on Linux currently supports zpool version 28.

Solaris/OpenIndiana

  • Oracle Solaris: remains the de facto reference platform for ZFS: ZFS on this platform is now considered mature and usable on production systems. Solaris 11 even uses ZFS for its "system" pool (aka rpool). A great advantage of this: it is now quite easy to revert the effect of a patch, on the condition that a snapshot was taken just before applying it. In the good old times of Solaris 10 and before, reverting a patch was possible but could be tricky and complex when it was possible at all. ZFS is far from new in Solaris: it took root in 2005 and was then integrated in Solaris 10 6/06, introduced in June 2006.
  • OpenIndiana: is based on the Illumos kernel (a derivative of the now defunct OpenSolaris) which aims to provide absolute binary compatibility with Sun/Oracle Solaris. It is worth mentioning that the Solaris kernel and the Illumos kernel once shared the same code base; however, they have followed different paths since Oracle announced the discontinuation of OpenSolaris (August 13th, 2010). Like Oracle Solaris, OpenIndiana uses ZFS for its system pool. The Illumos kernel's ZFS support lags a bit behind Oracle's: it supports zpool version 28, whereas Oracle Solaris 11 supports zpool version 31 (data encryption being supported from zpool version 30).

*BSD

  • FreeBSD: ZFS has been present in FreeBSD since FreeBSD 7 (zpool version 6) and FreeBSD can boot from a ZFS volume (zfsboot). ZFS support was vastly enhanced in FreeBSD 8.x (8.2 supports zpool version 15, 8.3 supports version 28), FreeBSD 9 and FreeBSD 10 (both support zpool version 28). ZFS in FreeBSD is now considered fully functional and mature. FreeBSD derivatives such as the popular FreeNAS take advantage of ZFS and have integrated it into their tools. The latter has, for example, support for zvols through its Web management interface (FreeNAS >= 8.0.1).
  • NetBSD: a ZFS port was started as a GSoC project in 2007 and has been present in the NetBSD mainstream since 2009 (zpool version 13).
  • OpenBSD: no ZFS support yet, and none planned until Oracle changes some policies, according to the project FAQ.

ZFS alternatives

  • WAFL seems to have severe limitations [1] (the document is not dated); an interesting article also lies here
  • BTRFS is advancing every week, but it still lacks features like the capability of emulating a virtual block device over a storage pool (zvol), and built-in support for RAID-5/6 is not complete yet (cf. the Btrfs mailing list). At the date of writing it is still experimental, whereas ZFS is used on big production servers.
  • VxFS has also been targeted by comparisons like this one (a bit controversial). VxFS has been known in the industry since 1993 and is known for its legendary flexibility. Symantec acquired VxFS and offers a basic version of it (no clustering, for example) under the name Veritas Storage Foundation Basic
  • An interesting discussion about modern filesystems can be found on OSNews.com

ZFS vs BTRFS at a glance

Some key features in no particular order of importance between ZFS and BTRFS:

Feature                                        | ZFS | BTRFS | Remarks
Transactional filesystem                       | YES | YES   |
Journaling                                     | NO  | YES   | Not a design flaw: ZFS is robust by design (see page 7 of "ZFS: The Last Word on Filesystems").
Dividable pool of data storage                 | YES | YES   |
Read-only snapshot support                     | YES | YES   |
Writable snapshot support                      | YES | YES   |
Sending/receiving a snapshot over the network  | YES | YES   |
Rollback capabilities                          | YES | YES   | ZFS knows where and how to roll back the data (on-line); BTRFS requires a bit more work from the system administrator (off-line).
Virtual block-device emulation                 | YES | NO    |
Data deduplication                             | YES | YES   | Built in with ZFS, third-party tool (bedup) with BTRFS.
Data block reoptimization                      | NO  | YES   | ZFS is missing "block pointer rewrite functionality" in all known implementations so far; not a major performance cripple, however. BTRFS can do on-line data defragmentation.
Built-in data redundancy support               | YES | YES   | ZFS has RAID-5/6-like (but better!) RAID-Z{1,2,3} capabilities; BTRFS only fully supports data mirroring at this point, and some work remains on parity handling.
Management by attributes                       | YES | NO    | Nearly everything touching ZFS management is related to attribute manipulation (quotas, sharing over NFS, encryption, compression...); BTRFS also has the concept but it is less aggressively used.
Production-quality code                        | NO  | NO    | ZFS support in Linux is not considered production quality (yet) although it is very robust. Several operating systems like Solaris/OpenIndiana have a production-quality implementation and are now installed in ZFS datasets by default.
Integrated within the Linux kernel tree        | NO  | YES   | ZFS is released under the CDDL license...

ZFS resource naming restrictions

Before going further, you must be aware of restrictions concerning the names you can use on a ZFS filesystem. The general rule is: you can use all alphanumeric characters, plus the following special characters:

  • Underscore (_)
  • Hyphen (-)
  • Colon (:)
  • Period (.)

The name used to designate a ZFS pool has no particular restriction except:

  • it can't be one of the reserved words, in particular:
    • mirror
    • raidz (raidz2, raidz3 and so on)
    • spare
    • cache
    • log
  • names must begin with an alphanumeric character (same for ZFS datasets).
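
As a small illustration (the pool name and backing device below are placeholders), a name such as "backup-pool_01" respects the rules above, whereas trying to name a pool "log" or "mirror" would be rejected because those are reserved words:

# zpool create backup-pool_01 /dev/loop4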

Some ZFS concepts

Once again with no particular order of importance:

ZFS term: what it is, with counterpart examples

zpool: a group of one or many physical storage media (hard drive partitions, files...). A zpool has to be divided into at least one ZFS dataset or at least one zvol to hold any data. Several zpools can coexist in a system on the condition that each one holds a unique name. Also note that zpools can never be mounted; the only things that can be mounted are the ZFS datasets they hold. Counterparts:
  • Volume group (VG) in LVM
  • BTRFS volumes
dataset: a logical subdivision of a zpool, mounted in your host's VFS, where your files and directories reside. Several ZFS datasets can coexist in a single system on the condition that each one owns a unique name within its zpool. Counterparts:
  • Logical volumes (LV) in LVM formatted with a filesystem like ext3
  • BTRFS subvolumes
snapshot: a read-only photo of a ZFS dataset's state as taken at a precise moment in time. ZFS has no way to cooperate on its own with applications that read and write data on ZFS datasets; if those applications still hold unflushed data at the moment the snapshot is taken, only what has been flushed will be included in the snapshot. Worth mentioning: snapshots take no disk space aside from some metadata at the exact time they are created; their size grows as data blocks (i.e. files) are deleted or changed on the corresponding live ZFS dataset. Counterparts:
  • No direct equivalent in LVM
  • BTRFS read-only snapshots
clone: a writable copy of a snapshot. Counterparts:
  • LVM snapshots
  • BTRFS snapshots
zvol: an emulated block device whose data is held behind the scenes in the zpool the zvol has been created in. No known equivalent, even in BTRFS.

Your first contact with ZFS

Requirements

  • ZFS userland tools installed (package sys-fs/zfs)
  • ZFS kernel modules built and installed (package sys-fs/zfs-kmod), there is a known issue with kernel 3.13 series see this thread on Funtoo's forum
  • Disk size of 64 Mbytes as a bare minimum (128 Mbytes is the minimum size of a pool). Multiple disk will be simulated through the use of several raw images accessed via the Linux loopback devices.
  • At least 512 MB of RAM

Preparing

Once you have emerged sys-fs/zfs and sys-fs/zfs-kmod, you have two options to start using ZFS at this point:

  • either you start /etc/init.d/zfs (which loads all of the ZFS kernel modules for you, plus a couple of other things),
  • or you load the ZFS kernel modules by hand.

So :

# rc-service zfs start

Or:

# modprobe zfs
# lsmod | grep zfs
zfs                   874072  0 
zunicode              328120  1 zfs
zavl                   12997  1 zfs
zcommon                35739  1 zfs
znvpair                48570  2 zfs,zcommon
spl                    58011  5 zfs,zavl,zunicode,zcommon,znvpair

Your first ZFS pool

To start with, four raw disks (2 GB each) are created:

# for i in 0 1 2 3; do dd if=/dev/zero of=/tmp/zfs-test-disk0${i}.img bs=2G count=1; done
0+1 records in
0+1 records out
2147479552 bytes (2.1 GB) copied, 40.3722 s, 53.2 MB/s
...

Then let's see what loopback devices are in use and which is the first free:

# losetup -a
# losetup -f
/dev/loop0

In the above example nothing is used and the first available loopback device is /dev/loop0. Now associate all of the disks with a loopback device (/tmp/zfs-test-disk00.img -> /dev/loop/0, /tmp/zfs-test-disk01.img -> /dev/loop/1 and so on):

# for i in 0 1 2 3; do losetup /dev/loop${i} /tmp/zfs-test-disk0${i}.img; done
# losetup -a
/dev/loop0: [000c]:781455 (/tmp/zfs-test-disk00.img)
/dev/loop1: [000c]:806903 (/tmp/zfs-test-disk01.img)
/dev/loop2: [000c]:807274 (/tmp/zfs-test-disk02.img)
/dev/loop3: [000c]:781298 (/tmp/zfs-test-disk03.img)
Note: ZFS literature often names zpools "tank"; this is not a requirement, you can use whatever name you choose (as we did here...)

Every story in ZFS starts with the very first ZFS-related command you will be in touch with: zpool. As you might have guessed, zpool manages everything related to the physical devices underlying your ZFS storage spaces, and the very first task is to use this command to make what is called a pool (if you have used LVM before, volume groups can be seen as a counterpart). Basically, what you will do here is tell ZFS to take a collection of physical storage items, which can take several forms such as a hard drive partition, a USB key partition or even a file, and consider all of them as a single pool of storage (we will subdivide it in the following paragraphs). No black magic here: ZFS writes some metadata on them behind the scenes to be able to track which physical device belongs to which pool of storage.

# zpool create myfirstpool /dev/loop0 /dev/loop1 /dev/loop2 /dev/loop3

And.. nothing! Nada! The command silently returned but it did something, the next section will explain what.

Your first ZFS dataset

# zpool list
NAME          SIZE  ALLOC   FREE    CAP  DEDUP  HEALTH  ALTROOT
myfirstpool  7.94G   130K  7.94G     0%  1.00x  ONLINE  -

What does this mean? Several things. First, your zpool is there and has a size of roughly 8 GB minus some space eaten by metadata. Second, it is actually usable, because the HEALTH column says ONLINE. The other columns are not meaningful for us for the moment, so just ignore them. If you want more gritty details, you can use the zpool command like this:

# zpool status
  pool: myfirstpool
 state: ONLINE
  scan: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        myfirstpool  ONLINE       0     0     0
          loop0     ONLINE       0     0     0
          loop1     ONLINE       0     0     0
          loop2     ONLINE       0     0     0
          loop3     ONLINE       0     0     0

The information is quite intuitive: your pool is seen as usable (state is similar to HEALTH) and is composed of several devices, each one listed as being in a healthy state... at least for now, because they will be deliberately damaged for demonstration purposes in a later section. For your information, the READ, WRITE and CKSUM columns list the number of failed operations on each of the devices, respectively:

  • READ for reading failures. A non-zero value is not a good sign... the device is flaky and will soon fail.
  • WRITE for writing failures. A non-zero value is not a good sign... the device is flaky and will soon fail.
  • CKSUM for mismatches between the checksum of the data at the time it was written and the checksum recomputed when it is read again (yes, ZFS uses checksums aggressively). A non-zero value is not a good sign... corruption happened; ZFS will do its best to recover the data on its own, but this is definitely not the sign of a healthy system.

Cool! So far so good: you have a brand new 8 GB of usable storage space on your system. Has it been mounted somewhere?

# mount | grep myfirstpool
/myfirstpool on /myfirstpool type zfs (rw,xattr)

Remember the tables in the section above? A zpool in itself can never be mounted, never ever. It is just a container in which ZFS datasets are created and then mounted. So what happened here? Obscure black magic? No, of course not! Indeed, a ZFS dataset named after the zpool has been created automatically for us and then mounted. Is it true? We will check this shortly. For the moment, let us introduce the second command you will deal with when using ZFS: zfs. While the zpool command is used for anything related to zpools, the zfs command is used for anything related to ZFS datasets (a ZFS dataset always resides in a zpool, always, no exception to that).

Note: zfs and zpool are the only two commands you will need to remember when dealing with ZFS.

So how can we check what ZFS datasets are currently known by the system? As you might have already guessed, like this:

# zfs list
NAME          USED  AVAIL  REFER  MOUNTPOINT
myfirstpool   114K  7.81G    30K  /myfirstpool

Ta-da! The mystery is busted! The zfs command tells us that not only has a ZFS dataset named myfirstpool been created, but it has also been mounted in the system's VFS for us. If you check with the df command, you should also see something like this:

# df -h
Filesystem      Size  Used Avail Use% Mounted on
(...)
myfirstpool     7.9G     0  7.9G   0% /myfirstpool

The $100 question: "what do we do with this brand new ZFS /myfirstpool dataset?". Copy some files onto it, of course! We used a Linux kernel source tree, but you can of course use whatever you want:

# cp -a /usr/src/linux-3.13.5-gentoo /myfirstpool
# ln -s /myfirstpool/linux-3.13.5-gentoo /myfirstpool/linux
# ls -lR /myfirstpool
/myfirstpool:
total 3
lrwxrwxrwx  1 root root 32 Mar  2 14:02 linux -> /myfirstpool/linux-3.13.5-gentoo
drwxr-xr-x 25 root root 50 Feb 27 20:35 linux-3.13.5-gentoo

/myfirstpool/linux-3.13.5-gentoo:
total 31689
-rw-r--r--   1 root root    18693 Jan 19 21:40 COPYING
-rw-r--r--   1 root root    95579 Jan 19 21:40 CREDITS
drwxr-xr-x 104 root root      250 Feb 26 07:39 Documentation
-rw-r--r--   1 root root     2536 Jan 19 21:40 Kbuild
-rw-r--r--   1 root root      277 Feb 26 07:39 Kconfig
-rw-r--r--   1 root root   268770 Jan 19 21:40 MAINTAINERS
(...)

A ZFS dataset behaves like any other filesystem: you can create regular files, symbolic links, pipes, special devices nodes, etc. Nothing mystic here.

Now we have some data in the ZFS dataset let's see what various commands report:

# df -h
Filesystem      Size  Used Avail Use% Mounted on
(...)
myfirstpool     7.9G  850M  7.0G  11% /myfirstpool
# zfs list
NAME          USED  AVAIL  REFER  MOUNTPOINT
myfirstpool   850M  6.98G   850M  /myfirstpool
# zpool list
NAME          SIZE  ALLOC   FREE    CAP  DEDUP  HEALTH  ALTROOT
myfirstpool  7.94G   850M  7.11G    10%  1.00x  ONLINE  -
Note: Notice the various sizes reported by zpool and zfs commands. In this case it is the same however it can differ, this is true especially with zpools mounted in RAID-Z.

Unmounting/remounting a ZFS dataset

Important: Only ZFS datasets can be mounted inside your host's VFS, no exception on that! Zpools cannot be mounted, never, never, never... please pay attention to the terminology and keep things clear by not messing up with terms. We will introduce ZFS snapshots and ZFS clones but those are ZFS datasets at the basis so they can also be mounted and unmounted.


If a ZFS dataset behaves just like any other filesystem, can we unmount it?

# umount /myfirstpool
# mount | grep myfirstpool

No more /myfirstpool in the line of sight! So yes, it is possible to unmount a ZFS dataset just like you would do with any other filesystem. Is the ZFS dataset still present on the system even though it is unmounted? Let's check:

# zfs list 
NAME          USED  AVAIL  REFER  MOUNTPOINT
myfirstpool   850M  6.98G   850M  /myfirstpool

Fortunately and obviously it is, else ZFS would not be very useful. Your next concern would certainly be: "How can we remount it, then?" Simple! Like this:

# zfs mount myfirstpool
# mount | grep myfirstpool
myfirstpool on /myfirstpool type zfs (rw,xattr)

The ZFS dataset is back! :-)

Your first contact with ZFS management by attributes or the end of /etc/fstab

At this point you might be curious about how the zfs command knows what it has to mount and where it has to mount it. You might be familiar with the following syntax of the mount command which, behind the scenes, scans the file /etc/fstab and mounts the specified entry:

# mount /boot

Does /etc/fstab contain something related to our ZFS dataset?

# cat /etc/fstab | grep myfirstpool
#

Doh!... Obviously nothing there. Another mystery? Surely not! The answer lies in an extremely powerful feature of ZFS: attributes. Simply speaking, an attribute is a named property of a ZFS dataset that holds a value. Attributes govern various aspects of how datasets are managed, like: "Does the data have to be compressed?", "Does the data have to be encrypted?", "Does the data have to be exposed to the rest of the world via NFS or SMB/Samba?" and, of course, "Where does the dataset have to be mounted?". The answer to that last question is given by the following command:

# zfs get mountpoint myfirstpool
NAME         PROPERTY    VALUE         SOURCE
myfirstpool  mountpoint  /myfirstpool  default

Bingo! When you remounted the dataset just some paragraphs ago, ZFS automatically inspected the mountpoint attribute and saw this dataset has to be mounted in the directory /myfirstpool.
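
Because the mountpoint is just an attribute, relocating a dataset is a matter of changing that attribute, and ZFS remounts the dataset at the new location for you. As a small sketch (the /data path is an arbitrary example, and the last command puts things back as they were):

# zfs set mountpoint=/data myfirstpool
# zfs get mountpoint myfirstpool
# zfs set mountpoint=/myfirstpool myfirstpool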

A step forward with ZFS datasets

So far you have been given a quick tour of what ZFS can do for you, and it is very important at this point to distinguish a zpool from a ZFS dataset and to call a dataset what it is (a dataset) and not what it is not (a zpool). It is a bit confusing, and it was an editorial choice to pick a confusing name, just to get you familiar with the one and the other.

Creating datasets

Obviously it is possible to have more than one ZFS dataset within a single zpool. Quiz: which command would you use to subdivide a zpool into datasets, zfs or zpool? Stop reading for two seconds and try to figure out this little question.

Answer is... zfs! Although you want to operate on the zpool to logically subdivide it into several datasets, in the end you manage datasets, thus you will use the zfs command. It is not always easy at the beginning; do not worry too much, you will soon get the habit of when to use one or the other. Creating a dataset in a zpool is easy: just give the zfs command the name of the pool you want to divide and the name of the dataset you want to create in it. So let's create three datasets named myfirstDS, mysecondDS and mythirdDS in myfirstpool (observe how we use the zpool and dataset names):

# zfs create myfirstpool/myfirstDS
# zfs create myfirstpool/mysecondDS
# zfs create myfirstpool/mythirdDS

What happened? Let's check :

# zfs list
NAME                     USED  AVAIL  REFER  MOUNTPOINT
myfirstpool              850M  6.98G   850M  /myfirstpool
myfirstpool/myfirstDS     30K  6.98G    30K  /myfirstpool/myfirstDS
myfirstpool/mysecondDS    30K  6.98G    30K  /myfirstpool/mysecondDS
myfirstpool/mythirdDS     30K  6.98G    30K  /myfirstpool/mythirdDS

Obviously we got what we asked for. Moreover, if we inspect the contents of /myfirstpool, we can notice three new directories bearing the same names as the datasets just created:

# ls -l /myfirstpool 
total 8
lrwxrwxrwx  1 root root 32 Mar  2 14:02 linux -> /myfirstpool/linux-3.13.5-gentoo
drwxr-xr-x 25 root root 50 Feb 27 20:35 linux-3.13.5-gentoo
drwxr-xr-x  2 root root  2 Mar  2 15:26 myfirstDS
drwxr-xr-x  2 root root  2 Mar  2 15:26 mysecondDS
drwxr-xr-x  2 root root  2 Mar  2 15:26 mythirdDS

No surprise here! As you might have guessed, those three new directories serve as mountpoints:

# mount | grep myfirstpool
myfirstpool on /myfirstpool type zfs (rw,xattr)
myfirstpool/myfirstDS on /myfirstpool/myfirstDS type zfs (rw,xattr)
myfirstpool/mysecondDS on /myfirstpool/mysecondDS type zfs (rw,xattr)
myfirstpool/mythirdDS on /myfirstpool/mythirdDS type zfs (rw,xattr)

As we did before, we can copy some files in the newly created datasets just like they were regular directories:

# cp -a /usr/portage /myfirstpool/mythirdDS
# ls -l /myfirstpool/mythirdDS/*
total 697
drwxr-xr-x   48 root root   49 Aug 18  2013 app-accessibility
drwxr-xr-x  238 root root  239 Jan 10 06:22 app-admin
drwxr-xr-x    4 root root    5 Dec 28 08:54 app-antivirus
drwxr-xr-x  100 root root  101 Feb 26 07:19 app-arch
drwxr-xr-x   42 root root   43 Nov 26 21:24 app-backup
drwxr-xr-x   34 root root   35 Aug 18  2013 app-benchmarks
drwxr-xr-x   66 root root   67 Oct 16 06:39 app-cdr(...)

Nothing really too exciting here, we have file in mythirdDS. A bit more interesting output:

# zfs list
NAME                     USED  AVAIL  REFER  MOUNTPOINT
myfirstpool             1.81G  6.00G   850M  /myfirstpool
myfirstpool/myfirstDS     30K  6.00G    30K  /myfirstpool/myfirstDS
myfirstpool/mysecondDS    30K  6.00G    30K  /myfirstpool/mysecondDS
myfirstpool/mythirdDS   1002M  6.00G  1002M  /myfirstpool/mythirdDS
# df -h
Filesystem              Size  Used Avail Use% Mounted on
(...)
myfirstpool             6.9G  850M  6.1G  13% /myfirstpool
myfirstpool/myfirstDS   6.1G     0  6.1G   0% /myfirstpool/myfirstDS
myfirstpool/mysecondDS  6.1G     0  6.1G   0% /myfirstpool/mysecondDS
myfirstpool/mythirdDS   7.0G 1002M  6.1G  15% /myfirstpool/mythirdDS

Did you notice the sizes given in the 'AVAIL' column? At the very beginning of this tutorial we had slightly less than 8 GB of available space; it is now roughly 6 GB. The datasets are just a subdivision of the zpool: they compete with each other for the available storage within the zpool, no miracle here. Up to what limit? The pool itself, as we never imposed a quota on the datasets. Fortunately, df and zfs list give coherent results.

Second contact with attributes: quota management

Remember how painful quota management is under Linux? Now you can say goodbye to setquota, edquota and the other quota commands: ZFS handles this in a snap of the fingers! Guess with what? A ZFS dataset attribute, of course! ;-) Just to make you drool, here is how a 2 GB limit can be set on myfirstpool/mythirdDS:

# zfs set quota=2G myfirstpool/mythirdDS

Et voila! The zfs command is a bit silent, but if we check, we can see that myfirstpool/mythirdDS is now capped at 2 GB (forget about 'REFER' for the moment): around 1 GB of data has been copied into this dataset, thus leaving a big 1 GB of available space.

# zfs list
NAME                     USED  AVAIL  REFER  MOUNTPOINT
myfirstpool             1.81G  6.00G   850M  /myfirstpool
myfirstpool/myfirstDS     30K  6.00G    30K  /myfirstpool/myfirstDS
myfirstpool/mysecondDS    30K  6.00G    30K  /myfirstpool/mysecondDS
myfirstpool/mythirdDS   1002M  1.02G  1002M  /myfirstpool/mythirdDS

Using the df command:

# df -h                                 
Filesystem              Size  Used Avail Use% Mounted on
(...)
myfirstpool             6.9G  850M  6.1G  13% /myfirstpool
myfirstpool/myfirstDS   6.1G     0  6.1G   0% /myfirstpool/myfirstDS
myfirstpool/mysecondDS  6.1G     0  6.1G   0% /myfirstpool/mysecondDS
myfirstpool/mythirdDS   2.0G 1002M  1.1G  49% /myfirstpool/mythirdDS

Of course you can use this technique for the home directories of your users under /home, which also has the advantage of being much less forgiving than a soft/hard user quota: when the limit is reached, it is reached, period, and no more data can be written to the dataset. The user must do some cleanup and cannot procrastinate anymore :-)
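
A hedged sketch of what that could look like (the dataset layout, user name and the 10G figure are arbitrary examples, not something created earlier in this tutorial):

# zfs create myfirstpool/home
# zfs create myfirstpool/home/alice
# zfs set quota=10G myfirstpool/home/alice

Each user then gets an individually capped dataset, and the limit can be changed at any time simply by setting the quota attribute again.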

To remove the quota:

# zfs set quota=none myfirstpool/mythirdDS

none is simply the original value for the quota attribute (we did not demonstrate it, you can check by doing a zfs get quota myfirstpool/mysecondDS for example).

Destroying datasets

Important: There is no way to resurrect a destroyed ZFS dataset and the data it contained! Once you destroy a dataset the corresponding metadata is cleared and gone forever so be careful when using zfs destroy notably with the -r option ...


We have three datasets, but the third is pretty useless and contains a lot of garbage. Is it possible to remove it with a simple rm -rf? Let's try:

# rm -rf /myfirstpool/mythirdDS
rm: cannot remove `/myfirstpool/mythirdDS': Device or resource busy

This is perfectly normal; remember that datasets are indeed something mounted in your VFS. ZFS may be ZFS and do a lot for you, but it cannot change the nature of a mounted filesystem under Linux/Unix. The "ZFS way" to remove a dataset is to use the zfs command like this, provided no process holds open files on it (once again, ZFS can do miracles for you, but not that kind of miracle, as it has to unmount the dataset before deleting it):

# zfs destroy myfirstpool/mythirdDS
# zfs list
NAME                     USED  AVAIL  REFER  MOUNTPOINT
myfirstpool              444M  7.38G   444M  /myfirstpool
myfirstpool/myfirstDS     21K  7.38G    21K  /myfirstpool/myfirstDS
myfirstpool/mysecondDS    21K  7.38G    21K  /myfirstpool/mysecondDS

Et voila! No more myfirstpool/mythirdDS dataset. :-)

A slightly more subtle case is trying to destroy a ZFS dataset while another ZFS dataset is nested inside it. Before doing that nasty experiment, myfirstpool/mythirdDS must be created again, this time with a nested dataset (myfirstpool/mythirdDS/nestedDS1):

# zfs create myfirstpool/mythirdDS
# zfs create myfirstpool/mythirdDS/nestedSD1
# zfs list
NAME                              USED  AVAIL  REFER  MOUNTPOINT
myfirstpool                       851M  6.98G   850M  /myfirstpool
myfirstpool/myfirstDS              30K  6.98G    30K  /myfirstpool/myfirstDS
myfirstpool/mysecondDS             30K  6.98G    30K  /myfirstpool/mysecondDS
myfirstpool/mythirdDS             124K  6.98G    34K  /myfirstpool/mythirdDS
myfirstpool/mythirdDS/nestedDS1    30K  6.98G    30K  /myfirstpool/mythirdDS/nestedDS1

Now let's try to destroy myfirstpool/mythirdDS again:

# zfs destroy myfirstpool/mythirdDS
cannot destroy 'myfirstpool/mythirdDS': filesystem has children
use '-r' to destroy the following datasets:
myfirstpool/mythirdDS/nestedDS1

The zfs command detected the situation and refused to proceed on the deletion without your consent to make a recursive destruction (-r parameter). Before going any step further let's create some more nested datasets plus a couple of directories inside myfirstpool/mythirdDS:

# zfs create myfirstpool/mythirdDS/nestedDS1
# zfs create myfirstpool/mythirdDS/nestedDS2
# zfs create myfirstpool/mythirdDS/nestedDS3
# zfs create myfirstpool/mythirdDS/nestedDS3/nestednestedDS
# mkdir /myfirstpool/mythirdDS/dir1
# mkdir /myfirstpool/mythirdDS/dir2
# mkdir /myfirstpool/mythirdDS/dir3
# zfs list
NAME                                             USED  AVAIL  REFER  MOUNTPOINT
myfirstpool                                      851M  6.98G   850M  /myfirstpool
myfirstpool/myfirstDS                             30K  6.98G    30K  /myfirstpool/myfirstDS
myfirstpool/mysecondDS                            30K  6.98G    30K  /myfirstpool/mysecondDS
myfirstpool/mythirdDS                            157K  6.98G    37K  /myfirstpool/mythirdDS
myfirstpool/mythirdDS/nestedDS1                   30K  6.98G    30K  /myfirstpool/mythirdDS/nestedDS1
myfirstpool/mythirdDS/nestedDS2                   30K  6.98G    30K  /myfirstpool/mythirdDS/nestedDS2
myfirstpool/mythirdDS/nestedDS3                   60K  6.98G    30K  /myfirstpool/mythirdDS/nestedDS3
myfirstpool/mythirdDS/nestedDS3/nestednestedDS    30K  6.98G    30K  /myfirstpool/mythirdDS/nestedDS3/nestednestedDS

Now what happens if myfirstpool/mythirdDS is destroyed again with '-r'?

# zfs destroy -r myfirstpool/mythirdDS
# zfs list                            
NAME                     USED  AVAIL  REFER  MOUNTPOINT
myfirstpool              851M  6.98G   850M  /myfirstpool
myfirstpool/myfirstDS     30K  6.98G    30K  /myfirstpool/myfirstDS
myfirstpool/mysecondDS    30K  6.98G    30K  /myfirstpool/mysecondDS

myfirstpool/mythirdDS and everything it contained is now gone!

Snapshotting and rolling back datasets

This is, by far, one of the coolest features of ZFS. You can:

  1. take a photo of a dataset (this photo is called a snapshot)
  2. do whatever you want with the data contained in the dataset
  3. restore (roll back) the dataset to the exact same state it was in before you made your changes, just as if nothing had ever happened in the middle.

Single snapshot

Important: Only ZFS datasets can be snapshotted and rolled back, not the zpool.


To start with, let's copy some files in mysecondDS:

# cp -a /usr/portage /myfirstpool/mysecondDS
# ls /myfirstpool/mysecondDS/portage
total 672
drwxr-xr-x   48 root root   49 Aug 18  2013 app-accessibility
drwxr-xr-x  238 root root  239 Jan 10 06:22 app-admin
drwxr-xr-x    4 root root    5 Dec 28 08:54 app-antivirus
drwxr-xr-x  100 root root  101 Feb 26 07:19 app-arch
drwxr-xr-x   42 root root   43 Nov 26 21:24 app-backup
drwxr-xr-x   34 root root   35 Aug 18  2013 app-benchmarks
(...)
drwxr-xr-x   62 root root   63 Feb 20 06:47 x11-wm
drwxr-xr-x   16 root root   17 Aug 18  2013 xfce-base
drwxr-xr-x   64 root root   65 Dec 14 19:09 xfce-extra

Now, let's take a snapshot of mysecondDS. Which command would be used, zpool or zfs? In that case it is zfs, because we are manipulating a ZFS dataset (this time you probably got it right!):

# zfs snapshot myfirstpool/mysecondDS@Charlie
Note: The syntax is always pool/dataset@snapshot; the snapshot's name is left to your discretion, however you must use an at sign (@) to separate the snapshot's name from the rest of the path.

Let's check what /myfirstpool/mysecondDS contains after taking the snapshot:

# ls -la /myfirstpool/mysecondDS     
total 9
drwxr-xr-x   3 root root   3 Mar  2 18:22 .
drwxr-xr-x   5 root root   6 Mar  2 17:58 ..
drwx------ 170 root root 171 Mar  2 18:36 portage

Nothing really new: the portage directory is here, nothing more, a priori. If you have used BTRFS before reading this tutorial, you probably expected to see a @Charlie lying in /myfirstpool/mysecondDS? So where the heck is Charlie? In ZFS, a dataset snapshot is not visible from within the VFS tree (if you are not convinced, you can search for it with the find command, but it will never find it). Let's check with the zfs command:

# zfs list
NAME                             USED  AVAIL  REFER  MOUNTPOINT
myfirstpool                     1.81G  6.00G   850M  /myfirstpool
myfirstpool/myfirstDS             30K  6.00G    30K  /myfirstpool/myfirstDS
myfirstpool/mysecondDS          1001M  6.00G  1001M  /myfirstpool/mysecondDS

Wow... no sign of the snapshot. What you must know is that zfs list only shows datasets by default and omits snapshots. If the command is invoked with the parameter -t set to all, it will list everything:

# zfs list -t all
NAME                             USED  AVAIL  REFER  MOUNTPOINT
myfirstpool                     1.81G  6.00G   850M  /myfirstpool
myfirstpool/myfirstDS             30K  6.00G    30K  /myfirstpool/myfirstDS
myfirstpool/mysecondDS          1001M  6.00G  1001M  /myfirstpool/mysecondDS
myfirstpool/mysecondDS@Charlie      0      -  1001M  -

So yes, @Charlie is here! Also notice the power of copy-on-write filesystems: @Charlie takes only a couple of kilobytes (some ZFS metadata), just like any ZFS snapshot at the time it is taken. The reason snapshots occupy so little space is that their data and metadata blocks are shared with the live dataset, so no physical copy of them is made. As time goes on and more and more changes happen in the original dataset (myfirstpool/mysecondDS here), ZFS allocates new data and metadata blocks to accommodate the changes but leaves the blocks used by the snapshot untouched, so the snapshot tends to eat more and more pool space. It seems odd at first glance, because a snapshot is a frozen-in-time copy of a ZFS dataset, but this is the way ZFS manages them. So caveat emptor: remove any unused snapshots so they do not fill up your zpool...
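
For the record, removing a snapshot uses the same zfs destroy command seen earlier for datasets, with the @ notation (do not run it now, Charlie is still needed in the next sections):

# zfs destroy myfirstpool/mysecondDS@Charlie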

Now we have found Charlie, let's do some changes in the mysecondDS:

# rm -rf /myfirstpool/mysecondDS/portage/[a-h]*
# echo "Hello, world" >  /myfirstpool/mysecondDS/hello.txt
# cp /lib/firmware/radeon/* /myfirstpool/mysecondDS
# ls -l  /myfirstpool/mysecondDS
/myfirstpool/mysecondDS:
total 3043
-rw-r--r--  1 root root   8704 Mar  2 19:29 ARUBA_me.bin
-rw-r--r--  1 root root   8704 Mar  2 19:29 ARUBA_pfp.bin
-rw-r--r--  1 root root   6144 Mar  2 19:29 ARUBA_rlc.bin
-rw-r--r--  1 root root  24096 Mar  2 19:29 BARTS_mc.bin
-rw-r--r--  1 root root   5504 Mar  2 19:29 BARTS_me.bin
(...)
-rw-r--r--  1 root root  60388 Mar  2 19:29 VERDE_smc.bin
-rw-r--r--  1 root root     13 Mar  2 19:28 hello.txt
drwx------ 94 root root     95 Mar  2 19:28 portage

/myfirstpool/mysecondDS/portage:
total 324
drwxr-xr-x  16 root root   17 Oct 26 07:30 java-virtuals
drwxr-xr-x 303 root root  304 Jan 21 06:53 kde-base
drwxr-xr-x 117 root root  118 Feb 21 06:24 kde-misc
drwxr-xr-x   2 root root  756 Feb 23 08:44 licenses
drwxr-xr-x  20 root root   21 Jan  7 06:56 lxde-base
(...)

Now let's check again what the zpool command gives:

# zfs list -t all                      
NAME                             USED  AVAIL  REFER  MOUNTPOINT
myfirstpool                     1.82G  6.00G   850M  /myfirstpool
myfirstpool/myfirstDS             30K  6.00G    30K  /myfirstpool/myfirstDS
myfirstpool/mysecondDS          1005M  6.00G   903M  /myfirstpool/mysecondDS
myfirstpool/mysecondDS@Charlie   102M      -  1001M  -

Did you notice the size increase of myfirstpool/mysecondDS@Charlie? This is mainly due to the changes made in the live dataset: ZFS had to retain the original blocks referenced by the snapshot. Now it is time to roll this ZFS dataset back to its original state (if some processes had open files in the dataset to be rolled back, you would have to terminate them first):

# zfs rollback myfirstpool/mysecondDS@Charlie
# ls -l /myfirstpool/mysecondDS
total 6
drwxr-xr-x 164 root root 169 Aug 18 18:25 portage

Again, ZFS handled everything for you, and you now have the contents of mysecondDS exactly as they were at the time the snapshot Charlie was taken. No more complicated than that. Not illustrated here, but if you look at the output given by zfs list -t all at this point, you will notice that the Charlie snapshot again eats very little space. This is normal: the modified blocks have been dropped, so myfirstpool/mysecondDS and its myfirstpool/mysecondDS@Charlie snapshot are the same modulo some metadata (hence the few kilobytes of space taken).

the .zfs pseudo-directory or the secret passage to your snapshots

Any directory where a ZFS dataset is mounted (whether it has snapshots or not) secretly contains a pseudo-directory named .zfs (dot-ZFS), and you will not see it even with the -a option given to the ls command unless you name it explicitly. This is a contradiction of the Unix and Unix-like systems' philosophy of not hiding anything from the system administrator. It is not a bug of the ZFS on Linux implementation; the Solaris implementation of ZFS exposes the exact same behavior. So what is inside this little magic box?

# cd /myfirstpool/mysecondDS
# ls -la | grep .zfs        
# ls -lad .zfs              
dr-xr-xr-x 1 root root 0 Mar  2 15:26 .zfs
# cd .zfs
# pwd
/myfirstpool/mysecondDS/.zfs
# ls -la
total 4
dr-xr-xr-x 1 root root   0 Mar  2 15:26 .
drwxr-xr-x 3 root root 145 Mar  2 19:29 ..
dr-xr-xr-x 2 root root   2 Mar  2 19:47 shares
dr-xr-xr-x 2 root root   2 Mar  2 18:46 snapshot

We will focus on the snapshot directory, and since we have not dropped the Charlie snapshot (yet), let's see what lies there:

# cd snapshot
# ls -l
total 0
dr-xr-xr-x 1 root root 0 Mar  2 20:16 Charlie

Yes, we found Charlie here (also!); the snapshot is seen as a regular directory, but pay attention to its permissions:

  • owning user (root) has read+execute
  • owning group (root) has read+execute
  • rest of the world has read+execute

Did you notice? Not a single write permission on this directory; the only action any user can do is to enter the directory and list its contents. This is not a bug but the nature of ZFS snapshots: they are read-only at their core. The next question is naturally: can we change something inside it? For that, we have to enter the Charlie directory:

# cd Charlie
# ls -la
total 7
drwxr-xr-x   3 root root   3 Mar  2 18:22 .
dr-xr-xr-x   3 root root   3 Mar  2 18:46 ..
drwx------ 170 root root 171 Mar  2 18:36 portage

No surprise here: at the time we took the snapshot, myfirstpool/mysecondDS held a copy of the portage tree stored in a directory named portage. At first glance this one seems to be writable for the root user let's try to create a file in it:

# cd portage
# touch test
touch: cannot touch ‘test’: Read-only file system

Things are a bit tricky here: indeed, nothing has been mounted (check with the mount command!), we are walking through a pseudo-directory exposed by ZFS that holds the Charlie snapshot. Pseudo-directory, because .zfs has no physical existence, even in the ZFS metadata as it exists in the zpool. It is just a convenient way provided by the ZFS kernel modules to walk inside the various snapshots' contents. You can see, but you cannot touch :-)
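
Incidentally, whether .zfs shows up in directory listings is itself governed by an attribute: the snapdir property (it was listed as "hidden" in the property dump earlier). As a small illustration:

# zfs get snapdir myfirstpool/mysecondDS
# zfs set snapdir=visible myfirstpool/mysecondDS

With snapdir=visible, the .zfs directory appears in a normal ls -a listing; setting it back to hidden restores the default behavior.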

Backtracking changes between a dataset and its snapshot

Is it possible to know the difference between a live dataset and its snapshot? The answer is yes, and the zfs command will help us in this task. Now that we have rolled the myfirstpool/mysecondDS ZFS dataset back to its original state, we have to botch it again:

# cp -a /lib/firmware/radeon/C* /myfirstpool/mysecondDS

Now let's inspect the difference between the live ZFS dataset myfirstpool/mysecondDS and its snapshot Charlie. This is done via zfs diff, giving only the snapshot's name (you can also inspect the difference between two snapshots with that command, with a slight change in parameters):

# zfs diff myfirstpool/mysecondDS@Charlie
M       /myfirstpool/mysecondDS/
+       /myfirstpool/mysecondDS/CAICOS_mc.bin
+       /myfirstpool/mysecondDS/CAICOS_me.bin
+       /myfirstpool/mysecondDS/CAICOS_pfp.bin
+       /myfirstpool/mysecondDS/CAICOS_smc.bin
+       /myfirstpool/mysecondDS/CAYMAN_mc.bin
+       /myfirstpool/mysecondDS/CAYMAN_me.bin
(...)

So what do we have here? Two things: first, it shows we have changed something in /myfirstpool/mysecondDS (notice the 'M' for Modified); second, it shows the addition of several files (CAICOS_mc.bin, CAICOS_me.bin, CAICOS_pfp.bin...) by putting a plus sign ('+') on their left.

If we botch a bit more myfirstpool/mysecondDS by removing the file /myfirstpool/mysecondDS/portage/sys-libs/glibc/Manifest :

# rm /myfirstpool/mysecondDS/portage/sys-libs/glibc/Manifest
# zfs diff myfirstpool/mysecondDS@Charlie
M       /myfirstpool/mysecondDS/
M       /myfirstpool/mysecondDS/portage/sys-libs/glibc
-       /myfirstpool/mysecondDS/portage/sys-libs/glibc/Manifest
+       /myfirstpool/mysecondDS/CAICOS_mc.bin
+       /myfirstpool/mysecondDS/CAICOS_me.bin
+       /myfirstpool/mysecondDS/CAICOS_pfp.bin
+       /myfirstpool/mysecondDS/CAICOS_smc.bin
+       /myfirstpool/mysecondDS/CAYMAN_mc.bin
+       /myfirstpool/mysecondDS/CAYMAN_me.bin
(...)

Obviously deleted content is marked by a minus sign ('-').

Now a real butchery:

# rm -rf /myfirstpool/mysecondDS/portage/sys-devel/gcc 
# zfs diff myfirstpool/mysecondDS@Charlie
M       /myfirstpool/mysecondDS/
M       /myfirstpool/mysecondDS/portage/sys-devel
-       /myfirstpool/mysecondDS/portage/sys-devel/gcc
-       /myfirstpool/mysecondDS/portage/sys-devel/gcc/files
-       /myfirstpool/mysecondDS/portage/sys-devel/gcc/files/awk
-       /myfirstpool/mysecondDS/portage/sys-devel/gcc/files/awk/fixlafiles.awk
-       /myfirstpool/mysecondDS/portage/sys-devel/gcc/files/awk/fixlafiles.awk-no_gcc_la
-       /myfirstpool/mysecondDS/portage/sys-devel/gcc/files/c89
-       /myfirstpool/mysecondDS/portage/sys-devel/gcc/files/c99
-       /myfirstpool/mysecondDS/portage/sys-devel/gcc/files/gcc-4.6.4-fix-libgcc-s-path-with-vsrl.patch
-       /myfirstpool/mysecondDS/portage/sys-devel/gcc/files/gcc-spec-env.patch
-       /myfirstpool/mysecondDS/portage/sys-devel/gcc/files/gcc-spec-env-r1.patch
-       /myfirstpool/mysecondDS/portage/sys-devel/gcc/files/gcc-4.8.2-fix-cache-detection.patch
-       /myfirstpool/mysecondDS/portage/sys-devel/gcc/files/fix_libtool_files.sh
-       /myfirstpool/mysecondDS/portage/sys-devel/gcc/files/gcc-configure-texinfo.patch
-       /myfirstpool/mysecondDS/portage/sys-devel/gcc/files/gcc-4.8.1-bogus-error-with-int.patch
-       /myfirstpool/mysecondDS/portage/sys-devel/gcc/gcc-4.3.3-r2.ebuild
-       /myfirstpool/mysecondDS/portage/sys-devel/gcc/metadata.xml
-       /myfirstpool/mysecondDS/portage/sys-devel/gcc/gcc-4.6.4-r2.ebuild
-       /myfirstpool/mysecondDS/portage/sys-devel/gcc/gcc-4.6.4.ebuild
-       /myfirstpool/mysecondDS/portage/sys-devel/gcc/gcc-4.8.1-r1.ebuild
-       /myfirstpool/mysecondDS/portage/sys-devel/gcc/gcc-4.8.1-r2.ebuild
-       /myfirstpool/mysecondDS/portage/sys-devel/gcc/gcc-4.6.2-r1.ebuild
-       /myfirstpool/mysecondDS/portage/sys-devel/gcc/gcc-4.8.1-r3.ebuild
-       /myfirstpool/mysecondDS/portage/sys-devel/gcc/gcc-4.8.2.ebuild
-       /myfirstpool/mysecondDS/portage/sys-devel/gcc/gcc-4.8.1-r4.ebuild
-       /myfirstpool/mysecondDS/portage/sys-devel/gcc/Manifest
-       /myfirstpool/mysecondDS/portage/sys-devel/gcc/gcc-4.7.3-r1.ebuild
-       /myfirstpool/mysecondDS/portage/sys-devel/gcc/gcc-4.8.2-r1.ebuild
M       /myfirstpool/mysecondDS/portage/sys-libs/glibc
-       /myfirstpool/mysecondDS/portage/sys-libs/glibc/Manifest
+       /myfirstpool/mysecondDS/CAICOS_mc.bin
+       /myfirstpool/mysecondDS/CAICOS_me.bin
+       /myfirstpool/mysecondDS/CAICOS_pfp.bin
+       /myfirstpool/mysecondDS/CAICOS_smc.bin
+       /myfirstpool/mysecondDS/CAYMAN_mc.bin
+       /myfirstpool/mysecondDS/CAYMAN_me.bin
(...)

No need to explain that digital mayhem! What happens if, in addition, we change the contents of the file /myfirstpool/mysecondDS/portage/sys-devel/autoconf/Manifest?

# zfs diff myfirstpool/mysecondDS@Charlie
M       /myfirstpool/mysecondDS/
M       /myfirstpool/mysecondDS/portage/sys-devel
M       /myfirstpool/mysecondDS/portage/sys-devel/autoconf/Manifest
-       /myfirstpool/mysecondDS/portage/sys-devel/gcc
-       /myfirstpool/mysecondDS/portage/sys-devel/gcc/files
-       /myfirstpool/mysecondDS/portage/sys-devel/gcc/files/awk
-       /myfirstpool/mysecondDS/portage/sys-devel/gcc/files/awk/fixlafiles.awk
-       /myfirstpool/mysecondDS/portage/sys-devel/gcc/files/awk/fixlafiles.awk-no_gcc_la
(...)

ZFS shows that the file /myfirstpool/mysecondDS/portage/sys-devel/autoconf/Manifest has changed. So ZFS can help track file deletion, creation and modification. What it does not show is how a file's content differs between the live dataset and the dataset's snapshot. Not a big issue! You can explore a snapshot's contents via the .zfs pseudo-directory and use a command like /usr/bin/diff to examine the difference with the file as it exists on the corresponding live dataset.

# diff -u /myfirstpool/mysecondDS/.zfs/snapshot/Charlie/portage/sys-devel/autoconf/Manifest /myfirstpool/mysecondDS/portage/sys-devel/autoconf/Manifest
--- /myfirstpool/mysecondDS/.zfs/snapshot/Charlie/portage/sys-devel/autoconf/Manifest   2013-08-18 08:52:01.742411902 -0400
+++ /myfirstpool/mysecondDS/portage/sys-devel/autoconf/Manifest 2014-03-02 21:36:50.582258990 -0500
@@ -4,7 +4,4 @@
 DIST autoconf-2.62.tar.gz 1518427 SHA256 83aa747e6443def0ebd1882509c53f5a2133f50...
 DIST autoconf-2.63.tar.gz 1562665 SHA256 b05a6cee81657dd2db86194a6232b895b8b2606a...
 DIST autoconf-2.64.tar.bz2 1313833 SHA256 872f4cadf12e7e7c8a2414e047fdff26b517c7...
-DIST autoconf-2.65.tar.bz2 1332522 SHA256 db11944057f3faf229ff5d6ce3fcd819f56545...
-DIST autoconf-2.67.tar.bz2 1369605 SHA256 00ded92074999d26a7137d15bd1d51b8a8ae23...
-DIST autoconf-2.68.tar.bz2 1381988 SHA256 c491fb273fd6d4ca925e26ceed3d177920233c...
 DIST autoconf-2.69.tar.xz 1214744 SHA256 64ebcec9f8ac5b2487125a86a7760d2591ac9e1d3...
(...)

Dropping a snapshot

A snapshot is no more than a dataset frozen in time and thus can be destroyed in the exact same way seen in the paragraphs before. Now that we no longer need the Charlie snapshot, we can remove it. Simple:

# zfs destroy myfirstpool/mysecondDS@Charlie
# zfs list -t all
NAME                     USED  AVAIL  REFER  MOUNTPOINT
myfirstpool             1.71G  6.10G   850M  /myfirstpool
myfirstpool/myfirstDS     30K  6.10G    30K  /myfirstpool/myfirstDS
myfirstpool/mysecondDS   903M  6.10G   903M  /myfirstpool/mysecondDS

And Charlie is gone forever ;-)

The time travelling machine part 1: examining differences between snapshots

So far we have only used a single snapshot, just to keep things simple. However, a dataset can hold several snapshots and you can do everything seen so far with them: rolling back, destroying them, or examining the difference not only between a snapshot and its corresponding live dataset but also between two snapshots. For this part we will consider the myfirstpool/myfirstDS dataset, which should be empty at this point.

# ls -la /myfirstpool/myfirstDS
total 3
drwxr-xr-x 2 root root 2 Mar 2 21:14 .
drwxr-xr-x 5 root root 6 Mar 2 17:58 ..

Now let's generate some contents, take a snapshot (snapshot-1), add more content, take a snapshot again (snapshot-2), do some modifications again and take a third snapshot (snapshot-3):

# echo "Hello, world" > /myfirstpool/myfirstDS/hello.txt
# cp -R /lib/firmware/radeon /myfirstpool/myfirstDS
# ls -l /myfirstpool/myfirstDS
total 5
-rw-r--r-- 1 root root 13 Mar 3 06:41 hello.txt
drwxr-xr-x 2 root root 143 Mar 3 06:42 radeon
# zfs snapshot myfirstpool/myfirstDS@snapshot-1
# echo "Goodbye, world" > /myfirstpool/myfirstDS/goodbye.txt
# echo "Are you there?" >> /myfirstpool/myfirstDS/hello.txt
# cp /proc/config.gz /myfirstpool/myfirstDS
# rm /myfirstpool/myfirstDS/radeon/CAYMAN_me.bin
# zfs snapshot myfirstpool/myfirstDS@snapshot-2
# echo "Still there?" >> /myfirstpool/myfirstDS/goodbye.txt
# mv /myfirstpool/myfirstDS/hello.txt /myfirstpool/myfirstDS/hello_new.txt 
# cat /proc/version > /myfirstpool/myfirstDS/version.txt
# zfs snapshot myfirstpool/myfirstDS@snapshot-3
# zfs list -t all
NAME                               USED  AVAIL  REFER  MOUNTPOINT
myfirstpool                       1.81G  6.00G   850M  /myfirstpool
myfirstpool/myfirstDS             3.04M  6.00G  2.97M  /myfirstpool/myfirstDS
myfirstpool/myfirstDS@snapshot-1    47K      -  2.96M  -
myfirstpool/myfirstDS@snapshot-2    30K      -  2.97M  -
myfirstpool/myfirstDS@snapshot-3      0      -  2.97M  -
myfirstpool/mysecondDS            1003M  6.00G  1003M  /myfirstpool/mysecondDS

You saw how to use zfs diff to compare a snapshot with its corresponding "live" dataset in the above paragraphs. Doing the same exercise with two snapshots is not that much different: you just have to explicitly tell the command which snapshots are to be compared, and it will output the result in the exact same manner. So what are the differences between snapshots myfirstpool/myfirstDS@snapshot-1 and myfirstpool/myfirstDS@snapshot-2? Let's make the zfs command work for us:

# zfs diff myfirstpool/myfirstDS@snapshot-1 myfirstpool/myfirstDS@snapshot-2
M       /myfirstpool/myfirstDS/
M       /myfirstpool/myfirstDS/hello.txt
M       /myfirstpool/myfirstDS/radeon
-       /myfirstpool/myfirstDS/radeon/CAYMAN_me.bin
+       /myfirstpool/myfirstDS/goodbye.txt
+       /myfirstpool/myfirstDS/config.gz

Before digging farther, let's think about what we did between the time we created the first snapshot and the second snapshot:

  • We modified the file /myfirstpool/myfirstDS/hello.txt, hence the 'M' shown on the left of the second line (and since this is a change under /myfirstpool/myfirstDS, an 'M' is also shown on the left of the first line)
  • We deleted the file /myfirstpool/myfirstDS/radeon/CAYMAN_me.bin, hence the minus sign ('-') shown on the left of the fourth line (and the 'M' shown on the left of the third line)
  • We added two files, /myfirstpool/myfirstDS/goodbye.txt and /myfirstpool/myfirstDS/config.gz, hence the plus signs ('+') shown on the left of the fifth and sixth lines (again a change happening in /myfirstpool/myfirstDS, hence another reason to show an 'M' on the left of the first line)

Now same exercise this time with snapshots myfirstpool/myfirstDS@snapshot-2 and myfirstpool/myfirstDS@snapshot-3:

# zfs diff myfirstpool/myfirstDS@snapshot-2 myfirstpool/myfirstDS@snapshot-3
M       /myfirstpool/myfirstDS/
R       /myfirstpool/myfirstDS/hello.txt -> /myfirstpool/myfirstDS/hello_new.txt
M       /myfirstpool/myfirstDS/goodbye.txt
+       /myfirstpool/myfirstDS/version.txt

Try to interpret what you see; the only new thing is on the second line, where an 'R' (standing for "Rename") is shown. ZFS is smart enough to show both the old and the new names!

Why not push the limits and try a few fancy things? First things first: what happens if we ask to compare two snapshots in reverse order?

# zfs diff myfirstpool/myfirstDS@snapshot-3 myfirstpool/myfirstDS@snapshot-2
Unable to obtain diffs: 
   Not an earlier snapshot from the same fs

Would ZFS be a bit happier if we asked for the difference between two snapshots with a gap in between (so snapshot-1 with snapshot-3)?

# zfs diff myfirstpool/myfirstDS@snapshot-1 myfirstpool/myfirstDS@snapshot-3
M       /myfirstpool/myfirstDS/
R       /myfirstpool/myfirstDS/hello.txt -> /myfirstpool/myfirstDS/hello_new.txt
M       /myfirstpool/myfirstDS/radeon
-       /myfirstpool/myfirstDS/radeon/CAYMAN_me.bin
+       /myfirstpool/myfirstDS/goodbye.txt
+       /myfirstpool/myfirstDS/config.gz
+       /myfirstpool/myfirstDS/version.txt

Amazing! Here again, take a couple of minutes to think about all the operations you did on the dataset between the time you took the first snapshot and the time you took the last snapshot: this summary is the exact reflection of all your previous operations.

Just to put a conclusion on this subject, let's see the differences between the myfirstpool/myfirstDS dataset and its various snapshots:

# zfs diff myfirstpool/myfirstDS@snapshot-1                                 
M       /myfirstpool/myfirstDS/
R       /myfirstpool/myfirstDS/hello.txt -> /myfirstpool/myfirstDS/hello_new.txt
M       /myfirstpool/myfirstDS/radeon
-       /myfirstpool/myfirstDS/radeon/CAYMAN_me.bin
+       /myfirstpool/myfirstDS/goodbye.txt
+       /myfirstpool/myfirstDS/config.gz
+       /myfirstpool/myfirstDS/version.txt
# zfs diff myfirstpool/myfirstDS@snapshot-2
M       /myfirstpool/myfirstDS/
R       /myfirstpool/myfirstDS/hello.txt -> /myfirstpool/myfirstDS/hello_new.txt
M       /myfirstpool/myfirstDS/goodbye.txt
+       /myfirstpool/myfirstDS/version.txt
#  zfs diff myfirstpool/myfirstDS@snapshot-3

Having nothing reported for the last zfs diff is normal, as nothing has changed in the dataset since that snapshot was taken.

The time travelling machine part 2: rolling back with multiple snapshots

Examining the differences between the various snapshots of a dataset (or the dataset itself) would be quite useless if we were not able to roll the dataset back to one of its previous states. Now that we have mangled myfirstpool/myfirstDS a bit, it is time to restore it to the state it was in when the first snapshot was taken:

# zfs rollback myfirstpool/myfirstDS@snapshot-1
cannot rollback to 'myfirstpool/myfirstDS@snapshot-1': more recent snapshots exist
use '-r' to force deletion of the following snapshots:
myfirstpool/myfirstDS@snapshot-3
myfirstpool/myfirstDS@snapshot-2

Err... Well, ZFS just tells us that several more recent snapshots exist and it refuses to proceed without dropping them. Unfortunately for us there is no way to circumvent that: once you jump backward you have no way to move forward again. We could demonstrate rolling back to myfirstpool/myfirstDS@snapshot-3, then myfirstpool/myfirstDS@snapshot-2, then myfirstpool/myfirstDS@snapshot-1, but it would be of very little interest since previous sections of this tutorial did that already. So, second attempt:

# zfs rollback -r myfirstpool/myfirstDS@snapshot-1
# zfs list -t all                                                           
NAME                               USED  AVAIL  REFER  MOUNTPOINT
myfirstpool                       1.81G  6.00G   850M  /myfirstpool
myfirstpool/myfirstDS             2.96M  6.00G  2.96M  /myfirstpool/myfirstDS
myfirstpool/myfirstDS@snapshot-1     1K      -  2.96M  -
myfirstpool/mysecondDS            1003M  6.00G  1003M  /myfirstpool/mysecondDS

myfirstpool/myfirstDS effectively returned to the desired state (notice the size of myfirstpool/myfirstDS@snapshot-1) and the snapshots snapshot-2 and snapshot-3 vanished. Just to convince you:

# zfs diff myfirstpool/myfirstDS@snapshot-1
#

No differences at all!

Snapshots and clones

Streaming ZFS datasets over the network

You find ZFS snapshots useful? Well, you have seen just a small part of their potential. As a snapshot is a photograph of what a dataset contained, frozen in time, snapshots can be seen as nothing more than a data backup. Like any backup, they should not stay on the local machine but should be put elsewhere, and common sense tells us to keep backups in a safe place, making them travel through a secure channel. By "secure channel" we mean something like a trusted person in your organization whose job consists of bringing a box of tapes off-site to a secure location, but we also mean a secure communication channel like an SSH tunnel between two hosts without any human intervention.

The ZFS designers had the same vision and made it possible for a dataset to be sent over a network. How is that possible? Simple: the process involves two peers communicating through a channel like the one established by netcat (OpenSSH supports a similar functionality, but with an encrypted communication channel). For the sake of the demonstration, we will use two Solaris boxes, one at each end-point.

How do we stream some ZFS bits over the network? Here again, zfs is the answer. A nifty move from the designers was to use stdin and stdout as transmission/reception channels, thus allowing great flexibility in processing the ZFS stream. You can envisage, for instance, compressing your stream, then encrypting it, then encoding it in base64, then signing it and so on. It sounds a bit overkill but it is possible, and in the general case you can use any tool that swallows data from stdin and spits it out through stdout in your plumbing.
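
As an illustration of that flexibility, the very same stream can be pushed through an SSH tunnel instead of netcat, giving an encrypted transport for free. The following is only a sketch: the hostname is a placeholder and the pool/dataset names are the ones used in the demonstration below:

# zfs send testpool2/zfsstreamtest@s1 | ssh root@receiver.example.org "zfs receive testpool/zfs-stream-test"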

Note: The rest of this section has been done entirely on two Solaris 11 machines.

1. Sender side:

 
# zfs create testpool2/zfsstreamtest
# echo 'Hello, world' > /testpool2/zfsstreamtest/hello.txt
# echo 'Goodbye, world' > /testpool2/zfsstreamtest/goodbye.txt
# zfs snapshot testpool2/zfsstreamtest@s1
# zfs list -t snapshot
NAME                               USED  AVAIL  REFER  MOUNTPOINT
testpool2/zfsstreamtest@s1            0      -    32K           -

2. Receiver side (the dataset zfs-stream-test will be created by zfs receive and thus must not already exist):

# nc -l -p 7000 | zfs receive testpool/zfs-stream-test

At this point the receiver is waiting for data.

3. Sender side:

# zfs send testpool2/zfsstreamtest@s1 | nc 192.168.aaa.bbb 7000

4. Receiver side:

# zfs list -t snapshot
NAME                          USED  AVAIL  REFER 
...
testpool/zfs-stream-test@s1       0      -  46.4K  -

Note that we did not set an explicit snapshot name in the second step, but we could have chosen something other than the default, which is the name of the snapshot sent over the network. In that case the dataset that will hold the snapshot needs to be created first:

# nc -l -p 7000 | zfs receive testpool/zfs-stream-test@mysnapshot01

Once received you would get:

# zfs list -t snapshot
NAME                                      USED  AVAIL  REFER 
...
testpool/zfs-stream-test@mysnapshot01       0      -  46.4K  -

5. Just for the sake of curiosity, let's do a rollback on the receiver side:

# zfs rollback testpool/zfs-stream-test@s1
# ls -l /testpool/zfs-stream-test
total 2
-rw-r--r-- 1 root root 15 2011-09-06 23:54 goodbye.txt
-rw-r--r-- 1 root root 13 2011-09-06 23:53 hello.txt
# cat /testpool/zfs-stream-test/hello.txt
Hello, world

Because ZFS streaming operates using the standard input and output (stdin/stdout), you can build a slightly more complex pipeline like:

# zfs send testpool2/zfsstreamtest@s1 | gzip | nc 192.168.aaa.bbb 7000
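
Of course, whatever transformation is applied on the sending side must be undone, in reverse order, on the receiving side; a minimal sketch matching the pipeline above:

# nc -l -p 7000 | gunzip | zfs receive testpool/zfs-stream-test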

The above example was using two hosts, but a simpler setup is also possible: you are not required to send your data over the network with netcat, you can store it in a regular file and then mail it or keep it on a USB key. By the way, we have not finished! We only took a simple case here: it is absolutely possible to do the exact same operation with the difference between two snapshots (incremental). Just like an incremental backup takes only what has changed, ZFS can determine the difference between two snapshots and stream only that difference instead of streaming a whole snapshot. Although ZFS can detect and act on differentials, it does not operate (yet) at the block level: if only a few bytes of a very big file have changed, the whole file will be taken into consideration (operating at the data block level is possible with some tools like the well-known rsync).
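
For instance, because zfs send writes to stdout and zfs receive reads from stdin, nothing prevents you from dumping the stream into a plain file and restoring it later; a sketch with a purely hypothetical destination path:

# zfs send testpool2/zfsstreamtest@s1 > /backup/zfsstreamtest-s1.zstream
# zfs receive testpool/zfs-stream-test < /backup/zfsstreamtest-s1.zstream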

Consider the following:

  • A dataset snapshot (S1) contains two files:
    • A -> 10 MB
    • B -> 4 GB
  • A bit later some files (named C, D and E) are added to the dataset and another snapshot is (S2) taken. S2 contains:
    • A -> 10 MB
    • B -> 4 GB
    • C -> 3 MB
    • D -> 500 KB
    • E -> 1GB

With a full transfer of S2, the files A, B, C, D and E would all be streamed, whereas with an incremental transfer (S2-S1) zfs would only process C, D and E. The next $100 question: "How can we stream a difference of snapshots? zfs again?" Yes! This time with a subtle difference: a special option specified on the command line telling it to use a difference rather than a full snapshot. Assuming a few more files have been added to the testpool2/zfsstreamtest dataset and a snapshot (s2) has been taken, the delta between s2 and s1 (s2-s1) can be sent like this (on the receiver side the same command as shown above is used, nothing special is required; also notice the presence of the -i option):

  • Sender:
# zfs send -i testpool2/zfsstreamtest@s1 testpool2/zfsstreamtest@s2 | nc 192.168.aaa.bbb 7000
  • Receiver:
# nc -l -p 7000 | zfs receive testpool/zfs-stream-test
# zfs list -t snapshot
testpool/zfs-stream-test@s1       28.4K      -  46.4K  -
testpool/zfs-stream-test@s2           0      -  47.1K  -

Note that although we did not specify any snapshot name on the receiver side, ZFS used by default the name of the second snapshot involved in the delta (s2 here).


The $200 question: suppose we delete all of the snapshots received so far on the receiver side and then try to send the difference between s2 and s1 again. What would happen? ZFS will protest on the receiver side, although no error message will be visible on the sender side:

cannot receive incremental stream: destination testpool/zfs-stream-test has been modified
since most recent snapshot

It is even worse if we remove the dataset used to receive the data:

cannot receive incremental stream: destination 'testpool/zfs-stream-test' does not exist
Important: ZFS streaming over a network has no underlying protocol, therefore the sender just assumes the data has been successfully received and processed. It does not care whether a processing error occurs.
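
One way to mitigate this, as a sketch under the same naming assumptions as above, is to use OpenSSH as the transport: the pipeline's exit status then reflects the remote zfs receive, so the sending side at least knows whether the receiving command failed:

# zfs send testpool2/zfsstreamtest@s2 | ssh root@receiver.example.org "zfs receive testpool/zfs-stream-test" || echo "receive failed"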

Govern a dataset by attributes

So far, most of a filesystem's capabilities were driven by separate and scattered command line tools (e.g. tune2fs, edquota, rquota, quotacheck...) which all have their own ways of handling tasks and can sometimes be quite tricky, especially the quota-related management utilities. Moreover, there was no easy way to put a size limit on a directory other than putting it on a dedicated partition or logical volume, implying downtime when additional space had to be added. Quota management is, however, only one of the many facets of disk space management.

In the ZFS world, many aspects are now managed by simply setting/clearing a property attached to a ZFS dataset through the now well-known zfs command. You can, for example:

  • put a size limit on a dataset
  • reserve a space for dataset (that space is guaranteed to be available in the future although not being allocated at the time the reservation is made)
  • control if new files are encrypted and/or compressed
  • define a quota per user or group of users
  • control checksum usage => never turn that property off unless you have very good reasons, which you are very unlikely to have (no checksums = no silent data corruption detection)
  • share a dataset by NFS/CIFS
  • control automatic data deduplication

Not all of a dataset's properties are settable: some of them are set and managed by the operating system in the background for you and thus cannot be modified.

Note: Solaris/OpenIndiana users: ZFS has a tight integration with the NFS/CIFS server, thus it is possible to share a zfs dataset by setting adequate attributes. ZFS on Linux (native kernel mode port) also has a tight integration with the built-in Linux NFS server, the same for ZFS fuse although still experimental. Under FreeBSD ZFS integration has been done both with NFS and Samba (CIFS).

Like any other action concerning datasets, properties are set and unset via the zfs command. On our Funtoo box running ZFS Fuse we can, for example, start by looking at the values of all properties for the dataset myfirstpool/myfirstDS:

# zfs get all myfirstpool/myfirstDS
NAME                   PROPERTY              VALUE                   SOURCE
myfirstpool/myfirstDS  type                  filesystem              -
myfirstpool/myfirstDS  creation              Sun Sep  4 23:34 2011   -
myfirstpool/myfirstDS  used                  73.8M                   -
myfirstpool/myfirstDS  available             5.47G                   -
myfirstpool/myfirstDS  referenced            73.8M                   -
myfirstpool/myfirstDS  compressratio         1.00x                   -
myfirstpool/myfirstDS  mounted               yes                     -
myfirstpool/myfirstDS  quota                 none                    default
myfirstpool/myfirstDS  reservation           none                    default
myfirstpool/myfirstDS  recordsize            128K                    default
myfirstpool/myfirstDS  mountpoint            /myfirstpool/myfirstDS  default
myfirstpool/myfirstDS  sharenfs              off                     default
myfirstpool/myfirstDS  checksum              on                      default
myfirstpool/myfirstDS  compression           off                     default
myfirstpool/myfirstDS  atime                 on                      default
myfirstpool/myfirstDS  devices               on                      default
myfirstpool/myfirstDS  exec                  on                      default
myfirstpool/myfirstDS  setuid                on                      default
myfirstpool/myfirstDS  readonly              off                     default
myfirstpool/myfirstDS  zoned                 off                     default
myfirstpool/myfirstDS  snapdir               hidden                  default
myfirstpool/myfirstDS  aclmode               groupmask               default
myfirstpool/myfirstDS  aclinherit            restricted              default
myfirstpool/myfirstDS  canmount              on                      default
myfirstpool/myfirstDS  xattr                 on                      default
myfirstpool/myfirstDS  copies                1                       default
myfirstpool/myfirstDS  version               4                       -
myfirstpool/myfirstDS  utf8only              off                     -
myfirstpool/myfirstDS  normalization         none                    -
myfirstpool/myfirstDS  casesensitivity       sensitive               -
myfirstpool/myfirstDS  vscan                 off                     default
myfirstpool/myfirstDS  nbmand                off                     default
myfirstpool/myfirstDS  sharesmb              off                     default
myfirstpool/myfirstDS  refquota              none                    default
myfirstpool/myfirstDS  refreservation        none                    default
myfirstpool/myfirstDS  primarycache          all                     default
myfirstpool/myfirstDS  secondarycache        all                     default
myfirstpool/myfirstDS  usedbysnapshots       18K                     -
myfirstpool/myfirstDS  usedbydataset         73.8M                   -
myfirstpool/myfirstDS  usedbychildren        0                       -
myfirstpool/myfirstDS  usedbyrefreservation  0                       -
myfirstpool/myfirstDS  logbias               latency                 default
myfirstpool/myfirstDS  dedup                 off                     default
myfirstpool/myfirstDS  mlslabel              off                     -

How can we set a limit that prevents myfirstpool/myfirstDS from using more than 1 GB of space in the pool? Simple, just set the quota property:

# zfs set quota=1G myfirstpool/myfirstDS
# zfs get quota myfirstpool/myfirstDS
NAME                   PROPERTY  VALUE  SOURCE
myfirstpool/myfirstDS  quota     1G     local

Maybe something piqued your curiosity: what does "SOURCE" mean? "SOURCE" describes how the property value has been determined for the dataset and can take several values (a short example follows the list below):

  • local: the property has been explicitly set for this dataset
  • default: a default value has been assigned by the operating system because nothing was explicitly set by the system administrator (e.g. whether SUID is allowed or not in the above example).
  • dash (-): not modifiable intrinsic property (e.g. dataset creation time, whether the dataset is currently mounted or not, dataset space usage in the pool, average compression ratio...)
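
To see the SOURCE column in action, here is a small sketch using the atime property (chosen arbitrarily, it plays no role in the rest of this tutorial): setting it locally flips SOURCE to local, and zfs inherit reverts the property to its inherited/default value:

# zfs set atime=off myfirstpool/myfirstDS
# zfs get atime myfirstpool/myfirstDS
NAME                   PROPERTY  VALUE  SOURCE
myfirstpool/myfirstDS  atime     off    local
# zfs inherit atime myfirstpool/myfirstDS
# zfs get atime myfirstpool/myfirstDS
NAME                   PROPERTY  VALUE  SOURCE
myfirstpool/myfirstDS  atime     on     default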

Before copying some files into the dataset, let's also set a binary (on/off) property:

# zfs set compression=on myfirstpool/myfirstDS

Now try to put more than 1GB of data in the dataset:

# dd if=/dev/zero of=/myfirstpool/myfirstDS/one-GB-test bs=2G count=1
dd: writing `/myfirstpool/myfirstDS/one-GB-test': Disk quota exceeded
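
When the limit is no longer wanted, it can be lifted again; a minimal sketch (either form works, zfs inherit additionally resets the SOURCE column back to default):

# zfs set quota=none myfirstpool/myfirstDS
# zfs inherit quota myfirstpool/myfirstDS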

Permission delegation

ZFS brings a feature known as delegated administration. Delegated administration enables ordinary users to handle administrative tasks on a dataset without being administrators. It is however not a sudo replacement, as it covers only ZFS-related tasks such as sharing/unsharing, disk quota management and so on. Permission delegation shines in flexibility because such delegation can be handled by inheritance through nested datasets. Permission delegation is handled via zfs through its allow and unallow subcommands.
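
A small sketch of what this looks like (the user name alice is hypothetical, and the exact set of delegable permissions may differ between ZFS implementations); the first command grants the permissions, the second displays the delegations currently attached to the dataset:

# zfs allow alice snapshot,rollback,mount myfirstpool/myfirstDS
# zfs allow myfirstpool/myfirstDS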

Data redundancy with ZFS

Nothing is perfect and storage media (even in datacenter-class equipment) are prone to failures and fail on a regular basis. Having data redundancy is mandatory to help prevent single points of failure (SPoF). Over the past decades, RAID technologies have been powerful, but their power is precisely their weakness: operating at the block level, they do not care about what is stored in the data blocks and have no way to interact with the filesystems stored on them to ensure data integrity is properly handled.

Some statistics

It is no secret that a general trend in the IT industry is the exponential growth of data quantities. Just think about the amount of data YouTube, Google or Facebook generate every day; taking the case of the first, some statistics give:

  • 24 hours of video uploaded every minute as of March 2010 (May 2009 - 20h / October 2008 - 15h / May 2008 - 13h)
  • More than 2 billion views a day
  • More video is produced on YouTube every 60 days than the 3 major US broadcasting networks did in the last 60 years

Facebook is also impressive (Facebook's own stats):

  • over 900 million objects that people interact with (pages, groups, events and community pages)
  • The average user creates 90 pieces of content each month (750 million active users)
  • More than 2.5 million websites have integrated with Facebook

What is true for Facebook and YouTube is also true in many other cases (think for a minute about the amount of data stored in iTunes), especially with the growing popularity of cloud computing infrastructures. Despite the progress of technology, a "bottleneck" still exists: storage reliability has remained nearly the same over the years. If only one organization in the world generated huge quantities of data it would be CERN (Conseil Européen pour la Recherche Nucléaire, now officially known as the European Organization for Nuclear Research), as their experiments can generate spikes of many terabytes of data within a few seconds. A study done in 2007, quoted by a ZDNet article, reveals that:

  • Even ECC memory cannot always be helpful: 3 double-bit (uncorrectable) errors occurred in 3 months on 1300 nodes. Bad news: it should be zero.
  • RAID systems cannot protect in all cases: monitoring 492 RAID controllers for 4 weeks showed an average error rate of 1 per ~10^14 bits, giving roughly 300 errors for every 2.4 petabytes
  • Magnetic storage is still not reliable, even on high-end datacenter-class drives: 500 errors found over 100 nodes while writing a 2 GB file to 3000+ nodes every 2 hours and then reading it again and again for 5 weeks.

Overall this means: 22 corrupted files (1 in every 1500 files) for a grand total of 33700 files holding 8.7TB of data. And this study is 5 years old....

Source of silent data corruption

http://www.zdnet.com/blog/storage/50-ways-to-lose-your-data/168

Not an exhaustive list but we can quote:

  • Cheap controllers or buggy drivers that do not report errors/pre-failure conditions to the operating system;
  • "bit-leaking": a hard drive consists of many concentric magnetic tracks. When the hard drive magnetic head writes bits on the magnetic surface it generates a very weak magnetic field, which is nevertheless sufficient to "leak" onto the next track and change some bits. Drives can generally compensate for those situations because they also record some error correction data on the magnetic surface
  • magnetic surface defects (weak sectors)
  • Hard drive firmware bugs
  • Cosmic rays hitting your RAM chips or hard drives' cache memory/electronics

Building a mirrored pool

ZFS RAID-Z

ZFS/RAID-Z vs RAID-5

RAID-5 is very commonly used nowadays because of its simplicity, efficiency and fault-tolerance. Although the technology has proven itself over decades, it has a major drawback known as "the RAID-5 write hole". If you are familiar with RAID-5 you already know that it consists of spreading stripes across all of the disks within the array and interleaving them with a special stripe called the parity. Several schemes for spreading stripes/parity between disks exist in the wild, each one with its own pros and cons; however the "standard" one (also known as left-asynchronous) is:

Disk_0  | Disk_1  | Disk_2  | Disk_3
[D0_S0] | [D0_S1] | [D0_S2] | [D0_P]
[D1_S0] | [D1_S1] | [D1_P]  | [D1_S2]
[D2_S0] | [D2_P]  | [D2_S1] | [D2_S2]
[D3_P]  | [D3_S0] | [D3_S1] | [D3_S2]

The parity is simply computed by XORing the stripes of the same "row", thus giving the general equation:

  • [Dn_S0] XOR [Dn_S1] XOR ... XOR [Dn_Sm] XOR [Dn_P] = 0

This equation can be rewritten in several ways:

  • [Dn_S0] XOR [Dn_S1] XOR ... XOR [Dn_Sm] = [Dn_P]
  • [Dn_S1] XOR [Dn_S2] XOR ... XOR [Dn_Sm] XOR [Dn_P] = [Dn_S0]
  • [Dn_S0] XOR [Dn_S2] XOR ... XOR [Dn_Sm] XOR [Dn_P] = [Dn_S1]
  • ...and so on!

Because the equations are combinations of exclusive-ors, it is easy to compute one term if it is missing. Let's say we have 3 stripes plus one parity stripe, composed of 4 bits each, but one of them is missing due to a disk failure:

  • D0_S0 = 1011
  • D0_S1 = 0010
  • D0_S2 = <missing>
  • D0_P = 0110

However we know that:

  • D0_S0 XOR D0_S1 XOR D0_S2 XOR D0_P = 0000, which can also be rewritten as:
  • D0_S2 = D0_S0 XOR D0_S1 XOR D0_P

Applying boolean algebra gives: D0_S2 = 1011 XOR 0010 XOR 0110 = 1111. Proof: 1011 XOR 0010 XOR 1111 = 0110, which is indeed the same as D0_P.
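
If you want to double-check that arithmetic without a calculator, bash's arithmetic expansion accepts base-2 literals; this is just a throwaway illustration, nothing ZFS-specific:

# printf '%X\n' $(( 2#1011 ^ 2#0010 ^ 2#0110 ))
F

F in hexadecimal is 1111 in binary, which matches the reconstructed D0_S2 above.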

'So what's the deal?' Okay, now the funny part: forget the above hypothesis and imagine we have this:

  • D0_S0 = 1011
  • D0_S1 = 0010
  • D0_S2 = 1101
  • D0_P = 0110

Applying boolean algebra magic gives 1011 XOR 0010 XOR 1101 => 0100. Problem: this is different from D0_P (0110). Can you tell which one (or which ONES) of the four terms lies? If you find a mathematically acceptable solution, go found your own company because you have just solved a big computer science problem. If humans can't solve the question, imagine how hard it is for the poor little RAID-5 controller to determine which stripe is right and which one lies, and imagine the resulting "datageddon" (i.e. massive data corruption on the RAID-5 array) when the RAID-5 controller detects an error and starts to rebuild the array.

This is not science fiction, this is pure reality, and the weakness lies in RAID-5's simplicity. Here is how it can happen: an urban legend about RAID-5 arrays is that they update stripes in an atomic transaction (all of the stripes+parity are written or none of them). Too bad, this is just not true: the data is written on the fly, and if for one reason or another the machine hosting the RAID-5 array suffers a power outage or crash, the RAID-5 controller will simply have no idea of what it was doing, which stripes are up to date and which ones are not. Of course, RAID controllers in servers do have a replaceable on-board battery, and most of the time the server they reside in is connected to an auxiliary source like a battery-based UPS or a diesel/gas electricity generator. However, Murphy's law or unpredictable hazards can sometimes strike....

Another funny scenario: imagine a machine with a RAID-5 array (on a UPS this time) but with non-ECC memory. The RAID-5 controller splits the data buffer into stripes, computes the parity stripe and starts to write them on the different disks of the array. But...but...but... For some odd reason, a single bit in one of the stripes flips (cosmic rays, RFI...) after the parity calculation. Too bad, too sad: one of the written stripes contains corrupted data and it is silently written to the array. Datageddon in sight!

Not to freak you out: storage units have sophisticated error correction capabilities (a magnetic surface or an optical recording surface is not perfect, and read/write errors do occur) masking most of these cases. However, some established statistics estimate that even with error correction mechanisms, one bit in every 10^16 bits transferred is incorrect. 10^16 is really huge, but unfortunately, in this beginning of the XXIst century, with datacenters brewing massive amounts of data on several hundred if not thousands of servers, this number starts to give headaches: a big datacenter can face silent data corruption every 15 minutes (Wikipedia). No typo here: a potential disaster may silently appear 4 times an hour, every single day of the year. Detection techniques exist, but traditional RAID-5 arrays in themselves can be a problem. Ironic for such a popular and widely used solution :)

If RAID-5 was an acceptable trade-off in past decades, it has simply had its time. RAID-5 is dead? *Hooray!*

More advanced topics

ZFS Intention Log (ZIL)

Final words and lessons learned

ZFS surpasses by far (as of September 2011) every well-known filesystem out there: none of them offers such an integration of features, and certainly not with this simplicity of management and robustness. However, in the Linux world it is definitely a no-go in the short term, especially for production systems. The two known implementations are not ready for production environments and lack some important features or behave in a clunky manner; this is perfectly understandable, as neither of them pretends to be at that level of maturity, and the licensing incompatibility between the code opened by Sun Microsystems some years ago and the GNU GPL does not help the cause. However, both look very promising once their rough corners get rounded off.

For a Linux system, the nearest plan B, if you seek a filesystem covering some of the functionality offered by ZFS, is BTRFS (still considered experimental; be prepared for a disaster sooner or later, although BTRFS has been used by some Funtoo core team members for 2 years and has proved to be quite stable in practice). BTRFS, however, does not push the limits as far as ZFS does: it has no built-in snapshot differentiation tool, does not implement built-in filesystem streaming capabilities, and rolling back a BTRFS subvolume is a bit more manual than "the ZFS way of life".


Footnotes & references

Source: solaris-zfs-administration-guide

<references/>