User:Mrl5/Btrfs

Btrfs is a file system based on the copy-on-write (COW) principle, initially designed at Oracle Corporation for use in Linux. The development of btrfs began in 2007, and since August 2014 the file system's on-disk format has been marked as stable.

The Funtoo Linux project recommends btrfs as a next-generation filesystem, particularly for use in production.

Btrfs is intended to address the lack of pooling, snapshots, checksums, and integral multi-device spanning in Linux file systems.

It is easy to set up and use btrfs. In this simple introduction, we're going to set up btrfs under Funtoo Linux using an existing debian-sources or debian-sources-lts kernel, like the one that comes pre-built for you with Funtoo Linux. We will use our btrfs storage pool for data that isn't part of the Funtoo Linux installation itself: Funtoo Linux will boot from a non-btrfs filesystem and, as part of the initialization process, will initialize our btrfs storage and mount it at the location of our choice.

Installation

Enabling btrfs support is as simple as enabling the btrfs mix-in and running a world update:

root # epro mix-in +btrfs
root # emerge -uDN @world

Btrfs is now ready for use.
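
If you want to double-check that the running kernel actually has btrfs support, a quick sanity check is possible (a sketch; the first command assumes your kernel exposes its configuration at /proc/config.gz, which not every kernel does):

root # zgrep BTRFS_FS /proc/config.gz
root # grep btrfs /proc/filesystems

A CONFIG_BTRFS_FS line set to y or m indicates support; the second command lists btrfs among the registered filesystems once the driver is built in or loaded.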

Btrfs Concepts

Btrfs manages the physical disks it uses itself: physical disks are added to a btrfs volume, and btrfs then creates subvolumes from that volume; files are stored in the subvolumes.

Unlike traditional Linux filesystems, btrfs filesystems will allocate storage on-demand from the underlying volume.
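
You can observe this on any mounted btrfs filesystem: the space allocated to data and metadata block groups grows on demand as files are written. A quick way to inspect it (shown with a hypothetical mount point; the following sections create a real one):

root # btrfs filesystem usage /mnt/btrfs-top-level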

In the btrfs world, the word volume corresponds to a storage pool (ZFS) or a volume group (LVM).

  • devices - one or more underlying physical devices (disks or partitions).
  • volume - one large storage pool comprising the space of all devices; it can support different redundancy levels (see the sketch after this list).
  • subvolumes - independently mountable file trees within the volume; this is where files are stored.
  • snapshots - a copy of a subvolume at a given point in time, either read-only or read-write (a read-write snapshot is also called a clone).
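
As a sketch of the multi-device and redundancy ideas above (hypothetical spare devices /dev/sdx and /dev/sdy; this would destroy any data on them), a mirrored volume could be created and inspected like this:

root # mkfs.btrfs -d raid1 -m raid1 /dev/sdx /dev/sdy
root # btrfs filesystem show /dev/sdx

The -d and -m options select the redundancy profile for data and metadata respectively. The rest of this article sticks to a simple single-device volume.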

Creating a Volume

To create a basic btrfs volume, you will need an extra empty disk. Perform the following steps:

root #  mkfs.btrfs /dev/sdxy
btrfs-progs v4.17.1 
See http://btrfs.wiki.kernel.org for more information.

Detected a SSD, turning off metadata duplication.  Mkfs with -m dup if you want to force metadata duplication.
Performing full device TRIM /dev/sdxy (223.57GiB) ...
Label:              (null)
UUID:               d6bcba6e-8fd5-41fc-9bb4-79628c5c928c
Node size:          16384
Sector size:        4096
Filesystem size:    223.57GiB
Block group profiles:
  Data:             single            8.00MiB
  Metadata:         single            8.00MiB
  System:           single            4.00MiB
SSD detected:       yes
Incompat features:  extref, skinny-metadata
Number of devices:  1
Devices:
   ID        SIZE  PATH
    1   223.57GiB  /dev/sdxy

/dev/sdxy should be an unused disk. You may need to use the following command if this disk contains any pre-existing data on it:

root #  mkfs.btrfs -f /dev/sdxy

Now you can mount the created volume as you would mount any other Linux filesystem.

root #  mkdir /mnt/btrfs-top-level
root #  mount /dev/sdxy /mnt/btrfs-top-level
root #  mount
...
/dev/sdxy on /mnt/btrfs-top-level type btrfs (rw,relatime,ssd,space_cache,subvolid=5,subvol=/)
   Important

It is recommended that nothing is stored directly in the root directory of this top-level volume (ID 5).

Creating Subvolumes

Btrfs has a concept of subvolumes. A subvolume is an independently mountable POSIX file tree (but not a block device). There are several basic schemes for laying out subvolumes (including snapshots), as well as mixtures thereof.

Let's create children of the top-level subvolume (ID 5). We will have:

  • @data - will be mounted at /data
  • .snapshots - will hold our snapshots


root #  cd /mnt/btrfs-top-level
root #  btrfs subvolume create @data
root #  btrfs subvolume create .snapshots
root #  btrfs subvolume list /mnt/btrfs-top-level
ID 256 gen 322338 top level 5 path @data
ID 257 gen 322275 top level 5 path .snapshots

The default Subvolume

   Note

Changing the default subvolume with btrfs subvolume set-default will make the top level of the filesystem accessible only when the subvol or subvolid mount options are specified.
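
In other words, once the default subvolume has been changed (as we will do below), the top level of the filesystem can still be reached, but only by asking for it explicitly, for example:

root # mount -o subvolid=5 /dev/sdxy /mnt/btrfs-top-level

Using subvol=/ instead of subvolid=5 works equally well.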

When a btrfs block device is mounted without specifying a subvolume, the default one is used. To check which subvolume is currently the default, run:

root #  btrfs subvolume get-default /mnt/btrfs-top-level
ID 5 (FS_TREE)

For convenience, let's make the @data subvolume the default one. It's good to double-check the subvolume ID first; either btrfs subvolume list or btrfs subvolume show can be used for that:

root #  btrfs subvolume show /mnt/btrfs-top-level/@data
...
	Subvolume ID: 		256

Now you can make this subvolume the default one:

root #  btrfs subvolume set-default 256 /mnt/btrfs-top-level
root #  btrfs subvolume get-default /mnt/btrfs-top-level
ID 256 gen 322336 top level 5 path @data

At this point you can stop working on the top-level subvolume (ID 5) and instead mount the @data subvolume directly.

root #  cd /mnt
root #  umount /mnt/btrfs-top-level
root #  mkdir /data
root #  mount /dev/sdxy /data
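
If you want to confirm that it is really the @data subvolume that ended up mounted at /data (and not the top-level tree), you can ask btrfs directly (output abbreviated):

root #  btrfs subvolume show /data
...
	Subvolume ID: 		256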

Nested Subvolumes

   Note

Nested subvolumes are not going to be part of snapshots created from their parent subvolume. One typical reason to nest subvolumes is therefore to exclude certain parts of the filesystem from being snapshotted.

Let's create a separate nested subvolume for /data/independent.

root #  btrfs subvolume create /data/independent
root #  btrfs subvolume list /data
ID 258 gen 161 top level 256 path independent

Usually you will want to "split" areas which are "complete" and/or "consistent" in themselves. Examples of this finer-grained partitioning could be /var/log, /var/www or /var/lib/postgresql.
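
As a purely hypothetical sketch within this article's /data example (the name www is just an illustration), splitting off such an area is the same one-liner as before:

root # btrfs subvolume create /data/www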

/etc/fstab

To automatically mount the @data subvolume after reboot, you need to modify /etc/fstab:

   /etc/fstab - fstab for btrfs
/dev/sdxy	/data	btrfs	subvolid=256,defaults	0 0
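
Mounting by subvolume name instead of numeric ID also works, and the .snapshots subvolume can be given a permanent mount point in the same way. A sketch (the /mnt/snapshots mount point is used later in this article; adjust paths to taste):

   /etc/fstab - fstab for btrfs (alternative, by subvolume name)
/dev/sdxy	/data	btrfs	subvol=/@data,defaults	0 0
/dev/sdxy	/mnt/snapshots	btrfs	subvol=/.snapshots,defaults	0 0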

Now let's verify that these changes are correct:

root #  cd /
root #  umount /data
root #  mount /data
root #  ls /data
independent

Did you notice that although we mounted only the @data subvolume, the nested subvolume @data/independent is also present? Nested subvolumes automatically appear when their parent subvolume is mounted.

   Warning

According to the btrfs docs, most mount options apply to the whole filesystem, and only the options of the first mounted subvolume will take effect. This means that (for example) you can't set per-subvolume nodatacow, nodatasum, or compress.
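
If you need copy-on-write disabled for a particular area (a common wish for databases or VM images), the usual workaround is the per-file C attribute rather than a mount option. A sketch using a hypothetical directory under /data; the attribute only takes full effect for files created after it is set, so set it on an empty directory:

root # mkdir /data/vm-images
root # chattr +C /data/vm-images
root # lsattr -d /data/vm-images

lsattr -d should now show a C in the attribute string for that directory, and new files created inside it will not be copy-on-write.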

Snapshots

To check out this cool btrfs feature, let's first populate our filesystem with some example data:

root #  echo 'btrfs' > /data/foo.txt
root #  echo 'fun' > /data/independent/bar.txt

As you probably remember, on the top level (next to the @data subvolume) you also created the .snapshots subvolume. Since @data is now the default subvolume, mount .snapshots explicitly so that you can create some snapshots in it:

root #  mkdir /mnt/snapshots
root #  mount -o subvol=.snapshots /dev/sdxy /mnt/snapshots

A snapshot is a subvolume like any other, with given initial content. By default, snapshots are created read-write. File modifications in a snapshot do not affect the files in the original subvolume. Let's create a read-write snapshot of /data and a read-only snapshot of /data/independent:

root #  btrfs subvolume snapshot /data /mnt/snapshots/data_$(date -u -Iseconds)
Create a snapshot of '/data' in '/mnt/snapshots/data_2022-08-30T22:04:57+00:00'
root #  btrfs subvolume snapshot -r /data/independent /mnt/snapshots/independent_$(date -u -Iseconds)
Create a readonly snapshot of '/data/independent' in '/mnt/snapshots/independent_2022-08-30T22:05:29+00:00'

Once again, nested subvolumes are not part of snapshots created from their parent subvolume, so you shouldn't be surprised when you compare the contents of /data with the contents of /mnt/snapshots:

root #  tree /data
/data
├── foo.txt
└── independent
    └── bar.txt
root #  tree /mnt/snapshots
/mnt/snapshots
├── data_2022-08-30T22:04:57+00:00
│   └── foo.txt
└── independent_2022-08-30T22:05:29+00:00
    └── bar.txt

At this point you might be interested in the btrfs send and receive features.
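
As a teaser (a sketch only; /mnt/backup is a hypothetical mount point on a second btrfs filesystem), a read-only snapshot like the one created above can be shipped elsewhere with send/receive:

root # btrfs send /mnt/snapshots/independent_2022-08-30T22:05:29+00:00 | btrfs receive /mnt/backup/

Only read-only snapshots can be sent, which is one more reason to create them with -r as shown above.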

   Note

According to the btrfs docs, a snapshot is not a backup: snapshots work by use of btrfs copy-on-write behaviour. A snapshot and the original it was taken from initially share all of the same data blocks. If that data is damaged in some way (cosmic rays, bad disk sector, accident with dd to the disk), then the snapshot and the original will both be damaged.

Wrap up

   Important

It is recommended to run btrfs scrub once in a while, e.g. every month.

Scrub is the online check and repair functionality that verifies the integrity of data and metadata, assuming the tree structure is fine. You can run it on a mounted file system; it runs as a background process during normal operation.

To start a (background) scrub on the filesystem which contains /data, run:

root #  btrfs scrub start /data
scrub started on /data, fsid 40f8b94f-07ee-4f7e-beb1-8e686abc246d (pid=5525)

To check the status of a running scrub

root #  btrfs scrub status /data
UUID:             40f8b94f-07ee-4f7e-beb1-8e686abc246d
Scrub started:    Tue Aug 30 00:38:54 2022
Status:           running
Duration:         0:00:15
Time left:        0:00:34
ETA:              Tue Aug 30 00:39:44 2022
Total to scrub:   149.06GiB
Bytes scrubbed:   44.79GiB  (30.04%)
Rate:             2.99GiB/s
Error summary:    no errors found
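
One way to follow the monthly recommendation from the beginning of this section is a small cron job. A sketch, assuming a classic cron setup with an /etc/cron.monthly directory (adjust to your scheduler of choice, and remember to make the script executable):

   /etc/cron.monthly/btrfs-scrub - monthly scrub of the filesystem mounted at /data
#!/bin/sh
# Scrub the btrfs filesystem mounted at /data.
# -B keeps the scrub in the foreground so cron can report a non-zero exit status on errors.
exec btrfs scrub start -B /data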

You should now be at the point where you can begin to use btrfs for a variety of tasks. While there is a lot more to btrfs than what is covered in this short introduction, you should now have a good understanding of the fundamental concepts on which btrfs is based.