LVM Fun

= Introduction =


LVM (Logical Volume Management) offers great flexibility in managing your storage and significantly reduces server downtime by allowing on-line disk space management. The great idea beneath LVM is to '''make the data and its storage loosely coupled''' through several layers of abstraction. You (the system administrator) have control over each of those layers, making the entire space management process extremely simple and flexible through a coherent set of commands.


Several other well-known binary Linux distributions make aggressive use of LVM, and several Unix systems, including HP-UX, AIX and Solaris, have offered similar functionality for a long time, modulo the commands to be used. LVM is not mandatory, but its usage can bring you additional flexibility and make your everyday life much simpler.


= Concepts =


As usual, having a good idea of the underlying concepts is mandatory. LVM is not very complicated, but it is easy to become confused, especially because it is a multi-layered system; however, the LVM designers had the good idea of keeping command names consistent across all LVM command sets, making your life easier.
 
LVM consists mainly of three things:
 
* '''Physical volumes (or ''PV'')''': nothing more than a physical storage space. A physical volume can be anything: a partition on a local hard disk, a partition located on a remote SAN disk, a USB key or anything else that offers storage space (so yes, technically it would be possible to use an optical storage device accessed in packet writing mode). The storage space on a physical volume is divided (and managed) in small units called '''Physical Extents''' (or ''PE''). To give an analogy, if you are a bit familiar with RAID, PEs are a bit like RAID stripes.
* '''Volume Groups (or ''VG'')''': a group of at least one PV. VGs are '''named''' entities and will appear in the system via the device mapper as '''/dev/''volume-group-name'''''.
* '''Logical Volumes (or ''LV'')''': a '''named''' division of a volume group in which a filesystem is created and that can be mounted in the VFS. Just for the record, and just as for the PEs in a PV, an LV is managed in chunks known as Logical Extents (or ''LE''). Most of the time those LEs are hidden from the system administrator due to a 1:1 mapping between them and the PEs lying just beneath, but a cool fact to know about LEs is that they can be spread over several PVs just like RAID stripes in a RAID-0 volume. However, discussions on the Web tend to show that system administrators prefer to build RAID volumes with mdadm rather than use LVM striping, for performance reasons.
 
In short: LVM logical volumes (LV) are containers that hold a single filesystem and are created inside a volume group (VG), itself composed of an aggregation of one or more physical volumes (PV), themselves stored on various media (USB key, hard disk partition and so on). The data is stored in chunks spread over the various PVs.
 
{{fancynote|Retain what PV, VG and LV mean as we will use those abbreviations in the rest of this article.}}
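
As a preview of the tour to come, going from a raw device to a mounted filesystem takes one command per layer. A minimal sketch (device, VG and LV names are placeholders):

<pre>
# pvcreate /dev/sdb1               # prepare the physical volume
# vgcreate myvg /dev/sdb1          # group one or more PVs into a VG
# lvcreate -n mylv -L 10G myvg     # carve an LV out of the VG
# mkfs.ext4 /dev/myvg/mylv         # create a filesystem on the LV
# mount /dev/myvg/mylv /mnt/data   # and use it
</pre>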
 
= Your first tour of LVM =
 
== Physical volumes creation ==
 
{{fancynote|We give the same size to all volumes for the sake of the demonstration. This is not mandatory and it is possible to have mixed-size PVs inside the same VG. }}
 
To start with, just create three raw disk images:
 
<pre>
# dd if=/dev/zero of=/tmp/hdd1.img bs=2G count=1
# dd if=/dev/zero of=/tmp/hdd2.img bs=2G count=1
# dd if=/dev/zero of=/tmp/hdd3.img bs=2G count=1
</pre>
 
and associate each of them with a loopback device:
 
<pre>
# losetup -f
/dev/loop0
# losetup /dev/loop0 /tmp/hdd1.img
# losetup /dev/loop1 /tmp/hdd2.img
# losetup /dev/loop2 /tmp/hdd3.img
 
</pre>
 
Okay, nothing really exciting there, but wait, the fun is coming! First check that '''sys-fs/lvm2''' is present on your system and emerge it if not. At this point, we must tell you a secret: although several articles and authors use the term "LVM", it denotes "LVM version 2" or "LVM 2" nowadays. You should know that LVM had, in the good old days (RHEL 3.x and earlier), a previous revision known as "LVM version 1". LVM 1 is now considered an extinct species and is not compatible with LVM 2, although the LVM 2 tools maintain backward compatibility.
 
The very first step in LVM is to create the physical volumes or ''PV''. "Wait, create ''what''?! Aren't the loopback devices already present on the system?" Yes, they are present, but they are empty: we must initialize them with some metadata to make them usable by LVM. This is simply done by:
 
<pre>
# pvcreate /dev/loop0
  Physical volume "/dev/loop0" successfully created
# pvcreate /dev/loop1
  Physical volume "/dev/loop1" successfully created
# pvcreate /dev/loop2
  Physical volume "/dev/loop2" successfully created
</pre>
 
Nothing spectacular is printed by each command, but we assure you: you now have three LVM PVs. You can check them by issuing:
 
<pre>
# pvs
  PV        VG  Fmt  Attr PSize PFree
  /dev/loop0      lvm2 a-  2.00g 2.00g
  /dev/loop1      lvm2 a-  2.00g 2.00g
  /dev/loop2      lvm2 a-  2.00g 2.00g
</pre>
 
 
Some good information there:
* PV: indicates the physical path the PV lies on
* VG: indicates the VG the PV belongs to. At this time we have not created any VG yet, so the column remains empty.
* Fmt: indicates the format of the PV (here it says we have an LVM version 2 PV)
* Attr: indicates some status information; the 'a' here just says that the PV is allocatable.
* PSize and PFree: indicate the PV size and the amount of remaining space on this PV. Here we have three empty PVs so it basically says "2 gigabytes large, 2 out of 2 gigabytes free"
 
It is now time to introduce you to another command: '''pvdisplay'''. Just run it without any arguments:
 
<pre>
# pvdisplay
  "/dev/loop0" is a new physical volume of "2.00 GiB"
  --- NEW Physical volume ---
  PV Name              /dev/loop0
  VG Name             
  PV Size              2.00 GiB
  Allocatable          NO
  PE Size              0 
  Total PE              0
  Free PE              0
  Allocated PE          0
  PV UUID              b9i1Hi-llka-egCF-2vU2-f7tp-wBqh-qV4qEk
 
  "/dev/loop1" is a new physical volume of "2.00 GiB"
  --- NEW Physical volume ---
  PV Name              /dev/loop1
  VG Name             
  PV Size              2.00 GiB
  Allocatable          NO
  PE Size              0 
  Total PE              0
  Free PE              0
  Allocated PE          0
  PV UUID              i3mdBO-9WIc-EO2y-NqRr-z5Oa-ItLS-jbjq0E
 
  "/dev/loop2" is a new physical volume of "2.00 GiB"
  --- NEW Physical volume ---
  PV Name              /dev/loop2
  VG Name             
  PV Size              2.00 GiB
  Allocatable          NO
  PE Size              0 
  Total PE              0
  Free PE              0
  Allocated PE          0
  PV UUID              dEwVuO-a5vQ-ipcH-Rvlt-5zWt-iAB2-2F0XBf
</pre>
 
The first three lines of each PV show:
* the storage device beneath the PV
* the VG it is tied to
* the size of this PV.
''Allocatable'' indicates whether the PV can be used to store data. As the PV is not a member of a VG yet, it cannot be used, hence the "NO" shown. Another set of information is the lines starting with ''PE''. ''PE'' stands for ''' ''Physical Extents'' ''' (data chunks) and is the finest granularity LVM can manipulate. The size of a PE is "0" here because we have a blank PV; once the PV joins a VG it will typically be 4 MiB, as we will see later. Following ''PE Size'' are ''Total PE'', which shows the total '''number''' of PEs available on this PV, and ''Free PE'', the number of PEs remaining available for use. ''Allocated PE'' just shows the difference between ''Total PE'' and ''Free PE''.
 
The last line (''PV UUID'') is a unique identifier used internally by LVM to name the PV. You should know that it exists because it is sometimes useful when recovering from corruption or doing weird things with a PV, but most of the time you don't have to worry about it.
{{fancynote|It is possible to force how LVM aligns its data on the physical storage. This is useful when dealing with 4K-sector drives that lie about their physical sector size. Refer to the '''pvcreate''' manual page. }}
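
For instance, the alignment can be set at PV creation time; a sketch on a hypothetical 4K-sector drive (''/dev/sdb1'' and the 1 MiB value are purely illustrative, see the --dataalignment entry of the manual page):

<pre>
# pvcreate --dataalignment 1m /dev/sdb1
</pre>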
 
== Volume group creation ==
 
We have blank PVs at this point, but to make them usable for storage we must tell LVM how they are grouped to form a VG (a storage pool) where LVs will be created. A nice aspect of VGs is that they are not "written in stone" once created: you can still add, remove or exchange PVs inside a VG at a later time (for example when the device a PV is stored on fails). To create our first volume group named ''vgtest'':
 
<pre>
# vgcreate vgtest /dev/loop0 /dev/loop1 /dev/loop2
  Volume group "vgtest" successfully created
</pre>
 
Just like we did before with PVs, we can list the VGs known to the system. This is done with the command '''vgs''':
 
<pre>
# vgs
  VG    #PV #LV #SN Attr  VSize VFree
  vgtest  3  0  0 wz--n- 5.99g 5.99g
</pre>
 
'''vgs''' shows you a tabular view of information:
* '''VG:''' the name of the VG
* '''#PV:''' the number of PVs composing the VG
* '''#LV:''' the number of logical volumes (LV) located inside the VG
* '''#SN:''' the number of snapshots inside the VG
* '''Attr:''' a status field. w, z and n here mean that the VG is:
** '''w:''' '''w'''ritable
** '''z:''' resi'''z'''able
** '''n:''' using the allocation policy '''''n'''ormal'' (tweaking allocation policies is beyond the scope of this article, we will use the default value ''normal'' in the rest of this article)
* VSize and VFree give statistics on how full a VG is versus its size
 
Note the dashes in ''Attr'': they mean that the corresponding attribute is not active:
* First dash (3rd position): indicates whether the VG is exported (an 'x' would be shown at this position in that case).
* Second dash (4th position): indicates whether the VG is partial (a 'p' would be shown at this position in that case).
* Third dash (rightmost position): indicates whether the VG is clustered (a 'c' would be shown at this position in that case).
 
Exported VGs and clustered VGs are more advanced aspects of LVM and won't be covered here, especially clustered VGs, which are used in the case of a storage space shared by a cluster of machines. Covering clustered VG management would require an entire article in itself. '''For now, the only thing you have to worry about in those dashes of ''Attr'' is to see a dash at the 4th position instead of a ''p'''''. Seeing ''p'' there would be bad news: the VG would have missing parts (PVs), making it unusable.
 
{{fancynote|In the exact same manner you can see detailed information about physical volumes with '''pvdisplay''', you can see detailed information about a volume group with '''vgdisplay'''. We will demonstrate the latter command in the paragraphs to follow.}}
 
Before leaving the volume group aspect, do you remember the '''pvs''' command shown in the previous paragraphs? Try it again:
 
<pre>
# pvs
  PV        VG    Fmt  Attr PSize PFree
  /dev/loop0 vgtest lvm2 a-  2.00g 2.00g
  /dev/loop1 vgtest lvm2 a-  2.00g 2.00g
  /dev/loop2 vgtest lvm2 a-  2.00g 2.00g
</pre>
 
Now it shows the VG our PVs belong to :-)
 
== Logical volumes creation ==
 
Now the final step: we will create the storage areas (logical volumes or ''LV'') inside the VG, on which we will then create filesystems. Just like a VG, an LV has a name, which must be unique within its VG.
 
{{fancynote|Two LVs can be given the same name as long as they are located in different VGs.}}
 
To divide our VG as follows:
 
* lvdata1: 2 GB
* lvdata2: 1 GB
* lvdata3: 10% of the VG size
* lvdata4: all of the remaining free space in the VG
 
we use the following commands (notice the capital 'L' used to declare absolute sizes and the lowercase 'l' used to declare relative sizes):
 
<pre>
# lvcreate -n lvdata1 -L 2GB vgtest
  Logical volume "lvdata1" created
# lvcreate -n lvdata2 -L 1GB vgtest
  Logical volume "lvdata2" created
# lvcreate -n lvdata3 -l 10%VG vgtest
  Logical volume "lvdata3" created
</pre>
 
What is going on so far? Let's check with the pvs/vgs counterpart known as '''lvs''':
 
<pre>
# lvs
  LV      VG    Attr  LSize  Origin Snap%  Move Log Copy%  Convert
  lvdata1 vgtest -wi-a-  2.00g                                     
  lvdata2 vgtest -wi-a-  1.00g                                     
  lvdata3 vgtest -wi-a- 612.00m
#
</pre>
 
Notice the size of ''lvdata3'': it is roughly 600 MB (10% of 6 GB). How much free space remains in the VG? Time to see what '''vgs''' and '''vgdisplay''' return:
 
<pre>
# vgs
  VG    #PV #LV #SN Attr  VSize VFree
  vgtest  3  3  0 wz--n- 5.99g 2.39g
# vgdisplay
  --- Volume group ---
  VG Name              vgtest
  System ID           
  Format                lvm2
  Metadata Areas        3
  Metadata Sequence No  4
  VG Access            read/write
  VG Status            resizable
  MAX LV                0
  Cur LV                3
  Open LV              0
  Max PV                0
  Cur PV                3
  Act PV                3
  VG Size              5.99 GiB
  PE Size              4.00 MiB
  Total PE              1533
  Alloc PE / Size      921 / 3.60 GiB
  Free  PE / Size      612 / 2.39 GiB
  VG UUID              baM3vr-G0kh-PXHy-Z6Dj-bMQQ-KK6R-ewMac2
</pre>
 
Basically it says we have 1533 PEs (chunks) available for a total size of 5.99 GiB. Of those 1533, 921 are used (for a size of 3.60 GiB) and 612 remain free (for a size of 2.39 GiB). So we expect lvdata4 to have an approximate size of 2.4 GiB. Before creating it, have a look at some statistics at the PV level:
 
<pre>
# pvs
  PV        VG    Fmt  Attr PSize PFree 
  /dev/loop0 vgtest lvm2 a-  2.00g      0
  /dev/loop1 vgtest lvm2 a-  2.00g 404.00m
  /dev/loop2 vgtest lvm2 a-  2.00g  2.00g
 
# pvdisplay
  --- Physical volume ---
  PV Name              /dev/loop0
  VG Name              vgtest
  PV Size              2.00 GiB / not usable 4.00 MiB
  Allocatable          yes (but full)
  PE Size              4.00 MiB
  Total PE              511
  Free PE              0
  Allocated PE          511
  PV UUID              b9i1Hi-llka-egCF-2vU2-f7tp-wBqh-qV4qEk
 
  --- Physical volume ---
  PV Name              /dev/loop1
  VG Name              vgtest
  PV Size              2.00 GiB / not usable 4.00 MiB
  Allocatable          yes
  PE Size              4.00 MiB
  Total PE              511
  Free PE              101
  Allocated PE          410
  PV UUID              i3mdBO-9WIc-EO2y-NqRr-z5Oa-ItLS-jbjq0E
 
  --- Physical volume ---
  PV Name              /dev/loop2
  VG Name              vgtest
  PV Size              2.00 GiB / not usable 4.00 MiB
  Allocatable          yes
  PE Size              4.00 MiB
  Total PE              511
  Free PE              511
  Allocated PE          0
  PV UUID              dEwVuO-a5vQ-ipcH-Rvlt-5zWt-iAB2-2F0XBf
</pre>
 
Quite interesting! Did you notice? The first PV is full, the second is mostly full and the third is empty. This is due to the allocation policy used for the VG: it fills its first PV, then its second PV and then its third PV (this, by the way, gives you a chance to recover from a dead physical storage device if, by luck, none of your PEs were present on it).
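
As an aside, if you do want RAID-0-like striping at the LVM level rather than this linear allocation, '''lvcreate''' can stripe an LV over several PVs. A sketch on a hypothetical VG with enough free space on each PV (the stripe count given to ''-i'' and the stripe size in KiB given to ''-I'' are illustrative):

<pre>
# lvcreate -n lvstriped -L 1G -i 3 -I 64 vgtest
</pre>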
 
It is now time to create our last LV, again notice the small 'l' to specify a relative size:
 
<pre>
# lvcreate -n lvdata4 -l 100%FREE vgtest
  Logical volume "lvdata4" created
# lvs
  LV      VG    Attr  LSize  Origin Snap%  Move Log Copy%  Convert
  lvdata1 vgtest -wi-a-  2.00g                                     
  lvdata2 vgtest -wi-a-  1.00g                                     
  lvdata3 vgtest -wi-a- 612.00m                                     
  lvdata4 vgtest -wi-a-  2.39g
</pre>
 
Now the $100 question: if the '''pvdisplay''' and '''vgdisplay''' commands exist, does a command named '''lvdisplay''' exist as well? Yes, absolutely! Indeed the command sets are consistent between abstraction levels (PV/VG/LV) and they are named in the exact same manner, modulo their first two letters:
 
* PV: pvs/pvdisplay/pvchange...
* VG: vgs/vgdisplay/vgchange...
* LV: lvs/lvdisplay/lvchange...
 
Back to our '''lvdisplay''' command, here is how it shows up:
 
<pre>
# lvdisplay
  --- Logical volume ---
  LV Name                /dev/vgtest/lvdata1
  VG Name                vgtest
  LV UUID                fT22is-cmSL-uhwM-zwCd-jeIe-DWO7-Hkj4k3
  LV Write Access        read/write
  LV Status              available
  # open                0
  LV Size                2.00 GiB
  Current LE            512
  Segments              2
  Allocation            inherit
  Read ahead sectors    auto
  - currently set to    256
  Block device          253:0
 
  --- Logical volume ---
  LV Name                /dev/vgtest/lvdata2
  VG Name                vgtest
  LV UUID                yd07wA-hj77-rOth-vxW8-rwo9-AX7q-lcyb3p
  LV Write Access        read/write
  LV Status              available
  # open                0
  LV Size                1.00 GiB
  Current LE            256
  Segments              1
  Allocation            inherit
  Read ahead sectors    auto
  - currently set to    256
  Block device          253:1
 
  --- Logical volume ---
  LV Name                /dev/vgtest/lvdata3
  VG Name                vgtest
  LV UUID                ocMCL2-nkcQ-Fwdx-pss4-qeSm-NtqU-J7vAXG
  LV Write Access        read/write
  LV Status              available
  # open                0
  LV Size                612.00 MiB
  Current LE            153
  Segments              1
  Allocation            inherit
  Read ahead sectors    auto
  - currently set to    256
  Block device          253:2
 
  --- Logical volume ---
  LV Name                /dev/vgtest/lvdata4
  VG Name                vgtest
  LV UUID                iQ2rV7-8Em8-85ts-anan-PePb-gk18-A31bP6
  LV Write Access        read/write
  LV Status              available
  # open                0
  LV Size                2.39 GiB
  Current LE            612
  Segments              2
  Allocation            inherit
  Read ahead sectors    auto
  - currently set to    256
  Block device          253:3
</pre>
 
Nothing extremely useful to comment on for an overview, with the exception of two things:
# '''LVs are accessed via the device mapper''' (see the lines starting with ''LV Name'' and notice how the name is composed). So '''lvdata1''' will be accessed via ''/dev/vgtest/lvdata1'', ''lvdata2'' will be accessed via ''/dev/vgtest/lvdata2'' and so on.
# just like PVs are managed in sets of data chunks (the famous Physical Extents or PEs), LVs are managed in sets of data chunks known as Logical Extents or LEs. Most of the time you don't have to worry about the existence of LEs because each one fits within a single PE, although it is possible to make them smaller, hence having several LEs within a single PE. Demonstration: if you consider the first LV, '''lvdisplay''' says it has a size of 2 GiB and holds 512 logical extents. Dividing 2 GiB by 512 gives 4 MiB as the size of an LE, which is the exact same size used for PEs, as seen when demonstrating the '''pvdisplay''' command some paragraphs above. So in our case we have a 1:1 match between an LE and the underlying PE.
 
Oh, another great point to underline: you can display the PVs in relation to an LV :-) Just give a special option to '''lvdisplay''':
 
<pre>
# lvdisplay -m
  --- Logical volume ---
  LV Name                /dev/vgtest/lvdata1
  VG Name                vgtest
  (...)
  Current LE            512
  Segments              2
  (...)
  --- Segments ---
  Logical extent 0 to 510:
    Type                linear
    Physical volume    /dev/loop0
    Physical extents    0 to 510
 
  Logical extent 511 to 511:
    Type                linear
    Physical volume    /dev/loop1
    Physical extents    0 to 0
 
 
  --- Logical volume ---
  LV Name                /dev/vgtest/lvdata2
  VG Name                vgtest
  (...)
  Current LE            256
  Segments              1
  (...)
 
  --- Segments ---
  Logical extent 0 to 255:
    Type                linear
    Physical volume    /dev/loop1
    Physical extents    1 to 256
 
 
  --- Logical volume ---
  LV Name                /dev/vgtest/lvdata3
  VG Name                vgtest
  (...)
  Current LE            153
  Segments              1
  (...)
 
  --- Segments ---
  Logical extent 0 to 152:
    Type                linear
    Physical volume    /dev/loop1
    Physical extents    257 to 409
 
 
  --- Logical volume ---
  LV Name                /dev/vgtest/lvdata4
  VG Name                vgtest
  (...)
  Current LE            612
  Segments              2
  (...)
 
  --- Segments ---
  Logical extent 0 to 510:
    Type                linear
    Physical volume    /dev/loop2
    Physical extents    0 to 510
 
  Logical extent 511 to 611:
    Type                linear
    Physical volume    /dev/loop1
    Physical extents    410 to 510
</pre>
 
To go one step further, let's analyze a bit how the PEs are used: the first LV has 512 LEs (remember: one LE fits within one PE here, so 1 LE = 1 PE). Amongst those 512 LEs, 511 of them (0 to 510) are stored on /dev/loop0 and the 512th LE is on /dev/loop1. Huh? Something seems to be wrong here: /dev/loop0 is 2 GiB, which should correspond to 512 PEs, so why has an extent been placed on the second storage device? Indeed it is not a misbehaviour and it is absolutely normal: LVM stores some metadata internally with regard to the PVs, VG and LVs, making some of the storage space unavailable for the payload. This explains why '''pvdisplay''' reported only 511 Total PE for /dev/loop0 and why one extent's worth of data had to spill over to the next PV. Also notice the linear allocation process: ''/dev/loop0'' has been used, then when it became full ''/dev/loop1'' was used, and then came the turn of ''/dev/loop2''.
 
Now everything is in place. If you want, just check again with '''vgs/pvs/vgdisplay/pvdisplay''' and you will notice that the VG is now 100% full and all of the underlying PVs are also 100% full.
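
A more compact way to see which PVs back each LV is to ask the tabular commands for extra output columns, for example the ''devices'' column documented in the '''lvs''' manual page (a sketch, output omitted):

<pre>
# lvs -o +devices vgtest
</pre>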
 
== Filesystems creation and mounting ==
 
Now that we have our LVs, it would be fun to do something useful with them. In case you missed it, LVs are accessed via the device mapper, which uses a combination of the VG and LV names, thus:
* lvdata1 is accessible via /dev/vgtest/lvdata1
* lvdata2 is accessible via /dev/vgtest/lvdata2
* and so on!
 
Just like any traditional storage device, the newly created LVs are seen as block devices, just as if they were a kind of hard disk (don't worry about the "dm-..." names, they are just internal block devices automatically allocated by the device mapper for you):
<pre>
# ls -l /dev/vgtest
total 0
lrwxrwxrwx 1 root root 7 Dec 27 12:54 lvdata1 -> ../dm-0
lrwxrwxrwx 1 root root 7 Dec 27 12:54 lvdata2 -> ../dm-1
lrwxrwxrwx 1 root root 7 Dec 27 12:54 lvdata3 -> ../dm-2
lrwxrwxrwx 1 root root 7 Dec 27 12:54 lvdata4 -> ../dm-3
 
# ls -l /dev/dm-[0-3]
brw-rw---- 1 root disk 253, 0 Dec 27 12:54 /dev/dm-0
brw-rw---- 1 root disk 253, 1 Dec 27 12:54 /dev/dm-1
brw-rw---- 1 root disk 253, 2 Dec 27 12:54 /dev/dm-2
brw-rw---- 1 root disk 253, 3 Dec 27 12:54 /dev/dm-3
</pre>
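
If '''lsblk''' (from sys-apps/util-linux) is available on your system, it gives a handy tree view of how those dm-* devices stack on top of the loopback devices (the exact output shape will vary):

<pre>
# lsblk /dev/loop0 /dev/loop1 /dev/loop2
</pre>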
 
So if LVs are block devices, can a filesystem be created on them just as if they were a real hard disk or hard disk partitions? Absolutely! Now let's create ext4 filesystems on our LVs:
 
<pre>
# mkfs.ext4 /dev/vgtest/lvdata1
 
mke2fs 1.42 (29-Nov-2011)
Discarding device blocks: done                           
Filesystem label=
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
Stride=0 blocks, Stripe width=0 blocks
131072 inodes, 524288 blocks
26214 blocks (5.00%) reserved for the super user
First data block=0
Maximum filesystem blocks=536870912
16 block groups
32768 blocks per group, 32768 fragments per group
8192 inodes per group
Superblock backups stored on blocks:
        32768, 98304, 163840, 229376, 294912
 
Allocating group tables: done                           
Writing inode tables: done                           
Creating journal (16384 blocks): done
Writing superblocks and filesystem accounting information: done
 
# mkfs.ext4 /dev/vgtest/lvdata2
(...)
# mkfs.ext4 /dev/vgtest/lvdata3
(...)
# mkfs.ext4 /dev/vgtest/lvdata4
(...)
</pre>
 
Once the creation has finished, we must create the mount points and mount the newly created filesystems on them:
 
<pre>
# mkdir /mnt/data01
# mkdir /mnt/data02
# mkdir /mnt/data03
# mkdir /mnt/data04
# mount /dev/vgtest/lvdata1 /mnt/data01
# mount /dev/vgtest/lvdata2 /mnt/data02
# mount /dev/vgtest/lvdata3 /mnt/data03
# mount /dev/vgtest/lvdata4 /mnt/data04
</pre>
 
Finally we can check that everything is in order:
 
<pre>
# df -h
Filesystem                    Size  Used Avail Use% Mounted on
(...)
/dev/mapper/vgtest-lvdata1    2.0G  96M  1.9G  5% /mnt/data01
/dev/mapper/vgtest-lvdata2  1022M  47M  924M  5% /mnt/data02
/dev/mapper/vgtest-lvdata3    611M  25M  556M  5% /mnt/data03
/dev/mapper/vgtest-lvdata4    2.4G  100M  2.2G  5% /mnt/data04
</pre>
 
Did you notice that the device names have changed? Everything is still in order: ''df'' simply reports another set of symlinks, which point to the exact same block devices:
 
<pre>
# ls -l /dev/mapper/vgtest-lvdata[1-4]
lrwxrwxrwx 1 root root 7 Dec 28 20:12 /dev/mapper/vgtest-lvdata1 -> ../dm-0
lrwxrwxrwx 1 root root 7 Dec 28 20:13 /dev/mapper/vgtest-lvdata2 -> ../dm-1
lrwxrwxrwx 1 root root 7 Dec 28 20:13 /dev/mapper/vgtest-lvdata3 -> ../dm-2
lrwxrwxrwx 1 root root 7 Dec 28 20:13 /dev/mapper/vgtest-lvdata4 -> ../dm-3
</pre>
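
To get those filesystems mounted at boot, /etc/fstab entries can reference either set of symlinks. A minimal sketch using the current names (mount options are illustrative):

<pre>
/dev/vgtest/lvdata1   /mnt/data01   ext4   defaults   0 2
/dev/vgtest/lvdata2   /mnt/data02   ext4   defaults   0 2
/dev/vgtest/lvdata3   /mnt/data03   ext4   defaults   0 2
/dev/vgtest/lvdata4   /mnt/data04   ext4   defaults   0 2
</pre>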
 
== Renaming a volume group and its logical volumes ==
 
So far we have four LVs named lvdata1 to lvdata4 mounted on /mnt/data01 to /mnt/data04. It would be more adequate to:
# make the numbers in our LV names two digits, like "01" instead of "1"
# rename our volume group to "vgdata" instead of "vgtest"
 
To show how dynamic the LVM world is, we will rename our VG and LVs on the fly using two commands: '''vgrename''' to act at the VG level and its counterpart '''lvrename''' to act at the LV level. Starting with the VG or the LVs makes strictly no difference, you can start either way and get the same result. In our example we have chosen to start with the VG:
 
<pre>
# vgrename vgtest vgdata
  Volume group "vgtest" successfully renamed to "vgdata"
# lvrename vgdata/lvdata1 vgdata/lvdata01
  Renamed "lvdata1" to "lvdata01" in volume group "vgdata"
# lvrename vgdata/lvdata2 vgdata/lvdata02
  Renamed "lvdata2" to "lvdata02" in volume group "vgdata"
# lvrename vgdata/lvdata3 vgdata/lvdata03
  Renamed "lvdata3" to "lvdata03" in volume group "vgdata"
# lvrename vgdata/lvdata4 vgdata/lvdata04
  Renamed "lvdata4" to "lvdata04" in volume group "vgdata"
</pre>
 
What happened? Simple:
 
<pre>
# vgs
  VG    #PV #LV #SN Attr  VSize VFree
  vgdata  3  4  0 wz--n- 5.99g    0
# lvs
  LV      VG    Attr  LSize  Origin Snap%  Move Log Copy%  Convert
  lvdata01 vgdata -wi-ao  2.00g                                     
  lvdata02 vgdata -wi-ao  1.00g                                     
  lvdata03 vgdata -wi-ao 612.00m                                     
  lvdata04 vgdata -wi-ao  2.39g
</pre>
 
Sounds good, our VG and LVs have been renamed! What will a command like ''mount'' say?
 
<pre>
# mount
(...)
/dev/mapper/vgtest-lvdata1 on /mnt/data01 type ext4 (rw)
/dev/mapper/vgtest-lvdata2 on /mnt/data02 type ext4 (rw)
/dev/mapper/vgtest-lvdata3 on /mnt/data03 type ext4 (rw)
/dev/mapper/vgtest-lvdata4 on /mnt/data04 type ext4 (rw)
</pre>
 
Ooops... It is not exactly a bug: mount still shows the symlinks used at the time the LVs were mounted in the VFS and has not updated its information. However, once again everything is correct because the underlying block devices (/dev/dm-0 to /dev/dm-3) did not change at all. To see the correct information the LVs must be unmounted and mounted again:
 
<pre>
# umount /mnt/data01
(...)
# umount /mnt/data04
# mount /dev/vgdata/lvdata01 /mnt/data01
(...)
# mount /dev/vgdata/lvdata04 /mnt/data04
# mount
/dev/mapper/vgdata-lvdata01 on /mnt/data01 type ext4 (rw)
/dev/mapper/vgdata-lvdata02 on /mnt/data02 type ext4 (rw)
/dev/mapper/vgdata-lvdata03 on /mnt/data03 type ext4 (rw)
/dev/mapper/vgdata-lvdata04 on /mnt/data04 type ext4 (rw)
</pre>
 
{{fancynote|Using /dev/''volumegroup''/''logicalvolume'' or /dev/mapper/''volumegroup''-''logicalvolume'' makes no difference at all, those are two sets of symlinks pointing to the '''exact''' same block device. }}
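
You can verify this yourself with '''readlink''', which resolves both symlinks to the same device node (here dm-0, as seen earlier):

<pre>
# readlink -f /dev/vgdata/lvdata01
/dev/dm-0
# readlink -f /dev/mapper/vgdata-lvdata01
/dev/dm-0
</pre>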
 
= Expanding and shrinking the storage space  =
 
Did you notice that in the previous sections we never talked about topics like "create this partition at the beginning of the disk" or "allocate 10 more sectors"? In LVM you do not have to worry about that kind of problem: your only concern is more like "Do I have the space to allocate a new LV, or how can I extend an existing LV?". '''LVM takes care of the low-level aspects for you, just focus on what you want to do with your storage space.'''
 
The most common problem with computers is the shortage of space on a volume. Most of the time production servers can run for months or years without requiring a reboot (reasons for rebooting include kernel upgrades, hardware failures and so on), however they regularly require their storage space to be extended because we generate more and more data as time goes by. With a "traditional" approach, like fiddling directly with hard drive partitions, storage space manipulation can easily become a headache, mainly because it requires coherent copies to be made and thus application downtime. Don't expect the situation to be more enjoyable with SAN storage rather than a directly attached storage device... Basically the problems remain the same.
 
== Expanding a storage space ==
 
The most common task for a system administrator is to expand the available storage space. In the LVM world this implies:
* Creating a new PV
* Adding the PV to the VG (thus extending the VG capacity)
* Extending the existing LVs or creating new ones
* Extending the structures of the filesystem located on an LV in the case an LV is extended (not all filesystems support that capability).
 
=== Bringing a new PV in the VG ===
 
In the exact same manner as we created our first PVs, let's create an additional storage device, associate it with a loopback device and then create a PV on it:


<pre>
# dd if=/dev/zero of=/tmp/hdd4.img bs=2G count=1
# losetup /dev/loop3 /tmp/hdd4.img
# pvcreate /dev/loop3
</pre>


A '''pvs''' should report the new PV with 2 GB of free space:


<pre>
# pvs
  PV        VG    Fmt  Attr PSize PFree
  /dev/loop0 vgdata lvm2 a-  2.00g    0
  /dev/loop1 vgdata lvm2 a-  2.00g    0
  /dev/loop2 vgdata lvm2 a-  2.00g    0
  /dev/loop3        lvm2 a-  2.00g 2.00g
</pre>


Excellent! The next step consists of adding this newly created PV to our VG ''vgdata''; this is where the '''vgextend''' command comes to our rescue:


<pre>
# vgextend vgdata /dev/loop3
  Volume group "vgdata" successfully extended
# vgs
  VG    #PV #LV #SN Attr  VSize VFree
  vgdata  4  4  0 wz--n- 7.98g 2.00g
</pre>


For "core" Portage trees (not overlays,) specific user and group settings are defined using Portage's ''cascading profile'' functionality. Portage would be enhanced to recognize <tt>accounts/users</tt> and <tt>accounts/groups</tt> directories inside profile directories. Users and groups would be defined in these directories, with one user or group per file, and the filename specifying the name of the user or group. Cascading functionality would be enabled so that the full set of user and group data could be a collection of all users and groups defined in parent profiles. This would provide a handy mechanism to share user and group definitions across different operating systems, while allowing for local variations when needed. It makes sense to leverage cascading profiles as much as possible.
Great, ''vgdata'' is now 8 GB large instead of 6 GB and have 2 GB of free space to allocate to either new LVs either existing LVs.


=== Extending the LV and its filesystem ===


Creating a new LV would demonstrate nothing new; extending our existing LVs is much more interesting. How can we use our 2 GB of extra free space? We can, for example, split it in two, allocating 50% to our first (''lvdata01'') and third (''lvdata03'') LVs, adding 1 GB of space to both. The best part of the story is that this operation is very simple and is carried out with a command named '''lvextend''':
 
<pre>
# lvextend vgdata/lvdata01 -l +50%FREE
  Extending logical volume lvdata01 to 3.00 GiB
  Logical volume lvdata01 successfully resized
# lvextend vgdata/lvdata03 -l +50%FREE
  Extending logical volume lvdata03 to 1.10 GiB
  Logical volume lvdata03 successfully resized
</pre>
 
Oops!! We made a mistake there: lvdata01 has the expected size (2 GB + 1 GB for a grand total of 3 GB) but lvdata03 only grew by 512 MB (for a grand total of 1.1 GB). Our mistake was obvious: once the first gigabyte (50% of 2 GB) of extra space had been given to lvdata01, only one gigabyte remained free in the VG, so when we said "allocate 50% of the remaining free space to ''lvdata03''" LVM added only 512 MB, leaving the other half of that gigabyte unused. The '''vgs''' command can confirm this:
 
<pre>
# vgs
  VG    #PV #LV #SN Attr  VSize VFree 
  vgdata  4  4  0 wz--n- 7.98g 512.00m
</pre>
 
Never mind that voluntary mistake, we will keep that extra space for a later paragraph :-) What happened to the storage space visible from the operating system?
 
<pre>
# df -h | grep lvdata01
/dev/mapper/vgdata-lvdata01  2.0G  96M  1.9G  5% /mnt/data01
</pre>
 
Obviously, resizing an LV does not "automagically" resize the filesystem structures to take the new LV size into account, making that step part of our duty. Happily for us, ext4 can be resized and, better, it can be grown while mounted in the VFS. This is known as ''online resizing'' and a few other filesystems support that capability; among them we can quote ext2, ext3, XFS, ReiserFS and Btrfs. To our knowledge, only Btrfs supports both online growing '''and''' online shrinking as of December 2011; the others require the filesystem to be unmounted before being shrunk (when they support shrinking at all).
 
{{fancynote|Consider using the option '''-r''' (--resizefs) when invoking lvextend: it asks the command to resize the underlying filesystem at the same time.}}
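
Had we used it above, extending ''lvdata01'' and growing its filesystem would have been a single step; an illustrative form of the command:

<pre>
# lvextend -r -L +1G vgdata/lvdata01
</pre>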
 
Now let's extend (grow) the ext4 filesystem located on lvdata01. As said above, ext4 supports online growing, so we do not need to kick it out of the VFS first:
 
<pre>
# resize2fs /dev/vgdata/lvdata01
resize2fs 1.42 (29-Nov-2011)
Filesystem at /dev/vgdata/lvdata01 is mounted on /mnt/data01; on-line resizing required
old_desc_blocks = 1, new_desc_blocks = 1
Performing an on-line resize of /dev/vgdata/lvdata01 to 785408 (4k) blocks.
The filesystem on /dev/vgdata/lvdata01 is now 785408 blocks long.


# df -h | grep lvdata01
/dev/mapper/vgdata-lvdata01  3.0G  96M  2.8G  4% /mnt/data01
</pre>


''Et voilà!'' Our LV now has plenty of new usable space :-) '''We do not bother about ''how'' the storage is organized by LVM amongst the underlying storage devices; it is not our problem, after all. We only worry about having our storage requirements satisfied, without any further details. From our point of view everything appears just as if we were manipulating a single storage device subdivided into several partitions of dynamic size, always organized as a set of contiguous blocks.'''


Now let's shuffle the cards a bit more: when we examined how the LEs of our LVs were allocated, we saw that ''lvdata01'' (named lvdata1 at that time) consisted of 512 LEs, or 512 PEs (because of the 1:1 mapping between them), spread over two PVs. As we have extended it onto an additional PV, we should now see it using 3 segments:


* Segment 1: located on the PV stored on /dev/loop0 (LE/PE #0 to #510)
* Segment 2: located on the PV stored on /dev/loop1 (LE/PE #511)
* Segment 3: located on the PV stored on /dev/loop3 (LE/PE #512 and following)


Is that the case? Let's check:


<pre>
# lvdisplay -m vgdata/lvdata01
  --- Logical volume ---
  LV Name                /dev/vgdata/lvdata01
  VG Name                vgdata
  LV UUID                fT22is-cmSL-uhwM-zwCd-jeIe-DWO7-Hkj4k3
  LV Write Access        read/write
  LV Status              available
  # open                1
  LV Size                3.00 GiB
  Current LE            767
  Segments              3
  Allocation            inherit
  Read ahead sectors    auto
  - currently set to    256
  Block device          253:0
 
  --- Segments ---
  Logical extent 0 to 510:
    Type                linear
    Physical volume    /dev/loop0
    Physical extents    0 to 510
 
  Logical extent 511 to 511:
    Type                linear
    Physical volume    /dev/loop1
    Physical extents    0 to 0
 
  Logical extent 512 to 766:
    Type                linear
    Physical volume    /dev/loop3
    Physical extents    0 to 254
</pre>
 
Bingo! Note that while this is true here (LVM used linear allocation), it would not necessarily be true in the general case.


{{fancywarning|'''Never mix a local storage device with a SAN disk within the same volume group''', especially if the latter is your system volume. It will bring you a lot of trouble if the SAN disk goes offline, or weird performance fluctuations, as PEs allocated on the SAN will get faster response times than those located on a local disk. }}


== Shrinking a storage space ==


On some occasions it can be useful to reduce the size of an LV or the size of the VG itself. The principle is similar to what has been demonstrated in the previous section, and is condensed in the sketch just after this list:
 
# umount the filesystem belonging to the LV to be processed (if your filesystem does not support online shrinking)
# reduce the filesystem size (unless the LV is simply to be removed)
# reduce the LV size - OR - remove the LV
# remove a PV from the volume group if it is no longer used to store extents
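
Assuming an ext4 LV that is to be shrunk rather than removed, a minimal sketch of that sequence could look like this (sizes are illustrative; every step is demonstrated in detail below):

<pre>
# umount /mnt/data04
# e2fsck -f /dev/vgdata/lvdata04
# resize2fs /dev/vgdata/lvdata04 1G
# lvreduce -L 1G vgdata/lvdata04
# mount /dev/vgdata/lvdata04 /mnt/data04
</pre>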


The simplest case to start with is how an LV can be removed: a good candidate for removal is ''lvdata03''; we failed to resize it properly and the best thing to do is to scrap it. First unmount it:


<pre>
# lvs
  LV      VG    Attr  LSize Origin Snap%  Move Log Copy%  Convert
  lvdata01 vgdata -wi-ao 3.00g                                     
  lvdata02 vgdata -wi-ao 1.00g                                     
  lvdata03 vgdata -wi-ao 1.10g                                     
  lvdata04 vgdata -wi-ao 2.39g                                     
# umount /dev/vgdata/lvdata03
# lvs
  LV      VG    Attr  LSize Origin Snap%  Move Log Copy%  Convert
  lvdata01 vgdata -wi-ao 3.00g                                     
  lvdata02 vgdata -wi-ao 1.00g                                     
  lvdata03 vgdata -wi-a- 1.10g                                     
  lvdata04 vgdata -wi-ao 2.39g
</pre>


Did you notice the little change in the '''lvs''' output? It lies in the ''Attr'' field: once ''lvdata03'' has been unmounted, '''lvs''' tells us the LV is no longer '''o'''pen (the little 'o' at the rightmost position has been replaced by a dash). The LV still exists but nothing is using it.


To remove ''lvdata03'' use the command '''lvremove''' and confirm the removal by entering 'y' when asked:
 
<pre>
# lvremove vgdata/lvdata03
Do you really want to remove active logical volume lvdata03? [y/n]: y
  Logical volume "lvdata03" successfully removed
# lvs
  LV      VG    Attr  LSize Origin Snap%  Move Log Copy%  Convert
  lvdata01 vgdata -wi-ao 3.00g                                     
  lvdata02 vgdata -wi-ao 1.00g                                     
  lvdata04 vgdata -wi-ao 2.39g
# vgs
  VG    #PV #LV #SN Attr  VSize VFree
  vgdata  4  3  0 wz--n- 7.98g 1.60g
</pre>


Notice that 1.60 GB of space has been freed in the VG. What can we do next? Shrinking ''lvdata04'' by 50%, down to roughly 1.2 GB or 1228 MB (1.2*1024), could be a good idea, so here we go. First we need to umount the filesystem from the VFS because ext4 '''does not support''' online shrinking.


<pre>
# umount /dev/vgdata/lvdata04
# e2fsck -f /dev/vgdata/lvdata04
e2fsck 1.42 (29-Nov-2011)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
/dev/vgdata/lvdata04: 11/156800 files (0.0% non-contiguous), 27154/626688 blocks
# resize2fs -p /dev/vgdata/lvdata04 1228M
# lvreduce /dev/vgdata/lvdata04 -L 1228
  WARNING: Reducing active logical volume to 1.20 GiB
  THIS MAY DESTROY YOUR DATA (filesystem etc.)
Do you really want to reduce lvdata04? [y/n]: y
  Reducing logical volume lvdata04 to 1.20 GiB
  Logical volume lvdata04 successfully resized
# e2fsck -f /dev/vgdata/lvdata04
e2fsck 1.42 (29-Nov-2011)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
/dev/vgdata/lvdata04: 11/78400 files (0.0% non-contiguous), 22234/314368 blocks
</pre>


Not very practical indeed; fortunately, we can tell '''lvreduce''' to handle the underlying filesystem shrinkage for us. Let's shrink again, this time giving a 1 GB (1024 MB) absolute size:
 
 
<pre>
# lvreduce /dev/vgdata/lvdata04 -r -L 1024
fsck from util-linux 2.20.1
/dev/mapper/vgdata-lvdata04: clean, 11/78400 files, 22234/314368 blocks
resize2fs 1.42 (29-Nov-2011)
Resizing the filesystem on /dev/mapper/vgdata-lvdata04 to 262144 (4k) blocks.
The filesystem on /dev/mapper/vgdata-lvdata04 is now 262144 blocks long.
 
  Reducing logical volume lvdata04 to 1.00 GiB
  Logical volume lvdata04 successfully resized
# lvs
  LV      VG    Attr  LSize Origin Snap%  Move Log Copy%  Convert
  lvdata01 vgdata -wi-ao 3.00g                                     
  lvdata02 vgdata -wi-ao 1.00g                                     
  lvdata04 vgdata -wi-a- 1.00g
</pre>


{{fancynote|Notice the number of 4k blocks shown: 262144*4096 gives 1,073,741,824 bytes, i.e. 1 GiB.}}


Time to mount the volume again:


<pre>
# mount /dev/vgdata/lvdata04 /mnt/data04
# df -h | grep lvdata04
/dev/mapper/vgdata-lvdata04  1021M  79M  891M  9% /mnt/data04
</pre>


And what is going on at the VG level?


<pre>
# vgs
  VG    #PV #LV #SN Attr  VSize VFree
  vgdata  4  3  0 wz--n- 7.98g 2.99g
</pre>


Wow, we have nearly 3 GB of free space inside, a bit more than the size of one of our PVs. It would be great if we could free one of those, and of course LVM gives you the possibility to do just that. Before going further, let's check what happened at the PV level:


<pre>
# pvs
  PV        VG    Fmt  Attr PSize PFree 
  /dev/loop0 vgdata lvm2 a-  2.00g      0
  /dev/loop1 vgdata lvm2 a-  2.00g 1016.00m
  /dev/loop2 vgdata lvm2 a-  2.00g 1020.00m
  /dev/loop3 vgdata lvm2 a-  2.00g    1.00g
</pre>


Did you notice? 1 GB of space has been freed on the last PV (/dev/loop3) since ''lvdata04'' was shrunk, not counting the space freed on ''/dev/loop1'' and ''/dev/loop2'' after the removal of ''lvdata03''.


Next step: can we remove a PV directly (the command to remove a PV from a VG is '''vgreduce''')?


<pre>
# vgreduce vgdata /dev/loop0
  Physical volume "/dev/loop0" still in use
</pre>


Of course not: our PVs hold the content of our LVs and we must find a way to move all of the PEs (physical extents) currently held by the PV /dev/loop0 elsewhere within the VG. But wait a minute, victory is not ours yet: /dev/loop0 will get more and more free space as the displacement process progresses. What is going to happen if, from a concurrent session, we create other LVs in ''vgdata'' while the content of /dev/loop0 is being moved? Simple: it could be filled again with newly allocated PEs.


So before proceeding with the displacement of the contents of ''/dev/loop0'', we must tell LVM: "please do not allocate any more PEs on ''/dev/loop0''". This is achieved via the parameter ''-x'' of the command '''pvchange''':
<pre>
# pvchange -x n /dev/loop0
  Physical volume "/dev/loop0" changed
  1 physical volume changed / 0 physical volumes not changed
</pre>


The value ''n'' given to ''-x'' marks the PV as non-allocatable (i.e. not usable for future PE allocations). Let's check the PVs again with '''pvs''' and '''pvdisplay''':


<pre>
# pvs
  PV        VG    Fmt  Attr PSize PFree 
  /dev/loop0 vgdata lvm2 --  2.00g      0
  /dev/loop1 vgdata lvm2 a-  2.00g 1016.00m
  /dev/loop2 vgdata lvm2 a-  2.00g 1020.00m
  /dev/loop3 vgdata lvm2 a-  2.00g    1.00g


# pvdisplay /dev/loop0
  --- Physical volume ---
  PV Name              /dev/loop0
  VG Name              vgdata
  PV Size              2.00 GiB / not usable 4.00 MiB
  Allocatable          NO
  PE Size              4.00 MiB
  Total PE              511
  Free PE              0
  Allocated PE          511
  PV UUID              b9i1Hi-llka-egCF-2vU2-f7tp-wBqh-qV4qEk
</pre>


Great news here: the ''Attr'' field shows a dash instead of 'a' at the leftmost position, meaning the PV is effectively ''not allocatable''. However, '''marking a PV as not allocatable does not wipe the existing PEs stored on it'''. In other words, the data present on the PV remains '''absolutely intact'''. Another positive point lies in the remaining capacity of the PVs composing ''vgdata'': the sum of the free space available on ''/dev/loop1'', ''/dev/loop2'' and ''/dev/loop3'' is 3060 MB (1016 MB + 1020 MB + 1024 MB), largely sufficient to hold the 2048 MB (2 GB) currently stored on the PV ''/dev/loop0''.


=== Migration ===
Now we have frozen the allocation of PEs on /dev/loop0 we can make LVM move all of PEs located in this PV on the others PVs composing the VG ''vgdata''. Again, we don't have to worry about the gory details like where LVM will precisely relocate the PEs actually hold by ''/dev/loop0'', our '''only''' concerns is to get all of them moved out of ''/dev/loop0''. That job gets done by:
 
<pre>
# pvmove /dev/loop0
  /dev/loop0: Moved: 5.9%
  /dev/loop0: Moved: 41.3%
  /dev/loop0: Moved: 50.1%
  /dev/loop0: Moved: 100.0%
</pre>


What remains to be defined is how to transition from <tt>enewgroup</tt> and <tt>enewuser</tt> that are currently being called from <tt>pkg_setup</tt>. The new implementation should be backwards-compatible with the old system to ease transition.
We don't have to tell LVM the VG name because it already knows that ''/dev/loop0'' belongs to ''vgdata'' and what are the others PVs belonging to that VG usable to host the PEs coming from ''/dev/loop0''. It is absolutely normal for the process to takes some minutes (real life cases can go up to several hours even with SAN disks located on high-end storage hardware which is much more faster than local SATA or even SAS drive).  


Options:
At the end of the moving process, we can see that the PV ''/dev/loop0'' is totally free:


# call <tt>pkg_setup</tt> during dependency generation and use <tt>enewgroup</tt> and <tt>enewuser</tt> wrappers to inject dependency info into the metadata, and emit a deprecation warning. Pass only the user/group name to the new system, which would provide its own UID/GID info. This may not be feasible.
<pre>
# brute-force - grep the ebuild for legacy commands during metadata generation. Integrate new-style dependencies into metadata. This is possibly the least elegant solution but may be the simplest approach.
# pvs
# fallback - tweak the legacy commands to call the new framework. This means that older ebuilds would not be able to have their users and groups created at the same time as new-style ebuilds (dependency fulfillment time.) However, this may be the most elegant solution and also the least hackish.
  PV        VG    Fmt  Attr PSize PFree 
  /dev/loop0 vgdata lvm2 a-  2.00g    2.00g
  /dev/loop1 vgdata lvm2 a-  2.00g 1016.00m
  /dev/loop2 vgdata lvm2 a-   2.00g      0
  /dev/loop3 vgdata lvm2 a-   2.00g      0


The last option seems best.
# pvdisplay /dev/loop0
  --- Physical volume ---
  PV Name              /dev/loop0
  VG Name              vgdata
  PV Size              2.00 GiB / not usable 4.00 MiB
  Allocatable          yes
  PE Size              4.00 MiB
  Total PE              511
  Free PE              511
  Allocated PE          0
  PV UUID              b9i1Hi-llka-egCF-2vU2-f7tp-wBqh-qV4qEk
</pre>


=== Architecture ===
511 PEs free out of a maximum 511 PEs so all of its containt has been successfully spread on the others PVs (the volume is also still marked as "unallocatable", this is normal). Now it is ready to be detached from the VG ''vgdata'' with the help of '''vgreduce''' :


Here are the various architectural layers of the implementation:
<pre>
# vgreduce vgdata /dev/loop0
  Removed "/dev/loop0" from volume group "vgdata"
</pre>


# Portage internals to handle "user/" and "group/" as special words. Would be treated almost identically to ebuilds up until actual merge time. Version specifiers, as well as USE flags, would not be allowed.
What happened to ''vgdata''?
# Python-based code to parse user and group data in the profiles, and determine proper UID/GID to use on the system. This is the parsing and policy framework, and can be controlled by variables defined in <tt>make.conf</tt>/<tt>make.defaults</tt>. This would all be written in Python and integrated into the Portage core.
<pre>
## "Core" Portage trees would use cascading profiles to define users and groups. This would allow variations based on architecture (Portage on MacOS X vs. Linux, for example.)
# vgs
## Overlays would use <tt>OVERLAY_DIR/profiles/users</tt> and <tt>OVERLAY_DIR/profiles/groups</tt> to define user and group information required for the overlay. This way, overlays could extend users and groups.
  VG    #PV #LV #SN Attr  VSize VFree 
# Python-based backwards-compatibility code (implementation to be determined)
  vgdata  3  3  0 wz--n- 5.99g 1016.00m
# Profile-based plugin architecture, again python-based.
</pre>
# <tt>user-add</tt> and <tt>group-add</tt> scripts, implemented as stand-alone executables (likely written as a shell script.) This is the only part not in python and these scripts do not do any kind of high-level policy decisions. They simply create the user or group and report success or failure.


=== Possible Changes and Unresolved Issues ===
Its storage space falls to ~6GB! What would tell '''pvs'''?


==== Disable User/Group Creation ====
<pre>
# pvs
  PV        VG    Fmt  Attr PSize PFree 
  /dev/loop0        lvm2 a-  2.00g    2.00g
  /dev/loop1 vgdata lvm2 a-  2.00g 1016.00m
  /dev/loop2 vgdata lvm2 a-  2.00g      0
  /dev/loop3 vgdata lvm2 a-  2.00g      0
</pre>


<tt>FEATURES="-auto-accounts"</tt> (<tt>auto-accounts</tt> would be enabled by default)
''/dev/loop0'' is now a standalone device detached from any VG. However it still contains some LVM metadata that remains to be wiped with the help of the '''pvremove''' command:


This is a change from GLEP 27 to get rid of ugly "no" prefix and to follow naming conventions for existing <tt>FEATURES</tt> settings.
{{fancywarning|pvremove/pvmove '''do not destroy the disk content'''. Please *do* a secure erase of the storage device with ''shred'' or any similar tool before disposing of it. }}


With <tt>auto-accounts</tt> disabled, Portage will do an initial check using libc (respecting <tt>/etc/nsswitch.conf</tt>) to see if all depended-upon users and groups exist. If they exist, the user/group dependency will be satisfied and <tt>ebuild</tt> can continue. If the dependencies are not satisfied, then the ebuild will abort with unsatisfied dependencies and display the users and groups that need to be created, and what their associated settings should be.
<pre>
# pvdisplay /dev/loop0
  "/dev/loop0" is a new physical volume of "2.00 GiB"
  --- NEW Physical volume ---
  PV Name              /dev/loop0
  VG Name             
  PV Size              2.00 GiB
  Allocatable          NO
  PE Size              0 
  Total PE              0
  Free PE              0
  Allocated PE          0
  PV UUID              b9i1Hi-llka-egCF-2vU2-f7tp-wBqh-qV4qEk


==== Allow User/Group Names to Be Specified At Build Time ====
# pvremove /dev/loop0
  Labels on physical volume "/dev/loop0" successfully wiped
# pvdisplay /dev/loop0
  No physical volume label read from /dev/loop0
  Failed to read physical volume "/dev/loop0"
</pre>


Some users may want an <tt>nginx</tt> user, while others may want a generic <tt>www</tt> user to be used.
Great! Things are just simple than that. In their day to day reality, system administrators drive their show in a extremely close similar manner: they do additional tasks like taking backups of data located on the LVs before doing any risky operation or plan applications shutdown periods prior starting a manipulation with a LVM volume to take extra precautions.


TBD.
== Replacing a PV (storage device) by another ==


==== Not Elegant for Specific Users/Groups ====
The principle a mix of what has been said in the above sections. The principle is basically:
# Create a new PV
# Associate it to the VG
# Move the contents of the PV to be removed on the remaining PVs composing the VG
# Remove the PV from the VG and wipe it


This implementation looks cool but is potentially annoying for specific users and groups. For example, for an <tt>nginx</tt> ebuild that needs an <tt>nginx</tt> user, it would need to be added to the system profile. We probably need to implement ebuild-local user/groups as well.
The strategy in this paragraph is to reuse ''/dev/loop0'' and make it replace ''/dev/loop2'' (both devices are of the same size, however we also could have used a bigger ''/dev/loop0'' as well).  


==== Specify Required Users and Groups for Profile ====
Here we go! First we need to (re-)create the LVM metadata to make ''/dev/loop0'' usable by LVM:


Some users and groups '''must''' be part of the system and should be in the system set. It would be nice to move some of this out of baselayout and into the profiles directly. Maybe a good solution is to have <tt>baselayout</tt> <tt>RDEPEND</tt> on these users and groups.
<pre>
# pvcreate /dev/loop0
  Physical volume "/dev/loop0" successfully created
</pre>


TBD.
Then this brand new PV is added to the VG ''vgdata'' thus increasing its size of 2 GB:


==== Dependency Prefix ====
<pre>
# vgextend vgdata  /dev/loop0
  Volume group "vgdata" successfully extended
# vgs
  VG    #PV #LV #SN Attr  VSize VFree
  vgdata  4  3  0 wz--n- 7.98g 2.99g
# pvs
  PV        VG    Fmt  Attr PSize PFree 
  /dev/loop0 vgdata lvm2 a-  2.00g    2.00g
  /dev/loop1 vgdata lvm2 a-  2.00g 1016.00m
  /dev/loop2 vgdata lvm2 a-  2.00g      0
  /dev/loop3 vgdata lvm2 a-  2.00g      0
</pre>


One possible area of improvement is with the <tt>user/</tt> and <tt>group/</tt> syntax itself, which could be changed slightly to indicate that we are depending on something other than a package. But this is not absolutely necessary and "user" and "group" could be treated as reserved names that cannot be used for categories, since they have a special meaning.
Now we have to suspend the allocation of PEs on ''/dev/loop2'' prior to moving its PEs (and freeing some space on it):


==== .tbz2 support ====
<pre>
# pvchange -x n /dev/loop2
  Physical volume "/dev/loop2" changed
  1 physical volume changed / 0 physical volumes not changed
# pvs
  PV        VG    Fmt  Attr PSize PFree 
  /dev/loop0 vgdata lvm2 a-  2.00g    2.00g
  /dev/loop1 vgdata lvm2 a-  2.00g 1016.00m
  /dev/loop2 vgdata lvm2 --  2.00g      0
  /dev/loop3 vgdata lvm2 a-  2.00g      0
</pre>


In general, the design proposed above  will work well for binary packages, as long as the users and groups required by the <tt>.tbz2</tt> can be found in the local Portage tree and overlays. If not, then Portage will not have any metadata relating to the user(s) or group(s) that need to be created for the <tt>.tbz2</tt> and will not be able to create them, resulting in an install failure, which of course is not optimal.
Then we move all of the the PEs on ''/dev/loop2'' to the rest of the VG:


Therefore, it may be necessary to embed user and group metadata within the <tt>.tbz2</tt> and have Portage use this data only if local user/group metadata for the requested users and groups is not available. In addition, this user/group metadata may need to be cached persistently inside <tt>/var/db/pkg</tt> or another location to ensure that it is continually available to the Portage UID/GID code. This could add a bit more complexity to the implementation but should solve the <tt>.tbz2</tt> failure problem. This would create three layers of user/group data:
<pre>
# pvmove /dev/loop2
  /dev/loop2: Moved: 49.9%
  /dev/loop2: Moved: 100.0%
# pvs
  PV        VG    Fmt  Attr PSize PFree 
  /dev/loop0 vgdata lvm2 a-  2.00g      0
  /dev/loop1 vgdata lvm2 a-  2.00g 1016.00m
  /dev/loop2 vgdata lvm2 --  2.00g    2.00g
  /dev/loop3 vgdata lvm2 a-  2.00g      0
</pre>


# Core user/group metadata defined in <tt>/usr/portage</tt>.
Then we remove ''/dev/loop2'' from the VG and we wipe its LVM metadata:
# Overlay user/group metadata defined in <tt>OVERLAY_DIR/profiles/{users,groups}</tt>
# Package user/group metadata


Using pseudo-code, we could imagine resolution of user and group metadata at <tt>.tbz2</tt> install time to look like this:
<pre>
# vgreduce vgdata /dev/loop2
  Removed "/dev/loop2" from volume group "vgdata"
# pvremove /dev/loop2
  Labels on physical volume "/dev/loop2" successfully wiped
</pre>


Final state of the PVs composing ''vgdata'':
<pre>
<pre>
all_ug_metadata = profile_ug_metadata + overlay_ug_metadata
# pvs
if (user_or_group in (all_ug_metadata)):
  PV        VG    Fmt  Attr PSize PFree 
    return all_ug_metadata[user_or_group]
  /dev/loop0 vgdata lvm2 a-  2.00g      0
else:
  /dev/loop1 vgdata lvm2 a-  2.00g 1016.00m
    return binary_package_ug_metadata[user_or_group]
  /dev/loop3 vgdata lvm2 a-  2.00g      0
</pre>
</pre>
==== Compatibility with other distributions ====
 
If our goal is to ensure a sane method of creating UID/GID's in packages, we should also look at making them compatible with the wider world.  The LSB http://refspecs.freestandards.org/LSB_3.0.0/LSB-Core-generic/LSB-Core-generic/usernames.html specifies very lax standards for system accounts.  Seemingly there are no hard standards for system/daemon UID/GID's, and no real desire in the community from anyone I discussed this issue with to standardize.  There is one important issue to note, and that is the lowest user account number.
''/dev/loop0'' took the place of ''/dev/loop2'' :-)
* Fedora/RHEL:  Presently RHEL starts assigning UID/GID's to users of the system at 500 and moves up, this will changehttp://lists.fedoraproject.org/pipermail/devel/2011-May/151663.html to number after 1000
 
* Debian/Ubuntu: Presently Debian starts assigning UID/GID's to users of the system at 1000, and moves up.  This appears to be the standard distributions are moving towards
= More advanced topics =
* Gentoo/Funtoo: Presently Funtoo and Gentoo are both compliant with Debian, and after Fedora 16, and the subsequent RHEL, this will be a standard across most major linux distributions.
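For convenience, the whole replacement procedure can be condensed into the following sequence. This is only a sketch: the device names <tt>/dev/sdX1</tt> (replacement device) and <tt>/dev/sdY1</tt> (PV being retired) are placeholders, and you should of course have a current backup before moving extents around.

<pre>
## Placeholders: /dev/sdX1 = replacement device, /dev/sdY1 = PV to retire
pvcreate /dev/sdX1          # initialize the new physical volume
vgextend vgdata /dev/sdX1   # add it to the volume group
pvchange -x n /dev/sdY1     # forbid new allocations on the old PV
pvmove /dev/sdY1 /dev/sdX1  # relocate its extents (omit the destination to let LVM choose)
vgreduce vgdata /dev/sdY1   # detach the old PV from the VG
pvremove /dev/sdY1          # wipe its LVM label
</pre>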
 
= More advanced topics =

== Backing up the layout ==
 
== Freezing a VG ==
 
== LVM snapshots ==
 
== Linear/Striped/Mirrored Logical volumes ==
 
= LVM and Funtoo =
 
 
[[Category:Labs]]
[[Category:Filesystems]]
[[Category:Articles]]

Revision as of 05:10, May 14, 2013


In short: LVM logical volumes (LVs) are containers that each hold a single filesystem and are created inside a volume group (VG), itself an aggregation of at least one physical volume (PV), with each PV stored on some medium (USB key, hard disk partition and so on). The data is stored in chunks spread over the various PVs.

   Note

Retain what PV, VG and LV mean, as we will use these abbreviations in the rest of this article.

Your first tour of LVM

Physical volumes creation

   Note

We give the same size to all volumes for the sake of the demonstration. This is not mandatory; it is perfectly possible to have PVs of mixed sizes inside the same VG.

To start with, just create three raw disk images:

# dd if=/dev/zero of=/tmp/hdd1.img bs=2G count=1
# dd if=/dev/zero of=/tmp/hdd2.img bs=2G count=1
# dd if=/dev/zero of=/tmp/hdd3.img bs=2G count=1

and associate them to a loopback device:

# losetup -f
/dev/loop0 
# losetup /dev/loop0 /tmp/hdd1.img
# losetup /dev/loop1 /tmp/hdd2.img
# losetup /dev/loop2 /tmp/hdd3.img
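As a side note, if you would rather not write 2 GB of zeroes per image, a sparse file works just as well for this kind of experiment. A small sketch using coreutils' truncate and util-linux's losetup (the -f --show combination picks the first free loop device and prints its name):

# truncate -s 2G /tmp/hdd1.img
# losetup -f --show /tmp/hdd1.img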

Okay, nothing really exciting there, but wait, the fun is coming! First check that sys-fs/lvm2 is present on your system and emerge it if not. At this point we must tell you a secret: although many articles and authors simply say "LVM", the term nowadays denotes "LVM version 2" (LVM 2). In the good old days (RHEL 3.x and earlier) LVM had a previous revision known as "LVM version 1". LVM 1 is now an extinct species and is not compatible with LVM 2, although the LVM 2 tools maintain backward compatibility.

The very first step in LVM is to create the physical volumes (PVs). "Wait, create what?! Aren't the loopback devices already present on the system?" Yes, they are present, but they are empty: we must write some metadata on them to make them usable by LVM. This is simply done by:

# pvcreate /dev/loop0
  Physical volume "/dev/loop0" successfully created
# pvcreate /dev/loop1
  Physical volume "/dev/loop1" successfully created
# pvcreate /dev/loop2
  Physical volume "/dev/loop2" successfully created

Nothing spectacular is printed by each command, but we assure you: you now have three LVM PVs. You can check them by issuing:

# pvs
  PV         VG   Fmt  Attr PSize PFree
  /dev/loop0      lvm2 a-   2.00g 2.00g
  /dev/loop1      lvm2 a-   2.00g 2.00g
  /dev/loop2      lvm2 a-   2.00g 2.00g


Some good information there:

  • PV: indicates the physical path the PV lies on
  • VG: indicates the VG the PV belongs to. At this time we have not created any VG yet, so the column remains empty.
  • Fmt: indicates the format of the PV (here it says we have an LVM version 2 PV)
  • Attr: indicates some status information; the 'a' here says that the PV is allocatable (usable for extent allocation)
  • PSize and PFree: indicate the PV size and the amount of space remaining on it. Here we have three empty PVs, so it basically says "2 gigabytes large, 2 gigabytes free"
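Should you need columns other than the default ones, pvs (like the other LVM reporting commands) accepts a -o option taking a comma-separated list of fields. An illustrative example, using field names accepted by recent LVM2 releases (see pvs(8)):

# pvs -o pv_name,vg_name,pv_size,pv_free,pv_uuid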

It is now time to introduce you to another command: pvdisplay. Just run it without any arguments:

# pvdisplay
  "/dev/loop0" is a new physical volume of "2.00 GiB"
  --- NEW Physical volume ---
  PV Name               /dev/loop0
  VG Name               
  PV Size               2.00 GiB
  Allocatable           NO
  PE Size               0   
  Total PE              0
  Free PE               0
  Allocated PE          0
  PV UUID               b9i1Hi-llka-egCF-2vU2-f7tp-wBqh-qV4qEk
   
  "/dev/loop1" is a new physical volume of "2.00 GiB"
  --- NEW Physical volume ---
  PV Name               /dev/loop1
  VG Name               
  PV Size               2.00 GiB
  Allocatable           NO
  PE Size               0   
  Total PE              0
  Free PE               0
  Allocated PE          0
  PV UUID               i3mdBO-9WIc-EO2y-NqRr-z5Oa-ItLS-jbjq0E
   
  "/dev/loop2" is a new physical volume of "2.00 GiB"
  --- NEW Physical volume ---
  PV Name               /dev/loop2
  VG Name               
  PV Size               2.00 GiB
  Allocatable           NO
  PE Size               0   
  Total PE              0
  Free PE               0
  Allocated PE          0
  PV UUID               dEwVuO-a5vQ-ipcH-Rvlt-5zWt-iAB2-2F0XBf

The first three lines for each PV show:

  • what is the storage device beneath a PV
  • the VG it is tied to
  • the size of this PV.

Allocatable indicates whether the PV can be used to store data. As the PV is not yet a member of a VG, it cannot be used for the moment, hence the "NO" shown. Another set of information is the lines starting with PE. PE stands for Physical Extent (a chunk of data) and is the finest granularity LVM can manipulate. The PE size is "0" here because we have a blank PV; once the PV has joined a VG it defaults to 4 MiB, as visible in the outputs later in this article. Following PE Size are Total PE, which shows the total number of PEs available on this PV, and Free PE, the number of PEs still available for use. Allocated PE is simply the difference between Total PE and Free PE.

The last line (PV UUID) is a unique identifier used internally by LVM to name the PV. It is worth knowing that it exists, because it is sometimes useful when recovering from corruption or doing unusual things with a PV; most of the time, however, you do not have to worry about it.

   Note

It is possible to force how LVM aligns data on the physical storage. This is useful when dealing with 4K-sector drives that lie about their physical sector size. Refer to the pvcreate manual page.
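For instance, recent pvcreate versions accept an option to control data alignment explicitly. A hedged example with a placeholder device name (check pvcreate(8) on your system before relying on it):

# pvcreate --dataalignment 1m /dev/sdX1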

Volume group creation

We have blank PVs at this point, but to make them usable for storage we must tell LVM how they are grouped to form a VG (a storage pool) in which LVs will be created. A nice aspect of VGs is that they are not written in stone once created: you can still add, remove or exchange PVs inside a VG at a later time (for example when the device a PV lives on fails). To create our first volume group, named vgtest:

# vgcreate vgtest /dev/loop0 /dev/loop1 /dev/loop2
  Volume group "vgtest" successfully created

Just like we did before with the PVs, we can list the VGs known to the system. This is done with the vgs command:

# vgs
  VG     #PV #LV #SN Attr   VSize VFree
  vgtest   3   0   0 wz--n- 5.99g 5.99g

vgs shows you a tabular view of information:

  • VG: the name of the VG
  • #PV: the number of PVs composing the VG
  • #LV: the number of logical volumes (LVs) located inside the VG
  • #SN: the number of snapshots inside the VG
  • Attr: a status field; the w, z and n here mean that the VG is:
    • w: writable
    • z: resizable
    • n: using the normal allocation policy (tweaking allocation policies is beyond the scope of this article; we will stick to the default, normal)
  • VSize and VFree: statistics on how full the VG is versus its size

Note the dashes in Attr; they mean that the corresponding attribute is not active:

  • The first dash (3rd position) indicates whether the VG is exported (an 'x' would be shown at this position in that case).
  • The second dash (4th position) indicates whether the VG is partial (a 'p' would be shown at this position in that case).
  • The third dash (rightmost position) indicates whether the VG is clustered (a 'c' would be shown at this position in that case).

Exported VGs and clustered VGs are more advanced aspects of LVM and won't be covered here, especially clustered VGs, which are used for shared storage in a cluster of machines; covering clustered VG management would deserve an entire article of its own. For now, the only one of those dashes to worry about is the 4th position of Attr: seeing a 'p' there instead of a dash would be bad news, as the VG would have missing parts (PVs), making it unusable.
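If you just want to keep an eye on those attribute flags, the reporting commands let you select columns here too. An illustrative example with field names accepted by recent LVM2 releases (see vgs(8)):

# vgs -o vg_name,vg_attr,vg_size,vg_free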

   Note

In the exact same manner as pvdisplay shows detailed information about physical volumes, vgdisplay shows detailed information about a volume group. We will demonstrate that command in the paragraphs to follow.

Before leaving the volume group topic, do you remember the pvs command shown in the previous paragraphs? Try it again:

# pvs
  PV         VG     Fmt  Attr PSize PFree
  /dev/loop0 vgtest lvm2 a-   2.00g 2.00g
  /dev/loop1 vgtest lvm2 a-   2.00g 2.00g
  /dev/loop2 vgtest lvm2 a-   2.00g 2.00g

Now it shows the VG our PVs belong to :-)

Logical volumes creation

Now the final step: we will create the storage areas (logical volumes, or LVs) inside the VG, on which we will then create filesystems. Just like a VG, an LV has a name, and that name must be unique within its VG.

   Note

Two LVs can be given the same name as long as they are located in different VGs.

We will divide our VG as follows:

  • lvdata1: 2 GB
  • lvdata2: 1 GB
  • lvdata3: 10% of the VG size
  • lvdata4: all of the remaining free space in the VG

We use the following commands (notice the capital 'L' used to declare an absolute size and the lowercase 'l' used to declare a relative size):

# lvcreate -n lvdata1 -L 2GB vgtest
  Logical volume "lvdata1" created
#  lvcreate -n lvdata2 -L 1GB vgtest
  Logical volume "lvdata2" created
# lvcreate -n lvdata3 -l 10%VG vgtest
  Logical volume "lvdata3" created

What is going on so far? Let's check with the pvs/vgs counterpart known as lvs:

# lvs
  LV      VG     Attr   LSize   Origin Snap%  Move Log Copy%  Convert
  lvdata1 vgtest -wi-a-   2.00g                                      
  lvdata2 vgtest -wi-a-   1.00g                                      
  lvdata3 vgtest -wi-a- 612.00m
# 

Notice the size of lvdata3: it is roughly 600 MB (10% of 6 GB). How much free space remains in the VG? Time to see what vgs and vgdisplay return:

# vgs
  VG     #PV #LV #SN Attr   VSize VFree
  vgtest   3   3   0 wz--n- 5.99g 2.39g
# vgdisplay 
  --- Volume group ---
  VG Name               vgtest
  System ID             
  Format                lvm2
  Metadata Areas        3
  Metadata Sequence No  4
  VG Access             read/write
  VG Status             resizable
  MAX LV                0
  Cur LV                3
  Open LV               0
  Max PV                0
  Cur PV                3
  Act PV                3
  VG Size               5.99 GiB
  PE Size               4.00 MiB
  Total PE              1533
  Alloc PE / Size       921 / 3.60 GiB
  Free  PE / Size       612 / 2.39 GiB
  VG UUID               baM3vr-G0kh-PXHy-Z6Dj-bMQQ-KK6R-ewMac2

Basically it says we have 1533 PEs (chunks) available for a total size of 5.99 GiB. Of those 1533, 921 are used (3.60 GiB) and 612 remain free (2.39 GiB). So we expect lvdata4 to have an approximate size of 2.4 GiB. Before creating it, have a look at some statistics at the PV level:

# pvs
  PV         VG     Fmt  Attr PSize PFree  
  /dev/loop0 vgtest lvm2 a-   2.00g      0 
  /dev/loop1 vgtest lvm2 a-   2.00g 404.00m
  /dev/loop2 vgtest lvm2 a-   2.00g   2.00g

# pvdisplay
  --- Physical volume ---
  PV Name               /dev/loop0
  VG Name               vgtest
  PV Size               2.00 GiB / not usable 4.00 MiB
  Allocatable           yes (but full)
  PE Size               4.00 MiB
  Total PE              511
  Free PE               0
  Allocated PE          511
  PV UUID               b9i1Hi-llka-egCF-2vU2-f7tp-wBqh-qV4qEk
   
  --- Physical volume ---
  PV Name               /dev/loop1
  VG Name               vgtest
  PV Size               2.00 GiB / not usable 4.00 MiB
  Allocatable           yes 
  PE Size               4.00 MiB
  Total PE              511
  Free PE               101
  Allocated PE          410
  PV UUID               i3mdBO-9WIc-EO2y-NqRr-z5Oa-ItLS-jbjq0E
   
  --- Physical volume ---
  PV Name               /dev/loop2
  VG Name               vgtest
  PV Size               2.00 GiB / not usable 4.00 MiB
  Allocatable           yes 
  PE Size               4.00 MiB
  Total PE              511
  Free PE               511
  Allocated PE          0
  PV UUID               dEwVuO-a5vQ-ipcH-Rvlt-5zWt-iAB2-2F0XBf

Quite interesting! Did you notice? The first PV is full, the second is mostly full and the third is empty. This is due to the allocation policy used for the VG: it fills its first PV, then its second, then its third (this, by the way, gives you a chance of surviving a dead physical storage device if, by luck, none of your PEs were located on it).

It is now time to create our last LV, again notice the small 'l' to specify a relative size:

# lvcreate -n lvdata4 -l 100%FREE vgtest
  Logical volume "lvdata4" created
# lvs
  LV      VG     Attr   LSize   Origin Snap%  Move Log Copy%  Convert
  lvdata1 vgtest -wi-a-   2.00g                                      
  lvdata2 vgtest -wi-a-   1.00g                                      
  lvdata3 vgtest -wi-a- 612.00m                                      
  lvdata4 vgtest -wi-a-   2.39g

Now the $100 question: if the pvdisplay and vgdisplay commands exist, does a command named lvdisplay exist as well? Absolutely! Indeed, the command sets are consistent between abstraction levels (PV/VG/LV) and are named in the exact same manner modulo their first two letters:

  • PV: pvs/pvdisplay/pvchange....
  • VG: vgs/vgdisplay/vgchange....
  • LV: lvs/lvdisplay/lvchange....

Back to our lvdisplay command, here is how it shows up:

# lvdisplay 
  --- Logical volume ---
  LV Name                /dev/vgtest/lvdata1
  VG Name                vgtest
  LV UUID                fT22is-cmSL-uhwM-zwCd-jeIe-DWO7-Hkj4k3
  LV Write Access        read/write
  LV Status              available
  # open                 0
  LV Size                2.00 GiB
  Current LE             512
  Segments               2
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           253:0
   
  --- Logical volume ---
  LV Name                /dev/vgtest/lvdata2
  VG Name                vgtest
  LV UUID                yd07wA-hj77-rOth-vxW8-rwo9-AX7q-lcyb3p
  LV Write Access        read/write
  LV Status              available
  # open                 0
  LV Size                1.00 GiB
  Current LE             256
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           253:1
   
  --- Logical volume ---
  LV Name                /dev/vgtest/lvdata3
  VG Name                vgtest
  LV UUID                ocMCL2-nkcQ-Fwdx-pss4-qeSm-NtqU-J7vAXG
  LV Write Access        read/write
  LV Status              available
  # open                 0
  LV Size                612.00 MiB
  Current LE             153
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           253:2
   
  --- Logical volume ---
  LV Name                /dev/vgtest/lvdata4
  VG Name                vgtest
  LV UUID                iQ2rV7-8Em8-85ts-anan-PePb-gk18-A31bP6
  LV Write Access        read/write
  LV Status              available
  # open                 0
  LV Size                2.39 GiB
  Current LE             612
  Segments               2
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           253:3

There is nothing extremely useful to comment on for an overview, with the exception of two things:

  1. LVs are accessed via the device mapper (see the lines starting with LV Name and notice how the name is composed). So lvdata1 is accessed via /dev/vgtest/lvdata1, lvdata2 via /dev/vgtest/lvdata2 and so on.
  2. just like PVs are managed as sets of data chunks (the famous Physical Extents, or PEs), LVs are managed as sets of data chunks known as Logical Extents, or LEs. Most of the time you don't have to worry about the existence of LEs because each one fits within a single PE, although it is possible to make them smaller and hence have several LEs within a single PE. Demonstration: consider the first LV; lvdisplay says it has a size of 2 GiB and holds 512 logical extents. Dividing 2 GiB by 512 gives 4 MiB as the size of an LE, which is exactly the PE size seen when demonstrating the pvdisplay command a few paragraphs above. So in our case we have a 1:1 match between an LE and the underlying PE.

Oh, another great point to underline: you can display the PVs backing an LV :-) Just give lvdisplay a special option:

# lvdisplay -m
  --- Logical volume ---
  LV Name                /dev/vgtest/lvdata1
  VG Name                vgtest
  (...)
  Current LE             512
  Segments               2
  (...)
  --- Segments ---
  Logical extent 0 to 510:
    Type                linear
    Physical volume     /dev/loop0
    Physical extents    0 to 510
   
  Logical extent 511 to 511:
    Type                linear
    Physical volume     /dev/loop1
    Physical extents    0 to 0
   
   
  --- Logical volume ---
  LV Name                /dev/vgtest/lvdata2
  VG Name                vgtest
  (...)
  Current LE             256
  Segments               1
  (...)
   
  --- Segments ---
  Logical extent 0 to 255:
    Type                linear
    Physical volume     /dev/loop1
    Physical extents    1 to 256
   
   
  --- Logical volume ---
  LV Name                /dev/vgtest/lvdata3
  VG Name                vgtest
  (...)
  Current LE             153
  Segments               1
  (...)
   
  --- Segments ---
  Logical extent 0 to 152:
    Type                linear
    Physical volume     /dev/loop1
    Physical extents    257 to 409
   
   
  --- Logical volume ---
  LV Name                /dev/vgtest/lvdata4
  VG Name                vgtest
  (...)
  Current LE             612
  Segments               2
  (...)
   
  --- Segments ---
  Logical extent 0 to 510:
    Type                linear
    Physical volume     /dev/loop2
    Physical extents    0 to 510
   
  Logical extent 511 to 611:
    Type                linear
    Physical volume     /dev/loop1
    Physical extents    410 to 510

To go one step further, let's analyze a bit how the PEs are used: the first LV has 512 LEs (remember: one LE fits within one PE here, so 1 LE = 1 PE). Among those 512 LEs, 511 of them (0 to 510) are stored on /dev/loop0 and the 512th LE is on /dev/loop1. Huh? Something seems wrong here: the 2 GB device behind /dev/loop0 could in theory hold 512 PEs of 4 MiB, so why has an extent been placed on the second storage device? This is not a misbehaviour and is absolutely normal: LVM internally stores some metadata about the PVs, VGs and LVs, making a little of the storage space unavailable for the payload (pvdisplay reported "not usable 4.00 MiB" and a Total PE of 511 for this PV). This explains why the equivalent of one PE has been "eaten". Also notice the linear allocation process: /dev/loop0 has been used first, then, once it was full, /dev/loop1, and then came the turn of /dev/loop2.
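If you are curious about that metadata overhead, the reporting fields of recent LVM2 releases can show the number and size of the metadata areas on each PV. An illustrative command (field names may vary with your version; see pvs(8)):

# pvs -o +pv_mda_count,pv_mda_size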

Now everything is in place; if you want, check again with vgs/pvs/vgdisplay/pvdisplay and you will notice that the VG is now 100% full and all of the underlying PVs are also 100% full.

Filesystems creation and mounting

Now that we have our LVs, it would be fun to do something useful with them. In case you missed it, LVs are accessed via the device mapper, which uses a combination of the VG and LV names; thus:

  • lvdata1 is accessible via /dev/vgtest/lvdata1
  • lvdata2 is accessible via /dev/vgtest/lvdata2
  • and so on!

Just like any traditional storage device, the newly created LVs are seen as block devices, just as if they were a kind of hard disk (don't worry about the "dm-..." names, they are just the internal block devices automatically allocated by the device mapper for you):

# ls -l /dev/vgtest
total 0
lrwxrwxrwx 1 root root 7 Dec 27 12:54 lvdata1 -> ../dm-0
lrwxrwxrwx 1 root root 7 Dec 27 12:54 lvdata2 -> ../dm-1
lrwxrwxrwx 1 root root 7 Dec 27 12:54 lvdata3 -> ../dm-2
lrwxrwxrwx 1 root root 7 Dec 27 12:54 lvdata4 -> ../dm-3

# ls -l /dev/dm-[0-3]
brw-rw---- 1 root disk 253, 0 Dec 27 12:54 /dev/dm-0
brw-rw---- 1 root disk 253, 1 Dec 27 12:54 /dev/dm-1
brw-rw---- 1 root disk 253, 2 Dec 27 12:54 /dev/dm-2
brw-rw---- 1 root disk 253, 3 Dec 27 12:54 /dev/dm-3

So if LVs are block devices, can a filesystem be created on them just as if they were a real hard disk or hard disk partition? Absolutely! Now let's create ext4 filesystems on our LVs:

# mkfs.ext4 /dev/vgtest/lvdata1

mke2fs 1.42 (29-Nov-2011)
Discarding device blocks: done                            
Filesystem label=
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
Stride=0 blocks, Stripe width=0 blocks
131072 inodes, 524288 blocks
26214 blocks (5.00%) reserved for the super user
First data block=0
Maximum filesystem blocks=536870912
16 block groups
32768 blocks per group, 32768 fragments per group
8192 inodes per group
Superblock backups stored on blocks: 
        32768, 98304, 163840, 229376, 294912

Allocating group tables: done                            
Writing inode tables: done                            
Creating journal (16384 blocks): done
Writing superblocks and filesystem accounting information: done

# mkfs.ext4 /dev/vgtest/lvdata2
(...)
# mkfs.ext4 /dev/vgtest/lvdata3
(...)
# mkfs.ext4 /dev/vgtest/lvdata4
(...)

Once the creation has finished, we must create the mount points and mount the newly created filesystems on them:

# mkdir /mnt/data01
# mkdir /mnt/data02
# mkdir /mnt/data03
# mkdir /mnt/data04
# mount /dev/vgtest/lvdata1 /mnt/data01
# mount /dev/vgtest/lvdata2 /mnt/data02
# mount /dev/vgtest/lvdata3 /mnt/data03
# mount /dev/vgtest/lvdata4 /mnt/data04

Finally we can check that everything is in order:

# df -h
Filesystem                    Size  Used Avail Use% Mounted on
(...)
/dev/mapper/vgtest-lvdata1    2.0G   96M  1.9G   5% /mnt/data01
/dev/mapper/vgtest-lvdata2   1022M   47M  924M   5% /mnt/data02
/dev/mapper/vgtest-lvdata3    611M   25M  556M   5% /mnt/data03
/dev/mapper/vgtest-lvdata4    2.4G  100M  2.2G   5% /mnt/data04

Did you notice that the device names have changed? Everything is still in order: these are just another set of symlinks pointing to the exact same block devices:

# ls -l /dev/mapper/vgtest-lvdata[1-4]
lrwxrwxrwx 1 root root 7 Dec 28 20:12 /dev/mapper/vgtest-lvdata1 -> ../dm-0
lrwxrwxrwx 1 root root 7 Dec 28 20:13 /dev/mapper/vgtest-lvdata2 -> ../dm-1
lrwxrwxrwx 1 root root 7 Dec 28 20:13 /dev/mapper/vgtest-lvdata3 -> ../dm-2
lrwxrwxrwx 1 root root 7 Dec 28 20:13 /dev/mapper/vgtest-lvdata4 -> ../dm-3
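If these filesystems are to be mounted automatically at boot, they can be listed in /etc/fstab using either family of symlinks. A minimal sketch for the volumes created above (ext4, default options; note that the renaming performed in the next section would require updating these entries accordingly):

/dev/mapper/vgtest-lvdata1  /mnt/data01  ext4  defaults  0 2
/dev/mapper/vgtest-lvdata2  /mnt/data02  ext4  defaults  0 2
/dev/mapper/vgtest-lvdata3  /mnt/data03  ext4  defaults  0 2
/dev/mapper/vgtest-lvdata4  /mnt/data04  ext4  defaults  0 2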

Renaming a volume group and its logical volumes

So far we have four LVs named lvdata1 to lvdata4 mounted on /mnt/data01 to /mnt/data04. It would be more consistent to:

  1. make the numbers in our LV names two digits, like "01" instead of "1"
  2. rename our volume group to "vgdata" instead of "vgtest"

To show how dynamic the LVM world is, we will rename our VG and LVs on the fly using two commands: vgrename to act at the VG level and its counterpart lvrename to act at the LV level. Starting with the VG or with the LVs makes strictly no difference; you can do it either way and get the same result. In our example we have chosen to start with the VG:

# vgrename vgtest vgdata
  Volume group "vgtest" successfully renamed to "vgdata"
# lvrename vgdata/lvdata1 vgdata/lvdata01
  Renamed "lvdata1" to "lvdata01" in volume group "vgdata"
# lvrename vgdata/lvdata2 vgdata/lvdata02
  Renamed "lvdata2" to "lvdata02" in volume group "vgdata"
# lvrename vgdata/lvdata3 vgdata/lvdata03
  Renamed "lvdata3" to "lvdata03" in volume group "vgdata"
# lvrename vgdata/lvdata4 vgdata/lvdata04
  Renamed "lvdata4" to "lvdata04" in volume group "vgdata"

What happened? Simple:

# vgs
  VG     #PV #LV #SN Attr   VSize VFree
  vgdata   3   4   0 wz--n- 5.99g    0 
# lvs
  LV       VG     Attr   LSize   Origin Snap%  Move Log Copy%  Convert
  lvdata01 vgdata -wi-ao   2.00g                                      
  lvdata02 vgdata -wi-ao   1.00g                                      
  lvdata03 vgdata -wi-ao 612.00m                                      
  lvdata04 vgdata -wi-ao   2.39g

Sounds good, our VG and LVs have been renamed! What does a command like mount say?

# mount
(...)
/dev/mapper/vgtest-lvdata1 on /mnt/data01 type ext4 (rw)
/dev/mapper/vgtest-lvdata2 on /mnt/data02 type ext4 (rw)
/dev/mapper/vgtest-lvdata3 on /mnt/data03 type ext4 (rw)
/dev/mapper/vgtest-lvdata4 on /mnt/data04 type ext4 (rw)

Oops... It is not exactly a bug: mount still shows the symlinks that were used at the time the LVs were mounted in the VFS and has not refreshed its information. Once again, however, everything is correct, because the underlying block devices (/dev/dm-0 to /dev/dm-3) did not change at all. To see the right information, the LVs must be unmounted and mounted again:

# umount /mnt/data01
(...)
# umount /mnt/data04
# mount /dev/vgdata/lvdata01 /mnt/data01 
(...)
# mount /dev/vgdata/lvdata04 /mnt/data04
# mount
/dev/mapper/vgdata-lvdata01 on /mnt/data01 type ext4 (rw)
/dev/mapper/vgdata-lvdata02 on /mnt/data02 type ext4 (rw)
/dev/mapper/vgdata-lvdata03 on /mnt/data03 type ext4 (rw)
/dev/mapper/vgdata-lvdata04 on /mnt/data04 type ext4 (rw)
   Note

Using /dev/volumegroup/logicalvolume or /dev/mapper/volumegroup-logicalvolume makes no difference at all; they are two sets of symlinks pointing to the exact same block device.

Expanding and shrinking the storage space

Did you notice that in the previous sections we never talked about things like "create this partition at the beginning of the disk" or "allocate 10 more sectors"? With LVM you do not have to worry about that kind of problem: your only concern is "Do I have the space to allocate a new LV, and how can I extend an existing one?". LVM takes care of the low-level aspects for you; just focus on what you want to do with your storage space.

The most common problem with computers is a shortage of space on a volume. Production servers can often run for months or years without requiring a reboot (reboots being caused by things like kernel upgrades or hardware failures), yet they regularly need their storage space extended, because we generate more and more data as time goes by. With a "traditional" approach, such as fiddling directly with hard drive partitions, storage manipulation can easily become a headache, mainly because it requires coherent copies to be made and thus application downtime. Don't expect the situation to be more enjoyable with SAN storage rather than a directly attached storage device... basically the problems remain the same.

Expanding a storage space

The most common task for a system administrator is to expand the available storage space. In the LVM world this implies the following steps (a compact preview is sketched right after this list; each step is then detailed in the next subsections):

  • Creating a new PV
  • Adding the PV to the VG (thus extending the VG capacity)
  • Extending existing LVs or creating new ones
  • Extending the structures of the filesystem located on an LV whenever that LV is extended (not all filesystems support this capability)
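As a compact preview of the steps detailed below, the whole expansion could look like this. This is only a sketch: /dev/sdX1 is a placeholder for the new disk or partition, the extra gigabyte is arbitrary, and -r asks lvextend to also grow the filesystem:

# pvcreate /dev/sdX1
# vgextend vgdata /dev/sdX1
# lvextend -r -L +1G vgdata/lvdata01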

Bringing a new PV in the VG

In the exact same manner as we created our first PVs, let's create an additional storage device, associate it with a loopback device and then create a PV on it:

# dd if=/dev/zero of=/tmp/hdd4.img bs=2G count=1
# losetup /dev/loop3 /tmp/hdd4.img
# pvcreate /dev/loop3

A pvs should report the new PV with 2 GB of free space:

# pvs
  PV         VG     Fmt  Attr PSize PFree
  /dev/loop0 vgdata lvm2 a-   2.00g    0 
  /dev/loop1 vgdata lvm2 a-   2.00g    0 
  /dev/loop2 vgdata lvm2 a-   2.00g    0 
  /dev/loop3        lvm2 a-   2.00g 2.00g

Excellent! The next step consists of adding this newly created PV to our VG vgdata; this is where the vgextend command comes to our rescue:

# vgextend vgdata /dev/loop3
  Volume group "vgdata" successfully extended
# vgs
  VG     #PV #LV #SN Attr   VSize VFree
  vgdata   4   4   0 wz--n- 7.98g 2.00g

Great: vgdata is now 8 GB large instead of 6 GB and has 2 GB of free space to allocate either to new LVs or to existing ones.

Extending the LV and its filesystem

Creating a new LV would demonstrate nothing new; extending our existing LVs is much more interesting. How can we use our 2 GB of extra free space? We can, for example, split it in two, allocating 50% to our first LV (lvdata01) and 50% to our third (lvdata03), adding 1 GB of space to each. The best part of the story is that the operation is very simple and is done with a command named lvextend:

# lvextend vgdata/lvdata01 -l +50%FREE 
  Extending logical volume lvdata01 to 3.00 GiB
  Logical volume lvdata01 successfully resized
# lvextend vgdata/lvdata03 -l +50%FREE
  Extending logical volume lvdata03 to 1.10 GiB
  Logical volume lvdata03 successfully resized

Oops! We made a mistake there: lvdata01 has the expected size (2 GB + 1 GB for a grand total of 3 GB), but lvdata03 only grew by 512 MB (for a grand total of 1.1 GB). Our mistake was obvious: once the first gigabyte (50% of 2 GB) of extra space had been given to lvdata01, only one gigabyte remained free in the VG, so when we said "allocate 50% of the remaining free space to lvdata03" LVM added only 512 MB, leaving the other half of that gigabyte unused. The vgs command confirms this:

# vgs
  VG     #PV #LV #SN Attr   VSize VFree  
  vgdata   4   4   0 wz--n- 7.98g 512.00m
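To avoid this pitfall, we could have used absolute increments rather than percentages of the shrinking free space. A sketch of what that would have looked like (not what was actually run here):

# lvextend -L +1G vgdata/lvdata01
# lvextend -L +1G vgdata/lvdata03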

Never mind that voluntary mistake; we will keep the extra space for a later paragraph :-) What happened to the storage space visible from the operating system?

# df -h | grep lvdata01
/dev/mapper/vgdata-lvdata01   2.0G   96M  1.9G   5% /mnt/data01

Obviously, resizing an LV does not "automagically" resize the filesystem structures to take the new LV size into account: that step is part of our duty. Happily for us, ext4 can be resized and, even better, it can be grown while mounted in the VFS. This is known as online resizing, and several other filesystems support that capability as well; among them we can quote ext2/ext3, XFS, ReiserFS and Btrfs. To our knowledge, as of December 2011 only Btrfs supports both online growing and online shrinking; all the others require the filesystem to be unmounted before it can be shrunk.

   Note

Consider using the -r option when invoking lvextend: it asks the command to also resize the underlying filesystem.
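For example, the growth of lvdata01 performed below in two steps (lvextend, then resize2fs) could be collapsed into a single command like this (a sketch; -r delegates the filesystem resize to fsadm):

# lvextend -r -L +1G vgdata/lvdata01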

Now let's extend (grow) the ext4 filesystem located on lvdata01. As said above, ext4 supports online growing, hence we do not need to kick it out of the VFS first:

# resize2fs /dev/vgdata/lvdata01
resize2fs 1.42 (29-Nov-2011)
Filesystem at /dev/vgdata/lvdata01 is mounted on /mnt/data01; on-line resizing required
old_desc_blocks = 1, new_desc_blocks = 1
Performing an on-line resize of /dev/vgdata/lvdata01 to 785408 (4k) blocks.
The filesystem on /dev/vgdata/lvdata01 is now 785408 blocks long.

# df -h | grep lvdata01
/dev/mapper/vgdata-lvdata01   3.0G   96M  2.8G   4% /mnt/data01

Et voila! Our LV now has plenty of new usable space :-) We do not bother about how LVM organizes the storage amongst the underlying devices; it is not our problem, after all. We only care that our storage requirements are satisfied. From our point of view, everything behaves as if we were manipulating a single storage device subdivided into several dynamically sized partitions, each always seen as a set of contiguous blocks.

Now let's shuffle the cards a bit more: when we examined how the LEs of our LVs were allocated, we saw that lvdata01 (named lvdata1 at the time) consisted of 512 LEs, or 512 PEs (because of the 1:1 mapping between them), spread over two PVs. As we have extended it onto an additional PV, we should now see it using 3 segments:

  • Segment 1: located on the PV stored on /dev/loop0 (LE/PE #0 to #510)
  • Segment 2: located on the PV stored on /dev/loop1 (LE/PE #511)
  • Segment 3: located on the PV stored on /dev/loop3 (LE/PE #512 onwards)

Is it the case? Let's check:

# lvdisplay -m  vgdata/lvdata01
  --- Logical volume ---
  LV Name                /dev/vgdata/lvdata01
  VG Name                vgdata
  LV UUID                fT22is-cmSL-uhwM-zwCd-jeIe-DWO7-Hkj4k3
  LV Write Access        read/write
  LV Status              available
  # open                 1
  LV Size                3.00 GiB
  Current LE             767
  Segments               3
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           253:0
   
  --- Segments ---
  Logical extent 0 to 510:
    Type                linear
    Physical volume     /dev/loop0
    Physical extents    0 to 510
   
  Logical extent 511 to 511:
    Type                linear
    Physical volume     /dev/loop1
    Physical extents    0 to 0
   
  Logical extent 512 to 766:
    Type                linear
    Physical volume     /dev/loop3
    Physical extents    0 to 254

Bingo! Note that what holds true here (LVM happened to allocate linearly) would not necessarily be true in the general case.

   Warning

Never mix a local storage device and a SAN disk within the same volume group, especially if the latter holds your system volume. It will bring you a lot of trouble if the SAN disk goes offline, or weird performance fluctuations, since PEs allocated on the SAN will typically show different response times from those located on a local disk.

Shrinking a storage space

On some occasions it can be useful to reduce the size of an LV, or the size of the VG itself. The principle is similar to what has been demonstrated in the previous section (a condensed sketch follows this list):

  1. unmount the filesystem belonging to the LV to be processed (if your filesystem does not support online shrinking)
  2. reduce the filesystem size (unless the LV is simply going to be discarded)
  3. reduce the LV size - OR - remove the LV
  4. remove a PV from the volume group if it is no longer used to store extents
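Condensed into commands, and anticipating the lvreduce -r shortcut demonstrated at the end of this section, shrinking an LV holding an ext4 filesystem down to, say, 1 GiB would look like this (a sketch only; double-check sizes and take a backup first, as shrinking is always riskier than growing):

# umount /mnt/data04
# lvreduce -r -L 1G vgdata/lvdata04
# mount /dev/vgdata/lvdata04 /mnt/data04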

The simplest case to start with is removing an LV. A good candidate for removal is lvdata03: we botched its resize earlier, and the best thing to do is to scrap it. First unmount it:

# lvs
  LV       VG     Attr   LSize Origin Snap%  Move Log Copy%  Convert
  lvdata01 vgdata -wi-ao 3.00g                                      
  lvdata02 vgdata -wi-ao 1.00g                                      
  lvdata03 vgdata -wi-ao 1.10g                                      
  lvdata04 vgdata -wi-ao 2.39g                                      
# umount /dev/vgdata/lvdata03
# lvs
  LV       VG     Attr   LSize Origin Snap%  Move Log Copy%  Convert
  lvdata01 vgdata -wi-ao 3.00g                                      
  lvdata02 vgdata -wi-ao 1.00g                                      
  lvdata03 vgdata -wi-a- 1.10g                                      
  lvdata04 vgdata -wi-ao 2.39g

Did you notice the little change in the lvs output? It lies in the Attr field: once lvdata03 has been unmounted, lvs tells us the LV is no longer open (the little 'o' at the rightmost position has been replaced by a dash). The LV still exists, but nothing is using it.

To remove lvdata03 use the command lvremove and confirm the removal by entering 'y' when asked:

# lvremove vgdata/lvdata03
Do you really want to remove active logical volume lvdata03? [y/n]: y
  Logical volume "lvdata03" successfully removed
# lvs
  LV       VG     Attr   LSize Origin Snap%  Move Log Copy%  Convert
  lvdata01 vgdata -wi-ao 3.00g                                      
  lvdata02 vgdata -wi-ao 1.00g                                      
  lvdata04 vgdata -wi-ao 2.39g
# vgs
  VG     #PV #LV #SN Attr   VSize VFree
  vgdata   4   3   0 wz--n- 7.98g 1.60g

Notice that 1.60 GB of space has been freed in the VG. What can we do next? Shrinking lvdata04 to roughly half its size, i.e. about 1.2 GB or 1228 MB (1.2 * 1024), could be a good idea, so here we go. First we need to unmount the filesystem from the VFS, because ext4 does not support online shrinking.

# umount /dev/vgdata/lvdata04
# e2fsck -f /dev/vgdata/lvdata04
e2fsck 1.42 (29-Nov-2011)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
/dev/vgdata/lvdata04: 11/156800 files (0.0% non-contiguous), 27154/626688 blocks
# resize2fs -p /dev/vgdata/lvdata04 1228M
# lvreduce /dev/vgdata/lvdata04 -L 1228M
  WARNING: Reducing active logical volume to 1.20 GiB
  THIS MAY DESTROY YOUR DATA (filesystem etc.)
Do you really want to reduce lvdata04? [y/n]: y
  Reducing logical volume lvdata04 to 1.20 GiB
  Logical volume lvdata04 successfully resized
# e2fsck -f /dev/vgdata/lvdata04
e2fsck 1.42 (29-Nov-2011)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
/dev/vgdata/lvdata04: 11/78400 files (0.0% non-contiguous), 22234/314368 blocks

Not very practical indeed; instead, we can tell lvreduce to handle the underlying filesystem shrinkage for us. Let's shrink again, this time specifying an absolute size of 1 GB (1024 MB):

# lvreduce /dev/vgdata/lvdata04 -r -L 1024
fsck from util-linux 2.20.1
/dev/mapper/vgdata-lvdata04: clean, 11/78400 files, 22234/314368 blocks
resize2fs 1.42 (29-Nov-2011)
Resizing the filesystem on /dev/mapper/vgdata-lvdata04 to 262144 (4k) blocks.
The filesystem on /dev/mapper/vgdata-lvdata04 is now 262144 blocks long.

  Reducing logical volume lvdata04 to 1.00 GiB
  Logical volume lvdata04 successfully resized
# lvs
  LV       VG     Attr   LSize Origin Snap%  Move Log Copy%  Convert
  lvdata01 vgdata -wi-ao 3.00g                                      
  lvdata02 vgdata -wi-ao 1.00g                                      
  lvdata04 vgdata -wi-a- 1.00g 
   Note

Notice the number of 4k blocks shown: 4096 * 262144 = 1,073,741,824 bytes, i.e. exactly 1 GiB.

Time to mount the volume again:

# mount /dev/vgdata/lvdata04 /mnt/data04 
# df -h | grep lvdata04
/dev/mapper/vgdata-lvdata04  1021M   79M  891M   9% /mnt/data04

And what is going on at the VG level?

# vgs
  VG     #PV #LV #SN Attr   VSize VFree
  vgdata   4   3   0 wz--n- 7.98g 2.99g

Wow, we now have nearly 3 GB of free space in the VG, a bit more than the size of one of our PVs. It would be great if we could free one of those PVs entirely, and of course LVM gives you the possibility to do that. Before going further, let's check what happened at the PV level:

# pvs     
  PV         VG     Fmt  Attr PSize PFree   
  /dev/loop0 vgdata lvm2 a-   2.00g       0 
  /dev/loop1 vgdata lvm2 a-   2.00g 1016.00m
  /dev/loop2 vgdata lvm2 a-   2.00g 1020.00m
  /dev/loop3 vgdata lvm2 a-   2.00g    1.00g

Did you notice? 1 GB of space has been freed on the last PV (/dev/loop3) since lvdata04 was shrunk, not counting the space freed on /dev/loop1 and /dev/loop2 by the removal of lvdata03.


Next steo: can we remove a PV directly (the command to remove a PV from a VG is vgreduce)?

# vgreduce vgdata /dev/loop0 
  Physical volume "/dev/loop0" still in use

Of course not, all of our PVs supports the content of our LVs and we must find a manner to move all of the PE (physical extents) actually hold by the PV /dev/loop0 elsewhere withing the VG. But wait a minute, the victory is there yet: we do have some free space in the /dev/loop0 and we will get more and more free space in it as the displacement process will progress. What is going to happen if, from a concurrent session, we create others LV in vgdata at the same time the content of /dev/loop0 is moved? Simple: it can be filled again with the PEs newly allocated.

So, before proceeding with the displacement of what /dev/loop0 contains, we must tell LVM: "please don't allocate any more PEs on /dev/loop0". This is achieved via the -x parameter of the pvchange command:

# pvchange -x n /dev/loop0 
  Physical volume "/dev/loop0" changed
  1 physical volume changed / 0 physical volumes not changed

The value n given to -x marks the PV as non-allocatable (i.e. not usable for future PE allocations). Let's check the PVs again with pvs and pvdisplay:

# pvs
  PV         VG     Fmt  Attr PSize PFree   
  /dev/loop0 vgdata lvm2 --   2.00g       0 
  /dev/loop1 vgdata lvm2 a-   2.00g 1016.00m
  /dev/loop2 vgdata lvm2 a-   2.00g 1020.00m
  /dev/loop3 vgdata lvm2 a-   2.00g    1.00g

# pvdisplay /dev/loop0
  --- Physical volume ---
  PV Name               /dev/loop0
  VG Name               vgdata
  PV Size               2.00 GiB / not usable 4.00 MiB
  Allocatable           NO
  PE Size               4.00 MiB
  Total PE              511
  Free PE               0
  Allocated PE          511
  PV UUID               b9i1Hi-llka-egCF-2vU2-f7tp-wBqh-qV4qEk

Great news here: the Attr field shows a dash instead of 'a' in the leftmost position, meaning the PV is effectively no longer allocatable. However, marking a PV non-allocatable does not wipe the existing PEs stored on it; in other words, the data present on the PV remains absolutely intact. Another positive point lies in the remaining capacity of the PVs composing vgdata: the sum of the free space available on /dev/loop1, /dev/loop2 and /dev/loop3 is 3060 MB (1016 MB + 1020 MB + 1024 MB), largely sufficient to hold the roughly 2 GB (511 PEs of 4 MiB, i.e. 2044 MiB) currently stored on the PV /dev/loop0.
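
For the record, the flag is symmetric: should you change your mind, allocation can be re-enabled at any time. A quick sketch (not needed in our scenario):

# pvchange -x y /dev/loop0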

Now we have frozen the allocation of PEs on /dev/loop0 we can make LVM move all of PEs located in this PV on the others PVs composing the VG vgdata. Again, we don't have to worry about the gory details like where LVM will precisely relocate the PEs actually hold by /dev/loop0, our only concerns is to get all of them moved out of /dev/loop0. That job gets done by:

# pvmove /dev/loop0
  /dev/loop0: Moved: 5.9%
  /dev/loop0: Moved: 41.3%
  /dev/loop0: Moved: 50.1%
  /dev/loop0: Moved: 100.0%

We don't have to tell LVM the VG name because it already knows that /dev/loop0 belongs to vgdata, and it knows which other PVs of that VG can host the PEs coming from /dev/loop0. It is absolutely normal for the process to take several minutes (real-life cases can take up to several hours, even with SAN disks located on high-end storage hardware, which is much faster than a local SATA or even SAS drive).
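
Note that pvmove also accepts an explicit destination PV, and its -n option restricts the move to the extents of a single LV. A quick sketch reusing the device and LV names of this example purely for illustration (output omitted):

# pvmove /dev/loop0 /dev/loop1
# pvmove -n lvdata01 /dev/loop0 /dev/loop1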

At the end of the moving process, we can see that the PV /dev/loop0 is totally free:

# pvs
  PV         VG     Fmt  Attr PSize PFree   
  /dev/loop0 vgdata lvm2 a-   2.00g    2.00g
  /dev/loop1 vgdata lvm2 a-   2.00g 1016.00m
  /dev/loop2 vgdata lvm2 a-   2.00g       0 
  /dev/loop3 vgdata lvm2 a-   2.00g       0 

# pvdisplay /dev/loop0
  --- Physical volume ---
  PV Name               /dev/loop0
  VG Name               vgdata
  PV Size               2.00 GiB / not usable 4.00 MiB
  Allocatable           yes 
  PE Size               4.00 MiB
  Total PE              511
  Free PE               511
  Allocated PE          0
  PV UUID               b9i1Hi-llka-egCF-2vU2-f7tp-wBqh-qV4qEk

511 PEs free out of a maximum of 511 PEs, so all of its content has been successfully spread over the other PVs (depending on your LVM version, the PV may still be reported as non-allocatable at this point; either way, this is normal and harmless). Now it is ready to be detached from the VG vgdata with the help of vgreduce:

# vgreduce vgdata /dev/loop0
  Removed "/dev/loop0" from volume group "vgdata"

What happened to vgdata?

# vgs
  VG     #PV #LV #SN Attr   VSize VFree   
  vgdata   3   3   0 wz--n- 5.99g 1016.00m

Its storage space falls to ~6 GB! And what would pvs tell us?

# pvs
  PV         VG     Fmt  Attr PSize PFree   
  /dev/loop0        lvm2 a-   2.00g    2.00g
  /dev/loop1 vgdata lvm2 a-   2.00g 1016.00m
  /dev/loop2 vgdata lvm2 a-   2.00g       0 
  /dev/loop3 vgdata lvm2 a-   2.00g       0

/dev/loop0 is now a standalone device detached from any VG. However, it still contains some LVM metadata, which remains to be wiped with the help of the pvremove command:

{{fancywarning|pvremove/pvmove do not destroy the disk content. Please '''do''' a secure erase of the storage device with shred or any similar tool before disposing of it.}}
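
For instance, a minimal sketch with shred (/dev/sdX is a placeholder for the device being disposed of; the number of passes is purely illustrative):

# shred -v -n 3 /dev/sdX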

# pvdisplay /dev/loop0
  "/dev/loop0" is a new physical volume of "2.00 GiB"
  --- NEW Physical volume ---
  PV Name               /dev/loop0
  VG Name               
  PV Size               2.00 GiB
  Allocatable           NO
  PE Size               0   
  Total PE              0
  Free PE               0
  Allocated PE          0
  PV UUID               b9i1Hi-llka-egCF-2vU2-f7tp-wBqh-qV4qEk

# pvremove /dev/loop0
  Labels on physical volume "/dev/loop0" successfully wiped
# pvdisplay /dev/loop0
  No physical volume label read from /dev/loop0
  Failed to read physical volume "/dev/loop0"

Great! Things are just as simple as that. In their day-to-day reality, system administrators drive the show in a very similar manner; they simply take extra precautions, such as backing up the data located on the LVs before any risky operation, or planning application shutdown windows before starting to manipulate an LVM volume.
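
As a purely hypothetical illustration of such a precaution, backing up the data sitting on one of our LVs before touching it could look like this (the destination path /backup/data04.tar.gz is made up for the example):

# tar -czf /backup/data04.tar.gz -C /mnt/data04 .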

== Replacing a PV (storage device) by another ==

The procedure is a mix of what has been covered in the sections above, and it is condensed into a command sketch just after this list. It basically boils down to:

  1. Create a new PV
  2. Add it to the VG
  3. Move the contents of the PV to be removed onto the remaining PVs composing the VG
  4. Remove the PV from the VG and wipe its LVM metadata
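
Condensed into commands, and assuming we are replacing a hypothetical /dev/old with a hypothetical /dev/new inside vgdata (these device names are placeholders, not taken from the example below), this roughly looks like:

# pvcreate /dev/new
# vgextend vgdata /dev/new
# pvchange -x n /dev/old
# pvmove /dev/old
# vgreduce vgdata /dev/old
# pvremove /dev/old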

The strategy in this paragraph is to reuse /dev/loop0 and make it replace /dev/loop2 (both devices are the same size, although a bigger /dev/loop0 would also have worked).

Here we go! First we need to (re-)create the LVM metadata to make /dev/loop0 usable by LVM:

# pvcreate /dev/loop0
  Physical volume "/dev/loop0" successfully created

Then this brand new PV is added to the VG vgdata, thus increasing its size by 2 GB:

# vgextend vgdata  /dev/loop0
  Volume group "vgdata" successfully extended
# vgs
  VG     #PV #LV #SN Attr   VSize VFree
  vgdata   4   3   0 wz--n- 7.98g 2.99g
# pvs
  PV         VG     Fmt  Attr PSize PFree   
  /dev/loop0 vgdata lvm2 a-   2.00g    2.00g
  /dev/loop1 vgdata lvm2 a-   2.00g 1016.00m
  /dev/loop2 vgdata lvm2 a-   2.00g       0 
  /dev/loop3 vgdata lvm2 a-   2.00g       0

Now we have to suspend the allocation of PEs on /dev/loop2 before moving its PEs away (which will progressively free up its space):

# pvchange -x n /dev/loop2
  Physical volume "/dev/loop2" changed
  1 physical volume changed / 0 physical volumes not changed
# pvs
  PV         VG     Fmt  Attr PSize PFree   
  /dev/loop0 vgdata lvm2 a-   2.00g    2.00g
  /dev/loop1 vgdata lvm2 a-   2.00g 1016.00m
  /dev/loop2 vgdata lvm2 --   2.00g       0 
  /dev/loop3 vgdata lvm2 a-   2.00g       0

Then we move all of the PEs on /dev/loop2 to the rest of the VG:

# pvmove /dev/loop2 
  /dev/loop2: Moved: 49.9%
  /dev/loop2: Moved: 100.0%
# pvs
  PV         VG     Fmt  Attr PSize PFree   
  /dev/loop0 vgdata lvm2 a-   2.00g       0 
  /dev/loop1 vgdata lvm2 a-   2.00g 1016.00m
  /dev/loop2 vgdata lvm2 --   2.00g    2.00g
  /dev/loop3 vgdata lvm2 a-   2.00g       0

Then we remove /dev/loop2 from the VG and we wipe its LVM metadata:

# vgreduce vgdata /dev/loop2
  Removed "/dev/loop2" from volume group "vgdata"
# pvremove /dev/loop2
  Labels on physical volume "/dev/loop2" successfully wiped

Final state of the PVs composing vgdata:

# pvs
  PV         VG     Fmt  Attr PSize PFree   
  /dev/loop0 vgdata lvm2 a-   2.00g       0 
  /dev/loop1 vgdata lvm2 a-   2.00g 1016.00m
  /dev/loop3 vgdata lvm2 a-   2.00g       0

/dev/loop0 took the place of /dev/loop2 :-)
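
If you want a final sanity check that nothing was lost in the operation, something along these lines would do (commands only; output omitted):

# pvs
# vgs vgdata
# lvs vgdata
# df -h | grep lvdata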

= More advanced topics =

== Backing up the layout ==

== Freezing a VG ==

== LVM snapshots ==

== Linear/Striped/Mirrored Logical volumes ==

= LVM and Funtoo =