ZFSBootMenu setup for Proxmox VE

A full-featured bootloader for a ZFS-on-root install. It allows booting off multiple datasets, selecting kernels, creating snapshots and clones, performing rollbacks and much more - as much as a rescue system would.

We will install and take advantage of ZFSBootMenu, building on the knowledge of Proxmox VE and ZFS we have gained so far.

Installation

Getting an extra bootloader is straightforward. We place it onto the EFI System Partition (ESP), where it belongs (unlike kernels - changing the contents of that partition as infrequently as possible is arguably a great benefit of this approach) and update the EFI variables - our firmware will then default to it the next time we boot. We do not even have to remove the existing bootloader(s); they can stay behind as a backup, and in any case they are easy to install back later on.

As Proxmox VE does not keep the ESP mounted on a running system, we have to mount it first. We identify it by its partition type:

sgdisk -p /dev/sda
Disk /dev/sda: 268435456 sectors, 128.0 GiB
Sector size (logical/physical): 512/512 bytes
Disk identifier (GUID): 6EF43598-4B29-42D5-965D-EF292D4EC814
Partition table holds up to 128 entries
Main partition table begins at sector 2 and ends at sector 33
First usable sector is 34, last usable sector is 268435422
Partitions will be aligned on 2-sector boundaries
Total free space is 0 sectors (0 bytes)

Number  Start (sector)    End (sector)  Size       Code  Name
   1              34            2047   1007.0 KiB  EF02  
   2            2048         2099199   1024.0 MiB  EF00  
   3         2099200       268435422   127.0 GiB   BF01

It is the one with partition type shown as EF00 by sgdisk - typically the second partition on a stock PVE install.

Tip

Alternatively, you can look for the sole FAT32 partition with lsblk -f, which will also show whether it is already mounted - it is NOT on a regular setup. Additionally, you can check with findmnt /boot/efi.
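
For example - the device name below assumes the layout shown by sgdisk above:

lsblk -f /dev/sda2
findmnt /boot/efi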

Let’s mount it:

mount /dev/sda2 /boot/efi

Create a separate directory for our new bootloader and download it there:

mkdir /boot/efi/EFI/zbm
wget -O /boot/efi/EFI/zbm/zbm.efi https://get.zfsbootmenu.org/efi
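
A quick, optional sanity check that the binary landed where we expect it:

ls -l /boot/efi/EFI/zbm/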

The only thing left is to tell UEFI where to find it, which in our case is disk /dev/sda and partition 2:

efibootmgr -c -d /dev/sda -p 2 -l "EFI\zbm\zbm.efi" -L "Proxmox VE ZBM"
BootCurrent: 0004
Timeout: 0 seconds
BootOrder: 0001,0004,0002,0000,0003
Boot0000* UiApp
Boot0002* UEFI Misc Device
Boot0003* EFI Internal Shell
Boot0004* Linux Boot Manager
Boot0001* Proxmox VE ZBM
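
The new entry was automatically placed first in BootOrder; should that ever need adjusting, the order can also be set explicitly - the entry numbers below are those from the output above and will differ per system:

efibootmgr -o 0001,0004,0002,0000,0003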

We named our boot entry Proxmox VE ZBM and it became the default, i.e. the first to be attempted at the next boot. We can now reboot and will be presented with the new bootloader:

[Image: ZFSBootMenu booting]

If we do not press anything, it will just boot off our root filesystem stored in the rpool/ROOT/pve-1 dataset. That easy.

Booting directly off ZFS

Before we start exploring our bootloader and its convenient features, let us first appreciate how it knew how to boot us into the current system right after installation. We did NOT have to update any boot entries, as would have been the case with other bootloaders.

Boot environments

We simply let EFI know where to find the bootloader itself and it then found our root filesystem, just like that. It did so by sweeping the available pools and looking for datasets with / mountpoints, then looking for kernels in their /boot directory - of which we have only one instance. There are more elaborate rules at play in regards to the so-called boot environments - which you are free to explore further - but we happened to have satisfied them.
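
We can peek at both ingredients from the running system ourselves - a simple illustration, not the bootloader's exact selection logic:

zfs list -r -o name,mountpoint rpool/ROOT
ls /boot/vmlinuz-*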

Kernel command line

The bootloader also appended some kernel command line parameters - as we can check for the current boot:

cat /proc/cmdline
root=zfs:rpool/ROOT/pve-1 quiet loglevel=4 spl.spl_hostid=0x7a12fa0a

Where did these come from? Well, rpool/ROOT/pve-1 was intelligently found by our bootloader. The hostid parameter is added for the kernel - something we briefly touched on before in the post on rescue boot in a ZFS context. It is part of the Solaris Porting Layer (SPL) and lets the kernel learn the /etc/hostid value even though the file itself is not accessible within the initramfs - something we will keep out of scope here.

The rest are defaults which we can change to our own liking. You might have already sensed that this will be just as elegant as the overall approach - i.e. no rebuilds of the initramfs needed, as that is the objective of the entire escapade with ZFS booting - and indeed it is: it is done via the ZFS dataset property org.zfsbootmenu:commandline, obviously specific to our bootloader.

We can make our boot verbose by simply omitting quiet from the command line:

zfs set org.zfsbootmenu:commandline="loglevel=4" rpool/ROOT/pve-1

The effect can be observed on the next boot off this dataset.
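
To check the property - or to clear it again so the bootloader falls back to its default - the usual ZFS property commands apply:

zfs get org.zfsbootmenu:commandline rpool/ROOT/pve-1
zfs inherit org.zfsbootmenu:commandline rpool/ROOT/pve-1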

Important

Do note that we did NOT include the root= parameter. If we had, it would have been ignored, as this is determined and injected by the bootloader itself.

Forgotten default

Proxmox VE comes with a very unfortunate default for the ROOT dataset - and thus all its children. It does not cause any issues as long as we do not start adding multiple child datasets with alternative root filesystems, but it is unclear what the reason for it was, as even the default install invites us to create more of them - the stock one is pve-1, after all.

More precisely, if we went on and added more datasets with mountpoint=/ - something we actually WANT so that our bootloader can recognise them as menu options - we would discover the hard way that there is another tricky option that should NOT really be set on any root dataset, namely canmount=on, which is a perfectly reasonable default for any OTHER dataset.

The canmount property determines whether a dataset can be mounted and whether it gets auto-mounted during a pool import. The current on value would cause all the children of rpool/ROOT to be auto-mounted when calling zpool import -a - and this is exactly what Proxmox sets us up with due to its zfs-import-scan.service, i.e. such an import happens on every startup.

It is nice to have pools auto-imported and mounted, but this is a horrible idea when there are multiple datasets set up with the same mountpoint, as is the case with root filesystems. We will set it to noauto so that this does not happen to us once we later have multiple root filesystems. The new value will be inherited by all future child datasets, but we also set it explicitly on the existing one. Unfortunately, there appears to be a ZFS bug where it is impossible to issue zfs inherit on a dataset that is currently mounted.

zfs set canmount=noauto rpool/ROOT
zfs set -u canmount=noauto rpool/ROOT/pve-1
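
We can confirm the result - both the parent and pve-1 should now report noauto:

zfs get -r canmount rpool/ROOT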

Note

Setting root datasets not to be automatically mounted does not really cause any issues, as the pool is already imported and the root filesystem mounted based on the kernel command line.

Boot menu and more

Now finally, let’s reboot and press ESC before the 10-second timeout passes on our bootloader screen. The boot menu could not be any more self-explanatory; we should be able to orient ourselves easily after everything we have learnt before:

[Image: ZFSBootMenu boot environments menu]

We can see the only dataset available, pve-1, the kernel 6.8.12-6-pve that is about to be used, as well as the complete command line. What is particularly neat, however, are all the other options (and shortcuts) here. Feel free to cycle between the different screens, also with the left and right arrow keys.

For instance, on the Kernels screen we would see (and be able to choose) an older kernel:

[Image: ZFSBootMenu kernels screen]

We can even make it the default with C^D (the CTRL+D key combination), as the footer hints - this is what Proxmox calls “pinning a kernel” and wraps into its own extra tooling, which we do not need.

We can also see the Pool Status, explore the logs with C^L, or get into a Recovery Shell with C^R - all without any need for an installer, let alone a bespoke one that would support ZFS to begin with. We can even hop into a chroot environment with C^J with ease. This bootloader simply doubles as a rescue shell.

Snapshot and clone

But we are not here for that now. We will navigate to the Snapshots screen and create a new one with C^N; we will name it snapshot1. Wait a brief moment. And we have one:

[Image: ZFSBootMenu snapshots screen]

If we were to just press ENTER on it, it would “duplicate” it into a fully fledged standalone dataset (that would be an actual copy), but we are smarter than that - we only want a clone, so we press C^C and name it pve-2. This is a quick operation and we get what we expected:

[Image: ZFSBootMenu boot environments menu showing pve-2]

We can now make the pve-2 dataset our default boot option with a simple press of C^D on the selected entry - this sets the bootfs property on the pool (NOT on the dataset), which we had not talked about before, but it is so conveniently transparent to us that we can abstract from it all.

Clone boot

If we boot into pve-2 now, nothing will appear any different, except that our root filesystem is running off a cloned dataset:

findmnt /
TARGET SOURCE           FSTYPE OPTIONS
/      rpool/ROOT/pve-2 zfs    rw,relatime,xattr,posixacl,casesensitive

And both datasets are available:

zfs list
NAME               USED  AVAIL  REFER  MOUNTPOINT
rpool             33.8G  88.3G    96K  /rpool
rpool/ROOT        33.8G  88.3G    96K  none
rpool/ROOT/pve-1  17.8G   104G  1.81G  /
rpool/ROOT/pve-2    16G   104G  1.81G  /
rpool/data          96K  88.3G    96K  /rpool/data
rpool/var-lib-vz    96K  88.3G    96K  /var/lib/vz

We can also check our new default set through the bootloader:

zpool get bootfs
NAME   PROPERTY  VALUE             SOURCE
rpool  bootfs    rpool/ROOT/pve-2  local

Yes, this means there is also an easy way to change the default boot dataset for the next reboot from a running system:

zpool set bootfs=rpool/ROOT/pve-1 rpool

And if you wonder about the default kernel, that is set via the org.zfsbootmenu:kernel property.
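
For instance, to pin a kernel for a dataset from the command line - a hypothetical example, assuming the version string matches one of the kernels listed on the Kernels screen:

zfs set org.zfsbootmenu:kernel=6.8.12-6-pve rpool/ROOT/pve-2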

Clone promotion

Now suppose we have not only tested what we needed in our clone, but we are so happy with the result that we want to keep it instead of the original dataset off which its snapshot had been created. That sounds like a problem, as a clone depends on a snapshot and that in turn depends on its dataset. This is exactly what promotion is for. We can simply:

zfs promote rpool/ROOT/pve-2

Nothing will appear to have happened, but if we check pve-1:

zfs get origin rpool/ROOT/pve-1
NAME              PROPERTY  VALUE                       SOURCE
rpool/ROOT/pve-1  origin    rpool/ROOT/pve-2@snapshot1  -

Its origin now appears to be a snapshot of pve-2 instead - the very snapshot that was previously made off pve-1.

And indeed, it is pve-2 that now has the snapshot instead:

zfs list -t snapshot rpool/ROOT/pve-2
NAME                         USED  AVAIL  REFER  MOUNTPOINT
rpool/ROOT/pve-2@snapshot1  5.80M      -  1.81G  -
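
Conversely, the promoted clone itself no longer has an origin - we can confirm with the same check as before:

zfs get origin rpool/ROOT/pve-2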

We can now even destroy pve-1 and the snapshot as well:

Warning

Exercise EXTREME CAUTION when issuing zfs destroy commands - there is NO confirmation prompt and it is easy to execute them without due care, in particular by omitting the snapshot part of the name following @ and thus removing an entire dataset, especially when passing the -r and -f switches, which we will NOT use here for that reason.

It might also be a good idea to prepend these commands with a space character, which on a common Bash shell setup prevents them from getting recorded in history and thus accidentally re-executed. This is also one of the reasons to avoid running everything under the root user all of the time.

zfs destroy rpool/ROOT/pve-1
zfs destroy rpool/ROOT/pve-2@snapshot1
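
The leading-space trick mentioned above only works if Bash history is configured for it - a quick check, assuming Bash is the shell in use:

echo $HISTCONTROL    # should include ignorespace or ignoreboth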

And if you wonder - yes, there is also an option to clone and promote the clone right away in the boot menu itself - the C^X shortcut.

Done

We got quite a complete feature set when it comes to a ZFS-on-root install. We can create snapshots before risky operations and roll back to them, and on a more sophisticated level keep several clones of our root dataset, any of which we can decide to boot off on a whim.

None of this requires intricate bespoke boot tools that copy files from /boot to the EFI System Partition and keep it “synchronised”, or that need their menu options rebuilt every time a new kernel comes up.

Most importantly, we can perform all these sophisticated operations NOT on a running system, but from a separate environment while the host system is not running, thus achieving the best possible backup quality, as we do not risk any corruption. And the host system? Does not know a thing. And does not need to.

Enjoy your proper ZFS-friendly bootloader - one that actually understands your storage stack better than a stock Debian install ever would, and provides better options than what ships with stock Proxmox VE.

Your feedback on the content is welcome in the GitHub Discussions.