snapshotting top-level "bpool" filesystem causes grub to fail #13873

Closed
bghira opened this issue Sep 10, 2022 · 36 comments
Labels
Component: GRUB GRUB integration Type: Defect Incorrect behavior (e.g. crash, hang)

Comments

@bghira

bghira commented Sep 10, 2022

System information

Type Version/Name
Distribution Name Gentoo
Distribution Version amd64
Kernel Version 5.19.6-gentoo
Architecture amd64
OpenZFS Version zfs-kmod-2.1.99-1358_g60d995727 / zfs-2.1.99-1359_gede037cda

Describe the problem you're observing

Last week I noticed that Grub could no longer find my boot zpool, which was created with most features disabled specifically so that Grub can detect it.

I ran grub-probe /path/to/bpool and it shows the error:

# grub-probe /roots/gentoo/boot 
grub-probe: error: compression algorithm inherit not supported 
.

Tried setting compression algorithm explicitly on each filesystem in the bpool, no change.

Describe how to reproduce the problem

I recreated the pool:

zpool create -o ashift=13 -o autotrim=on -d -o feature@async_destroy=enabled -o feature@bookmarks=enabled -o feature@embedded_data=enabled -o feature@empty_bpobj=enabled -o feature@enabled_txg=enabled -o feature@extensible_dataset=enabled -o feature@filesystem_limits=enabled -o feature@hole_birth=enabled -o feature@large_blocks=enabled -o feature@lz4_compress=enabled -o feature@spacemap_histogram=enabled -O acltype=posixacl -O canmount=off -O compression=lz4 -O devices=off -O normalization=formD -O relatime=on -O xattr=sa -O mountpoint=/boot bpool mirror /dev/nvme0n1p3 /dev/nvme1n1p3

And then, the command works:

# grub-probe /roots/gentoo/boot 
zfs 

I then snapshot the child filesystem:

# zfs snap bpool/BOOT/gentoo@boot2 
# grub-probe /roots/gentoo/boot 
zfs 
# zfs snap bpool/BOOT@boot2 
# grub-probe /roots/gentoo/boot 
zfs 

Snapshotting the child filesystems works, as the output above shows. I then snapshot the top-level filesystem of the pool itself:

# zfs snap bpool@boot2 
# grub-probe /roots/gentoo/boot 
grub-probe: error: compression algorithm inherit not supported 
.

That's when things go south. Up until this point, I can reboot readily and Grub works just fine. This was working for a very long time, and I haven't upgraded Grub at all.
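The reproduction above can be scripted so that the failing level is reported automatically. This is only a sketch: the function names and the ZFS and GRUB_PROBE override variables are hypothetical helpers introduced here, and it assumes grub-probe prints the filesystem type and exits non-zero on error, as in the output shown above.

```shell
#!/bin/sh
# Sketch: take a snapshot, then check whether grub-probe still sees ZFS.
# ZFS and GRUB_PROBE are hypothetical override hooks (not part of either tool)
# so the logic can be dry-run with stand-in commands.
ZFS=${ZFS:-zfs}
GRUB_PROBE=${GRUB_PROBE:-grub-probe}

probe_is_zfs() {
    # $1: mounted boot path, e.g. /roots/gentoo/boot
    fstype=$("$GRUB_PROBE" "$1" 2>/dev/null) || return 1
    [ "$fstype" = "zfs" ]
}

snapshot_and_probe() {
    # $1: dataset, $2: snapshot name, $3: mounted boot path
    "$ZFS" snapshot "$1@$2" || return 2
    if probe_is_zfs "$3"; then
        echo "$1: ok"
    else
        echo "$1: grub-probe FAILED"
        return 1
    fi
}

# Example (run as root, on a disposable pool):
#   for ds in bpool/BOOT/gentoo bpool/BOOT bpool; do
#       snapshot_and_probe "$ds" boot2 /roots/gentoo/boot || break
#   done
```

On an affected system the loop would be expected to stop at the top-level bpool dataset, matching the manual steps above.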

@bghira bghira added the Type: Defect Incorrect behavior (e.g. crash, hang) label Sep 10, 2022
@ryao
Contributor

ryao commented Sep 12, 2022

This looks like a bug in GRUB. Let us keep this open to track the issue (and encourage patches), but someone should file a bug with the GRUB project.

@ryao ryao added the Component: GRUB GRUB integration label Sep 13, 2022
@mauricev

https://savannah.gnu.org/bugs/index.php?64297

@R8s6

R8s6 commented Nov 17, 2023

I still encounter this error running Arch with Grub version 2:2.12rc1-5.

I had to destroy the pool and recreate one, then disable snapshotting on the "boot" pool as a temporary workaround.

Alternatively, one can take snapshots of the datasets but not the pool.

Edit: i.e. If you're using sanoid, instead of using recursive = yes, you can use recursive = zfs

@mifritscher

mifritscher commented Jan 12, 2024

I'm using grub-probe (GRUB) 2.12~rc1-12 (I also looked at grub master and found no notable difference).

I got this problem after updating from zfs 2.1.4 to 2.2.2 (without zfs upgrade).

grub-install -v -v says:

grub-core/kern/fs.c:56:fs: Detecting zfs...
grub-core/osdep/hostdisk.c:379:hostdisk: opening the device `/dev/nvme0n1p12' in open_device()
grub-core/fs/zfs/zfs.c:1199:zfs: label ok 0
grub-core/osdep/hostdisk.c:358:hostdisk: reusing open device `/dev/nvme0n1p12'
grub-core/fs/zfs/zfs.c:1014:zfs: check 2 passed
grub-core/fs/zfs/zfs.c:1025:zfs: check 3 passed
grub-core/fs/zfs/zfs.c:1032:zfs: check 4 passed
grub-core/fs/zfs/zfs.c:1042:zfs: check 6 passed
grub-core/fs/zfs/zfs.c:1050:zfs: check 7 passed
grub-core/fs/zfs/zfs.c:1061:zfs: check 8 passed
grub-core/fs/zfs/zfs.c:1071:zfs: check 9 passed
grub-core/fs/zfs/zfs.c:1093:zfs: check 11 passed
grub-core/fs/zfs/zfs.c:1119:zfs: check 10 passed
grub-core/fs/zfs/zfs.c:1135:zfs: str=com.delphix:embedded_data
grub-core/fs/zfs/zfs.c:1135:zfs: str=com.delphix:hole_birth
grub-core/fs/zfs/zfs.c:1144:zfs: check 12 passed (feature flags)
grub-core/fs/zfs/zfs.c:1885:zfs: zio_read: E 0: size 4096/4096
grub-core/fs/zfs/zfs.c:1907:zfs: endian = -1
grub-core/fs/zfs/zfs.c:597:zfs: dva=8, 1a0b050
grub-core/osdep/hostdisk.c:358:hostdisk: reusing open device `/dev/nvme0n1p12'
grub-core/fs/zfs/zfs.c:2694:zfs: endian = -1, blkid=0
grub-core/fs/zfs/zfs.c:2031:zfs: endian = -1
grub-core/fs/zfs/zfs.c:2062:zfs: endian = -1
grub-core/fs/zfs/zfs.c:1885:zfs: zio_read: E 0: size 131072/4096
grub-core/fs/zfs/zfs.c:1907:zfs: endian = -1
grub-core/fs/zfs/zfs.c:597:zfs: dva=8, c0b8f0
grub-core/osdep/hostdisk.c:358:hostdisk: reusing open device `/dev/nvme0n1p12'
grub-core/fs/zfs/zfs.c:2031:zfs: endian = 1
grub-core/fs/zfs/zfs.c:2057:zfs: endian = 1
grub-core/fs/zfs/zfs.c:1885:zfs: zio_read: E 0: size 16384/4096
grub-core/fs/zfs/zfs.c:1907:zfs: endian = 1
grub-core/fs/zfs/zfs.c:597:zfs: dva=8, 16111f0
grub-core/osdep/hostdisk.c:358:hostdisk: reusing open device `/dev/nvme0n1p12'
grub-core/fs/zfs/zfs.c:2699:zfs: alive
grub-core/fs/zfs/zfs.c:2505:zfs: looking for 'features_for_read'
grub-core/fs/zfs/zfs.c:2031:zfs: endian = 1
grub-core/fs/zfs/zfs.c:2057:zfs: endian = 1
grub-core/fs/zfs/zfs.c:1885:zfs: zio_read: E 0: size 16384/4096
grub-core/fs/zfs/zfs.c:1907:zfs: endian = 1
grub-core/fs/zfs/zfs.c:597:zfs: dva=8, d00178
grub-core/osdep/hostdisk.c:358:hostdisk: reusing open device `/dev/nvme0n1p12'
grub-core/fs/zfs/zfs.c:2515:zfs: zap read
grub-core/fs/zfs/zfs.c:2528:zfs: fat zap
grub-core/fs/zfs/zfs.c:2031:zfs: endian = 1
grub-core/fs/zfs/zfs.c:2057:zfs: endian = 1
grub-core/fs/zfs/zfs.c:1885:zfs: zio_read: E 0: size 16384/4096
grub-core/fs/zfs/zfs.c:1907:zfs: endian = 1
grub-core/fs/zfs/zfs.c:597:zfs: dva=8, b09900
grub-core/osdep/hostdisk.c:358:hostdisk: reusing open device `/dev/nvme0n1p12'
grub-core/fs/zfs/zfs.c:2288:zfs: fzap: length 18
grub-core/fs/zfs/zfs.c:2532:zfs: returned 0
grub-core/fs/zfs/zfs.c:2694:zfs: endian = -1, blkid=1
grub-core/fs/zfs/zfs.c:2031:zfs: endian = -1
grub-core/fs/zfs/zfs.c:2062:zfs: endian = -1
grub-core/fs/zfs/zfs.c:1885:zfs: zio_read: E 0: size 131072/4096
grub-core/fs/zfs/zfs.c:1907:zfs: endian = -1
grub-core/fs/zfs/zfs.c:597:zfs: dva=8, c0b8f0
grub-core/fs/zfs/zfs.c:2031:zfs: endian = 1
grub-core/fs/zfs/zfs.c:2057:zfs: endian = 1
grub-core/fs/zfs/zfs.c:1885:zfs: zio_read: E 0: size 16384/4096
grub-core/fs/zfs/zfs.c:1907:zfs: endian = 1
grub-core/fs/zfs/zfs.c:597:zfs: dva=8, 160f8c0
grub-core/osdep/hostdisk.c:358:hostdisk: reusing open device `/dev/nvme0n1p12'
grub-core/fs/zfs/zfs.c:2699:zfs: alive
grub-core/fs/zfs/zfs.c:2031:zfs: endian = 1
grub-core/fs/zfs/zfs.c:2057:zfs: endian = 1
grub-core/fs/zfs/zfs.c:1885:zfs: zio_read: E 0: size 512/512
grub-core/fs/zfs/zfs.c:1907:zfs: endian = 1
grub-core/fs/zfs/zfs.c:597:zfs: dva=8, 600cb0
grub-core/osdep/hostdisk.c:358:hostdisk: reusing open device `/dev/nvme0n1p12'
grub-core/fs/zfs/zfs.c:2118:zfs: zap: name = com.delphix:extensible_dataset, value = 4, cd = 0
grub-core/fs/zfs/zfs.c:2118:zfs: zap: name = com.delphix:embedded_data, value = 1, cd = 0
grub-core/fs/zfs/zfs.c:2118:zfs: zap: name = com.delphix:hole_birth, value = 1, cd = 0
grub-core/fs/zfs/zfs.c:2118:zfs: zap: name = org.open-zfs:large_blocks, value = 0, cd = 0
grub-core/fs/zfs/zfs.c:2118:zfs: zap: name = org.illumos:lz4_compress, value = 1, cd = 0
grub-core/fs/zfs/zfs.c:2118:zfs: zap: name = , value = 0, cd = 0
grub-core/fs/zfs/zfs.c:2118:zfs: zap: name = , value = 0, cd = 0
grub-core/fs/zfs/zfs.c:3293:zfs: alive
grub-core/fs/zfs/zfs.c:3105:zfs: endian = 1
grub-core/fs/zfs/zfs.c:2694:zfs: endian = 1, blkid=0
grub-core/fs/zfs/zfs.c:2031:zfs: endian = 1
grub-core/fs/zfs/zfs.c:2062:zfs: endian = 1
grub-core/fs/zfs/zfs.c:1885:zfs: zio_read: E 0: size 131072/4096
grub-core/fs/zfs/zfs.c:1907:zfs: endian = 1
grub-core/fs/zfs/zfs.c:597:zfs: dva=8, c0b8f0
grub-core/fs/zfs/zfs.c:2031:zfs: endian = 1
grub-core/fs/zfs/zfs.c:2057:zfs: endian = 1
grub-core/fs/zfs/zfs.c:1885:zfs: zio_read: E 0: size 16384/4096
grub-core/fs/zfs/zfs.c:1907:zfs: endian = 1
grub-core/fs/zfs/zfs.c:597:zfs: dva=8, 16111f0
grub-core/fs/zfs/zfs.c:2699:zfs: alive
grub-core/fs/zfs/zfs.c:3112:zfs: alive
grub-core/fs/zfs/zfs.c:2505:zfs: looking for 'root_dataset'
grub-core/fs/zfs/zfs.c:2031:zfs: endian = 1
grub-core/fs/zfs/zfs.c:2057:zfs: endian = 1
grub-core/fs/zfs/zfs.c:1885:zfs: zio_read: E 0: size 16384/4096
grub-core/fs/zfs/zfs.c:1907:zfs: endian = 1
grub-core/fs/zfs/zfs.c:597:zfs: dva=8, d00178
grub-core/fs/zfs/zfs.c:2515:zfs: zap read
grub-core/fs/zfs/zfs.c:2528:zfs: fat zap
grub-core/fs/zfs/zfs.c:2031:zfs: endian = 1
grub-core/fs/zfs/zfs.c:2057:zfs: endian = 1
grub-core/fs/zfs/zfs.c:1885:zfs: zio_read: E 0: size 16384/4096
grub-core/fs/zfs/zfs.c:1907:zfs: endian = 1
grub-core/fs/zfs/zfs.c:597:zfs: dva=8, b09900
grub-core/fs/zfs/zfs.c:2288:zfs: fzap: length 13
grub-core/fs/zfs/zfs.c:2532:zfs: returned 0
grub-core/fs/zfs/zfs.c:3118:zfs: alive
grub-core/fs/zfs/zfs.c:2694:zfs: endian = 1, blkid=1
grub-core/fs/zfs/zfs.c:2031:zfs: endian = 1
grub-core/fs/zfs/zfs.c:2062:zfs: endian = 1
grub-core/fs/zfs/zfs.c:1885:zfs: zio_read: E 0: size 131072/4096
grub-core/fs/zfs/zfs.c:1907:zfs: endian = 1
grub-core/fs/zfs/zfs.c:597:zfs: dva=8, c0b8f0
grub-core/fs/zfs/zfs.c:2031:zfs: endian = 1
grub-core/fs/zfs/zfs.c:2057:zfs: endian = 1
grub-core/fs/zfs/zfs.c:1885:zfs: zio_read: E 0: size 16384/4096
grub-core/fs/zfs/zfs.c:1907:zfs: endian = 1
grub-core/fs/zfs/zfs.c:597:zfs: dva=8, 160f8c0
grub-core/fs/zfs/zfs.c:2699:zfs: alive
grub-core/fs/zfs/zfs.c:3124:zfs: alive
grub-core/fs/zfs/zfs.c:3302:zfs: alive
grub-core/fs/zfs/zfs.c:3306:zfs: endian = 0
grub-core/fs/zfs/zfs.c:3315:zfs: endian = 1
grub-core/fs/zfs/zfs.c:3170:zfs: endian = 1
grub-core/fs/zfs/zfs.c:1885:zfs: zio_read: E 0: size 4096/4096
grub-core/fs/zfs/zfs.c:1907:zfs: endian = 1
grub-core/fs/zfs/zfs.c:597:zfs: dva=8, 300018
grub-core/osdep/hostdisk.c:358:hostdisk: reusing open device `/dev/nvme0n1p12'
grub-core/fs/zfs/zfs.c:3395:zfs: endian = 1
grub-core/fs/zfs/zfs.c:3170:zfs: endian = 1
grub-core/fs/zfs/zfs.c:1885:zfs: zio_read: E 0: size 0/512
grub-core/kern/fs.c:79:fs: error: compression algorithm inherit not supported
.
grub-core/kern/fs.c:80:fs: zfs detection failed.
grub-install: error: compression algorithm inherit not supported

The following lines seem to be where it bails out:

  if (comp != ZIO_COMPRESS_OFF && decomp_table[comp].decomp_func == NULL)
    return grub_error (GRUB_ERR_NOT_IMPLEMENTED_YET,
                       "compression algorithm %s not supported\n", decomp_table[comp].name);

The thing is: "inherit" is not a compression type per se; it means "use whatever the parent (or the parent's parent, and so on) is using".

So, it has no decompress method:

static decomp_entry_t decomp_table[ZIO_COMPRESS_FUNCTIONS] = {
{"inherit", NULL}, /* ZIO_COMPRESS_INHERIT */

I think the missing piece is that grub does not look up the parents when the value is inherit.

@mifritscher

I can confirm that using grub 2.12 does indeed help.
So one of the top 4 commits of https://git.savannah.gnu.org/cgit/grub.git/log/grub-core/fs/zfs/zfs.c solves the problem. If I had to guess, I would say "ZFS: Don't iterate over null objsets".

@timkgh

timkgh commented Jan 16, 2024

I can confirm that using grub 2.12 does indeed help. So one of the top 4 commits of https://git.savannah.gnu.org/cgit/grub.git/log/grub-core/fs/zfs/zfs.c solves the problem. If I need to guess I would say "ZFS: Don't iterate over null objsets".

How does one upgrade grub in Ubuntu 22.04 LTS in particular when I can't boot at all? I don't understand how it broke all of a sudden, I've been taking snapshots for years (with sanoid).

@mifritscher

How does one upgrade grub in Ubuntu 22.04 LTS in particular when I can't boot at all? I don't understand how it broke all of a sudden, I've been taking snapshots for years (with sanoid).

You can, for example, boot a live system, install the zfs drivers and import bpool. Then you can make a USB stick with grub and copy your kernel, your initrd and the grub config needed to start them onto it. Then you can boot your installation with it (don't forget to export bpool first ;)

Another way is to import both bpool and rpool, mount them together, bind-mount dev, run, sys and proc, and chroot into the result. I used this way on Debian bookworm.

Either way, you can either try to install grub 2.12 packages from newer distro versions, or build grub manually (it isn't too complicated, just ensure that you have the zfs and devicemapper libs installed; the configure script will tell you if you do) and, if you use UEFI boot, pass --with-platform=efi. Then a grub-install does the job and you (hopefully) have a bootable system again.
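The chroot route described above might look roughly like the following from a live system. It is a sketch under assumptions: the pool names rpool and bpool come from this thread, /mnt as the alternate root and the DRY_RUN switch are choices made here for illustration, and some layouts need extra zfs mount steps before the chroot is usable.

```shell
#!/bin/sh
# Sketch: import both pools under an alternate root, bind-mount the
# pseudo-filesystems and chroot. DRY_RUN is a hypothetical switch added
# here so the plan can be printed instead of executed.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

enter_chroot() {
    # $1: alternate root directory, e.g. /mnt
    root=$1
    run zpool import -f -R "$root" rpool   # -f: pool was last used by the installed system
    run zpool import -f -R "$root" bpool
    for fs in dev run sys proc; do
        run mount --rbind "/$fs" "$root/$fs"
    done
    run chroot "$root" /bin/bash
}

# Example: DRY_RUN=1 enter_chroot /mnt   # print the commands only
```

Inside the chroot, grub can then be reinstalled with the distribution's usual tools; remember to unmount and export both pools before rebooting.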

@timkgh

timkgh commented Jan 16, 2024

I managed to boot it for now using portable ZfsBootMenu running from a flash drive. But still unclear what I should do to fix Ubuntu 22.04. I don't have a bpool, this is an old setup from the 18.04 days, there's a single rpool that contains /boot. I'm thinking now that it was a bad idea and I should either move it to its own bpool or just give up and make an ext4 partition for /boot and avoid future grub issues.

@n0099

n0099 commented Jan 18, 2024

https://bugs.launchpad.net/ubuntu/+source/grub2/+bug/2041739/comments/9

zpool create \
    -o feature@extensible_dataset=disabled \
    -o feature@bookmarks=disabled \
    -o feature@filesystem_limits=disabled \
    -o feature@large_blocks=disabled \
    -o feature@large_dnode=disabled \
    -o feature@sha512=disabled \
    -o feature@skein=disabled \
    -o feature@edonr=disabled \
    -o feature@userobj_accounting=disabled \
    -o feature@encryption=disabled \
    -o feature@project_quota=disabled \
    -o feature@obsolete_counts=disabled \
    -o feature@bookmark_v2=disabled \
    -o feature@redaction_bookmarks=disabled \
    -o feature@redacted_datasets=disabled \
    -o feature@bookmark_written=disabled \
    -o feature@livelist=disabled \
    -o feature@zstd_compress=disabled \
    -o feature@zilsaxattr=disabled \
    -o feature@head_errlog=disabled \
    -o feature@blake3=disabled \
    -o feature@vdev_zaps_v2=disabled \
[...]

Enabling any of the features in the command above will cause grub not to recognize /boot as zfs again when a snapshot is created on bpool.

@timkgh

timkgh commented Jan 18, 2024

FWIW, I fixed my issues by moving to ZFSBootMenu and couldn't be happier. Excellent piece of software to pair with ZFS!

@dannyp777

dannyp777 commented Jan 23, 2024

From what I can tell, this bug may have been around for up to 7 yrs.
I encountered it when upgrading to Ubuntu Mantic Nov/2023 here: https://bugs.launchpad.net/ubuntu/+source/grub2/+bug/2041739 using grub2 package version: 2.12~rc1-10ubuntu4

I resolved the problem on my Ubuntu Mantic system by deleting the bpool and recreating with following zfs options:

zpool create -d \
-o compatibility=grub2,ubuntu-22.04 \
-O devices=off \
bpool \
/dev/sda3

Mathias Aerts identified that the culprit zfs flag is feature@extensible_dataset and has had success using a bpool created with all extra features disabled.
I am not sure how an incompatible flag came to be enabled on the bpool in the first place, but it seemed to be related to the snapshot process. He had been using sanoid, I had been using zfs-auto-snapshot. Now I am using zsys without any problems so far.

Maybe update-grub/grub-probe should check the zfs version/flags/features before trying to do anything?
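A feature check of the kind suggested above could start from the output of zpool get. The sketch below is an illustration only: active_features is a hypothetical helper name, and which active features actually trip a given GRUB version is an assumption taken from this thread (extensible_dataset in particular), not something the filter itself knows.

```shell
#!/bin/sh
# Sketch: list pool features whose state is "active".
# Reads tab-separated "property<TAB>value" lines on stdin, as printed by
#   zpool get -H -o property,value all bpool
active_features() {
    awk -F'\t' '$1 ~ /^feature@/ && $2 == "active" {
        sub(/^feature@/, "", $1)
        print $1
    }'
}

# Example:
#   zpool get -H -o property,value all bpool | active_features
```

If extensible_dataset shows up in the result on a pool that an older GRUB has to read, that matches the failure condition reported in this thread.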

@bghira
Author

bghira commented Jan 25, 2024

this is correct ^

@marmeladapk

The solution of disabling all features (except those not supported by the kernel) listed in @n0099's post works for me so far.

I advise against just installing grub 2.12 to your boot environment unless you also update grub packages in your system to 2.12. Old grub tools (<2.12) in your system won't be able to detect fs_uuid of bpool and won't properly generate menu entries. This will bork your boot menu once again the next time there's a new kernel version when entries are regenerated.

@bghira
Author

bghira commented Jan 29, 2024

creating a new bpool isn't a solution, it is a workaround, and a very poor one at that.

seeing how this isn't actually a grub issue, it should likely be receiving more attention @pcd1193182

@mifritscher

@bghira : Hmm? It is actually a grub issue. Between grub 2.12 rc1 and grub 2.12 release there were 4 bug fixes ( https://git.savannah.gnu.org/cgit/grub.git/log/?h=grub-2.12&qt=grep&q=zfs , I mean the 4 from 2023-09-18). It is left for someone else as exercise to bisect which one is the bug fix for the problem here.

Probably grub trips over one of these bugs when the pool gets used in certain (completely legal) ways.

Yes, installing 2.12 in the boot environment is a one time hack. But e.g. debian sid has 2.12 , which can be installed in bookworm as well without much hassle. I did exactly this in my case ;-)

@SimonBard

I resolved the problem on my Ubuntu Mantic system by deleting the bpool and recreating with following zfs options:

zpool create -d \
-o compatibility=grub2,ubuntu-22.04 \
-O devices=off \
bpool \
/dev/sda3

Ok so you destroyed the bpool and created it with the parameters.
How do I have to go on now?
I have to create a bootloader, how can I achieve this?

@R8s6

R8s6 commented Mar 4, 2024

I resolved the problem on my Ubuntu Mantic system by deleting the bpool and recreating with following zfs options:

zpool create -d \
-o compatibility=grub2,ubuntu-22.04 \
-O devices=off \
bpool \
/dev/sda3

Ok so you destroyed the bpool and created it with the parameters. How do I have to go on now? I have to create a bootloader, how can I achieve this?

which OS are you on (i.e. arch, ubuntu, fedora, etc)?

The general idea is to create the zfs bpool, chroot into the OS (from an external USB drive), install a bootloader (usually grub, but could be something else), re-generate the initramfs, and probably re-install the kernels (and microcode) as well.

I know how to do it on Arch, so let me know if you're on Arch or Arch based OS, i can write you the exact steps.

cheers.

@SimonBard

which OS are you on (i.e. arch, ubuntu, fedora, etc)?

The general idea is to create the zfs bpool, chroot into the OS (from an external USB drive), install a bootloader (usually grub, but could be something else), re-generate the initramfs, and probably re-install the kernels (and microcode) as well.

I know how to do it on Arch, so let me know if you're on Arch or Arch based OS, i can write you the exact steps.

cheers.

Many thanks!
I am on ubuntu.

I tried steps as described here, but already fail at the first command of Step 5 with

grub-probe /boot
grub-probe: warning: disk does not exist, so falling back to partition device /dev/sdb4
grub-probe: warning: disk does not exist, so falling back to partition device /dev/sdb4
grub-probe: warning: disk does not exist, so falling back to partition device /dev/sdb4
grub-probe: error: disk >hostdisk//dev/sdb4< was not found

(Output translated from German.)

@R8s6

R8s6 commented Mar 5, 2024

That guide was ambiguous about whether a command should be run as root (#) or as a regular user ($). In this case, grub-probe should be run as root or with sudo, so could you please try this:

If you're under a non-root user, please try:
$ sudo grub-probe /boot

Or login as root:

$ su -
# grub-probe /boot

Source: https://superuser.com/questions/1195918/grub-probe-warning-disk-does-not-exist-so-falling-back-to-partition-device-d

@SimonBard

Many thanks!

I have followed this guide and it worked. I do not want to break it right now.

@nickcmaynard

nickcmaynard commented Mar 5, 2024

After installing noble's 2.12 packages into a mantic install, grub-install and update-grub, and snapshotting bpool, my boot environment is now broken. I would respectfully suggest that a 2.12 install may not be the fix that we are hoping it may be. I shall recreate with the options @dannyp777 suggests, and disable extensible_dataset in addition.

@dannyp777

I resolved the problem on my Ubuntu Mantic system by deleting the bpool and recreating with following zfs options:

zpool create -d \
-o compatibility=grub2,ubuntu-22.04 \
-O devices=off \
bpool \
/dev/sda3

Ok so you destroyed the bpool and created it with the parameters.
How do I have to go on now?
I have to create a bootloader, how can I achieve this?

I took a copy of /boot before I deleted the bpool, then just copied it back once I had recreated the bpool. Sorry, I hadn't meant my original reply to be a comprehensive how-to. There are more details of things people tried over at the launchpad bug report.
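The copy-out and copy-back step can be done with tar so that permissions and ownership survive. This is a minimal sketch, assuming the archive is stored somewhere outside the bpool; backup_boot and restore_boot are hypothetical helper names, not part of any tool mentioned here.

```shell
#!/bin/sh
# Sketch: save the contents of /boot before destroying the pool and
# restore them into the recreated, mounted boot dataset afterwards.
backup_boot() {
    # $1: source directory (e.g. /boot), $2: archive path outside the pool
    tar -C "$1" -cpf "$2" .
}

restore_boot() {
    # $1: archive path, $2: destination directory (new boot dataset mountpoint)
    tar -C "$2" -xpf "$1"
}

# Example (as root):
#   backup_boot /boot /root/boot-backup.tar
#   ... destroy and recreate bpool, mount the new boot dataset ...
#   restore_boot /root/boot-backup.tar /boot
```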

@timkgh

timkgh commented Mar 6, 2024

Using ZFSBootMenu is another, much easier option.

@ptomulik

Hi, I was just struggling with this issue on Debian bookworm (grub-efi (2.06-13+deb12u1)).

I've recreated my boot pool with -o compatibility=grub2, reinstalled grub, but this didn't help (no boot after the first snapshot made by zfs-auto-snapshot).

Then I've recreated the pool again and installed manually grub-efi from backports:

apt install grub-efi/bookworm-backports

which installed 2.12-1~bpo12+1 version of the package (and updated dependencies appropriately).

My OS survived two restarts, one before snapshot and another after one snapshot. Still testing and observing...

@amotin
Member

amotin commented Mar 22, 2024

@ptomulik See #15909. The grub2 compatibility config turned out to be incompatible with earlier grub versions due to a bug fixed in 2.12. That PR introduces a separate grub2-2.06 config specifically for this problem. We haven't decided what to do with grub2 so as not to break or annoy existing users.

@SimonBard

Hi, I was just struggling with this issue on Debian bookworm (grub-efi (2.06-13+deb12u1)).

I've recreated my boot pool with -o compatibility=grub2, reinstalled grub, but this didn't help (no boot after the first snapshot made by zfs-auto-snapshot).

Then I've recreated the pool again and installed manually grub-efi from backports:

apt install grub-efi/bookworm-backports

which installed 2.12-1~bpo12+1 version of the package (and updated dependencies appropriately).

My OS survived two restarts, one before snapshot and another after one snapshot. Still testing and observing...

Interesting, I am getting this message when trying to install grub-efi/bookworm-backports:

$ sudo apt install grub-efi/bookworm-backports
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
Package grub-efi is not available, but is referred to by another package.
This may mean that the package is missing, has been obsoleted, or
is only available from another source
However the following packages replace it:
  grub-common grub-common:i386 grub-efi-ia32-bin grub-efi-ia32

E: Release 'bookworm-backports' for 'grub-efi' was not found

(Output translated from German.)

@ptomulik

ptomulik commented Apr 7, 2024

@SimonBard Do you have appropriate apt sources in your apt config, as explained here?

@SimonBard

@SimonBard Do you have appropriate apt sources in your apt config, as explained here?

Sorry, my bad. I did not associate bookworm with debian.

I am using ubuntu 22.04

Should I install grub from live-dvd?
I am using zfs-boot-menu from usb stick atm:
https://docs.zfsbootmenu.org/en/v2.3.x/general/portable.html

@ptomulik

ptomulik commented Apr 7, 2024

@SimonBard I haven't tried it myself, but it looks like using backports on Ubuntu is pretty similar to using backports on Debian.

https://help.ubuntu.com/community/UbuntuBackports

The sad news is that grub-efi is probably missing from jammy-backports

https://packages.ubuntu.com/search?suite=jammy-backports&keywords=grub-efi

@mabra

mabra commented May 3, 2024

I still encounter this error running Arch with Grub version 2:2.12rc1-5.

I had to destroy the pool and recreate one, then disable snapshotting on the "boot" pool as a temporary workaround.

Alternatively, one can take snapshots of the datasets but not the pool.

i.e. If you're using sanoid, instead of using recursive = yes, you can use recursive = zfs

Could you please explain?

Alternatively, one can take snapshots of the datasets but not the pool.

There is no pool without the top-level filesystem, so I do not understand it!
Thanks.

@R8s6

R8s6 commented May 3, 2024

please ignore the sanoid part, i just did another experiment, and it was not accurate, so i just crossed it out from the original comment.

About taking the snapshots of datasets vs pool, i meant this:

Let's say you have zboot as the pool, with zboot/boot and zboot/boot/default being its datasets,
with zboot/boot/default mounted as /boot, something like this:

zboot                250M   614M    24K  none
zboot/boot           238M   614M    24K  none
zboot/boot/default   238M   614M  70.2M  /boot

As a temporary workaround, please try taking snapshots only of zboot/boot and/or zboot/boot/default, but not of the top-level zboot; otherwise grub will fail.
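The workaround above can be automated by filtering the pool's own top-level dataset out of a recursive listing. The sketch below assumes the zboot layout from this comment; non_root_datasets is a hypothetical helper name and the snapshot name "manual" is only an example.

```shell
#!/bin/sh
# Sketch: keep only dataset names that contain a "/", which drops the
# pool's top-level dataset (e.g. "zboot") from the list.
non_root_datasets() {
    awk -F/ 'NF > 1'
}

# Example (as root): snapshot every child dataset, but never zboot itself.
#   zfs list -H -o name -t filesystem -r zboot | non_root_datasets |
#       while read -r ds; do
#           zfs snapshot "$ds@manual"
#       done
```

Snapshot tools that recurse from the pool root would need an equivalent exclusion in their own configuration.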

@zhouska

zhouska commented Jun 5, 2024

I resolved the problem on my Ubuntu Mantic system by deleting the bpool and recreating with following zfs options:

zpool create -d \
-o compatibility=grub2,ubuntu-22.04 \
-O devices=off \
bpool \
/dev/sda3

Ok so you destroyed the bpool and created it with the parameters.
How do I have to go on now?
I have to create a bootloader, how can I achieve this?

I took a copy of /boot before I deleted the bpool, then just copied it back once I had recreated the bpool. Sorry, I hadn't meant my original reply to be a comprehensive how-to. There are more details of things people tried over at the launchpad bug report.

Your answer is incomplete and will only create more issues.

For some reason, grub2 doesn't seem to like the idea of a root dataset. The grub shell won't be able to list any files there.

In order to mimic whatever Ubuntu installer did when the system was installed, you need to do something like this:

zfs set mountpoint=none bpool
zfs set canmount=off bpool

and then create a filesystem dataset to act as a container [1]:

zfs create -o canmount=off -o mountpoint=none bpool/BOOT

and create the boot filesystem dataset itself [1]:

UUID=$(dd if=/dev/urandom bs=1 count=100 2>/dev/null | tr -dc 'a-z0-9' | cut -c-6)
zfs create -o mountpoint=/boot bpool/BOOT/ubuntu_$UUID

Only then can you mount it at an alternate path and restore all the files and directories from the backup step you mentioned above.

Once done, you can verify it from within chroot environment with:

grub-probe /boot

It should list zfs as the filesystem.

As a last step, you need to update the cached bpool mountpoint by editing or re-creating the zfs-list.cache file (you already have one pointing to the old mount point) [2]:

mkdir /etc/zfs/zfs-list.cache
touch /etc/zfs/zfs-list.cache/bpool

# enable the tracking ZEDLET
systemctl enable zfs-zed.service
systemctl restart zfs-zed.service

# trigger cache refresh
zfs set relatime=off main/secure
zfs inherit relatime main/secure

# re-run systemd generators and reboot
systemctl daemon-reload
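One detail in the dataset-creation step above: the dd pipeline reads a fixed 100 random bytes, so in rare cases it could yield fewer than six usable characters. A slightly more defensive variant is sketched below; random_suffix and boot_dataset_name are hypothetical helper names, and the bpool/BOOT/ubuntu_ prefix simply mirrors the naming used in this comment.

```shell
#!/bin/sh
# Sketch: generate a 6-character [a-z0-9] suffix by filtering an unbounded
# random stream, so the result always has the full length.
random_suffix() {
    LC_ALL=C tr -dc 'a-z0-9' < /dev/urandom | head -c 6
    echo
}

boot_dataset_name() {
    printf 'bpool/BOOT/ubuntu_%s\n' "$(random_suffix)"
}

# Example (as root):
#   zfs create -o mountpoint=/boot "$(boot_dataset_name)"
```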

References:
1
2

@Low-power
Contributor

I can confirm that using grub 2.12 does indeed help.
So one of the top 4 commits of https://git.savannah.gnu.org/cgit/grub.git/log/grub-core/fs/zfs/zfs.c solves the problem. If I need to guess I would say "ZFS: Don't iterate over null objsets".

I think it was ZFS: Check bonustype in addition to dnode type.
After applying this change, my old GRUB version (2.04) worked properly.

@org-tekeli-borisp

Confirming grub 2.12 works here too! I had an EFI loader created a year ago with one of the previous grub versions. Yesterday I made a snapshot and ran into this issue. Creating and installing a new EFI loader with 2.12 solved the problem.

@amotin
Member

amotin commented Sep 10, 2024

I recently tried to boot an old (10.3) FreeBSD from a pool restored from backup on a newer version, and it failed despite all features being disabled, until I redid the restore on the same old FreeBSD version. I wonder if we leak some incompatible changes despite disabled features.

@zapotah

zapotah commented Sep 19, 2024

This seems to be fixed at least with debian bookworm-backports grub-efi 2.12 packages.

ptr1337 pushed a commit to CachyOS/zfs that referenced this issue Nov 14, 2024
GRUB is not able to detect ZFS pool if snapshot of top level boot
pool is created. This issue is observed with GRUB versions up to
v2.06 if extensible_dataset feature is enabled on ZFS boot pool.

compatibility=grub2-2.06 would enable all read-only compatible
zpool features except extensible_dataset and other features that
depend on it.

The existing grub2 compatibility file is now renamed to grub2-2.12 to
reflect the appropriate grub2 version. grub2-2.12 lists all read-only
features that can be enabled on boot pool for grub2 with version 2.12
onwards.

A new symlink grub2 is created that currently points to the grub2-2.12
compatibility file.

Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Alexander Motin <[email protected]>
Signed-off-by: Umer Saleem <[email protected]>
Closes openzfs#13873
Closes openzfs#15261
Closes openzfs#15909