
zfs-2.1.13 patchset #15268

Merged (38 commits, Sep 27, 2023)

Conversation

tonyhutter
Contributor

Motivation and Context

Support 6.5 kernel, misc bugfixes

Description

Proposed patchset for zfs-2.1.13.

How Has This Been Tested?

Built on F38 with 6.5.3.0-rc1 kernel.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Library ABI change (libzfs, libzfs_core, libnvpair, libuutil and libzfsbootenv)
  • Documentation (a change to man pages or other documentation)

jeremyvisser and others added 30 commits June 9, 2023 10:14
Two problems led to unexpected behaviour of the scriptlets:

1) Newer DKMS versions change the formatting of "dkms status":

   (old) zfs, 2.1.2, 5.14.10-300.fc35.x86_64, x86_64: installed
   (new) zfs/2.1.2, 5.14.10-300.fc35.x86_64, x86_64: installed

   Which broke a conditional determining whether to uninstall.

2) zfs_config.h was not packaged properly, but the %preun scriptlet
   attempted to read it:

   CONFIG_H="/var/lib/dkms/zfs/2.1.2/*/*/zfs_config.h"

   Which broke the uninstallation of the module, leaving behind a
   dangling symlink that broke DKMS entirely with this error:

     Error! Could not locate dkms.conf file.
     File: /var/lib/dkms/zfs/2.1.1/source/dkms.conf does not exist.

This change attempts to simplify life by:

*  Avoids parsing anything (less prone to future breakage)
*  Uses %posttrans instead of %post for module installation, because
   %post happens before %preun, while %posttrans happens afterwards
*  Unconditionally reinstalls the module on upgrade, which is less
   efficient, but the trade-off is that it's more reliable

Alternative approaches could involve fixing the existing parsing bugs
or improving the logic, but this comes at the cost of complexity and
possible future bugs.

Reviewed-by: Tony Hutter <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Jeremy Visser <[email protected]>
Closes openzfs#10463
Closes openzfs#13182
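The %posttrans-versus-%post argument above can be sketched with a small simulation. The scriptlet names are RPM's own; the runner below is purely illustrative and not part of the actual spec file.

```shell
#!/bin/sh
# Illustrative only: simulate the order in which RPM runs scriptlets
# during a package upgrade, showing why %posttrans is safe for module
# installation while %post is not (%post of the new package runs
# before %preun of the old package has uninstalled the old module).
order=""
run() { order="$order $1"; }

# RPM's documented upgrade ordering:
run "%pre(new)"
run "%post(new)"       # too early: old module still installed
run "%preun(old)"      # old module uninstalled here
run "%postun(old)"
run "%posttrans(new)"  # safe: runs after the old package is gone

echo "$order"
```

Installing the module in %posttrans therefore always happens after the old DKMS module has been removed, which is what makes the unconditional reinstall reliable.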
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: John Kennedy <[email protected]>
Reviewed-by: Ryan Moeller <[email protected]>
Signed-off-by: Ahelenia Ziemiańska <[email protected]>
Closes openzfs#13259
Given:
  /sbin/zfs list filling/a-zvol<TAB> -o space,refratio
The rest of the cmdline gets clobbered by:
  /sbin/zfs list filling/a-zvolcannot open 'filling/a-zvol':
  operation not applicable to datasets of this type

With -x (fragment):
  + COMPREPLY=($(compgen -W "$(__zfs_match_snapshot)" -- "$cur"))
  +++ __zfs_match_snapshot
  +++ local base_dataset=filling/dziadtop-nowe-duchy
  +++ [[ filling/dziadtop-nowe-duchy != filling/dziadtop-nowe-duchy ]]
  +++ [[ filling/dziadtop-nowe-duchy != '' ]]
  +++ __zfs_list_datasets filling/dziadtop-nowe-duchy
  +++ /sbin/zfs list -H -o name -s name -t filesystem
                     -r filling/dziadtop-nowe-duchy
  +++ tail -n +2
  cannot open 'filling/dziadtop-nowe-duchy':
  operation not applicable to datasets of this type
  +++ echo filling/dziadtop-nowe-duchy
  +++ echo filling/dziadtop-nowe-duchy@
  ++ compgen -W 'filling/dziadtop-nowe-duchy

This properly completes with:
  $ /sbin/zfs list filling/a-zvol<TAB> -o space,refratio
  filling/a-zvol   filling/a-zvol@
  $ /sbin/zfs list filling/a-zvol<cursor> -o space,refratio

Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Ahelenia Ziemiańska <[email protected]>
Backport-of: 5ece420
Closes openzfs#12820
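The failure mode above can be illustrated with a stub. Everything here is hypothetical: the fake `zfs` function below only mimics the real command's refusal to list `-t filesystem` rooted at a zvol, and the "fixed" helper requesting both types is one way to avoid the error, not a reproduction of the upstream patch.

```shell
#!/bin/sh
# Stubbed illustration (not the real completion code): a fake `zfs`
# that, like the real one, errors when asked for a filesystem listing
# rooted at a zvol, and a helper that asks for both types instead.
zfs() {
    # "$*" looks like: list -H -o name -t <types> -r <dataset>
    case "$*" in
        *"-t filesystem "*zvol*)
            echo "cannot open 'pool/a-zvol': operation not applicable" >&2
            return 1 ;;
        *)
            echo "pool/a-zvol" ;;
    esac
}

complete_old() { zfs list -H -o name -t filesystem -r pool/a-zvol; }
complete_new() { zfs list -H -o name -t filesystem,volume -r pool/a-zvol; }

complete_old 2>/dev/null || echo "old helper failed"
complete_new
```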
482da24 missed arc_buf_destroy() calls on log parse errors, possibly
leaking up to 128KB of memory per dataset during ZIL replay.

Signed-off-by:	Alexander Motin <[email protected]>
Sponsored by:	iXsystems, Inc.
l2arc_evict() unnecessarily adjusts the size of buffers to be
written to L2ARC: l2arc_write_size() is called right before
l2arc_evict() and already performs those adjustments.

Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Brian Atkinson <[email protected]>
Signed-off-by: George Amanakis <[email protected]>
Closes openzfs#14828
l2arc_write_size() should return the write size after adjusting for trim
and overhead of the L2ARC log blocks. Also take into account the
allocated size of log blocks when deciding when to stop writing buffers
to L2ARC.

Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: George Amanakis <[email protected]>
Closes openzfs#14939
While commit bcd5321 adjusts the write size based on the size of the log
block, this happens after comparing the unadjusted write size to the
evicted (target) size.

In this case l2ad_hand will exceed l2ad_evict and violate an assertion
at the end of l2arc_write_buffers().

Fix this by adding the max log block size to the allocated size of the
buffer to be committed before comparing the result to the target
size.

Also reset the l2arc_trim_ahead ZFS module variable when the adjusted
write size exceeds the size of the L2ARC device.

Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: George Amanakis <[email protected]>
Closes openzfs#14936
Closes openzfs#14954
If this is not done, and the pool has an ashift other than the default
(at the moment 9) then the following happens:

1) vdev_alloc() assigns the ashift of the pool to the L2ARC device, but
   upon export it is not stored anywhere
2) at the first import, vdev_open() sees a vdev_ashift() of 0 and
   assigns the logical_ashift, which is 9
3) reading the contents of the L2ARC, including the header, fails
4) L2ARC buffers are not restored in ARC.

Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: George Amanakis <[email protected]>
Closes openzfs#14313 
Closes openzfs#14963
With the latest L2ARC fixes, 2 seconds is too long to wait for
quiescence of arcstats like l2_size. Shorten this interval to avoid
having the persistent L2ARC tests in ZTS prematurely terminated.

Signed-off-by: George Amanakis <[email protected]>
The previous code was checking zfs_is_namespace_prop() only for the
last property on the list.  If one was not "namespace", then remount
wasn't called.  To fix that move zfs_is_namespace_prop() inside the
loop and remount if at least one of properties was "namespace".

Reviewed-by: Umer Saleem <[email protected]>
Reviewed-by: Ameer Hamza <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by:	Alexander Motin <[email protected]>
Sponsored by:	iXsystems, Inc.
Closes openzfs#15000
Unlike regular receive, raw receive requires the destination to have
the same block structure as the source.  In case of dnode reclaim this
triggers two special cases, requiring special handling:
 - If dn_nlevels == 1, we can change the ibs, but dnode_set_blksz()
should not dirty the data buffer if the block size does not change, or
during receive dbuf_dirty_lightweight() will trigger an assertion.
 - If dn_nlevels > 1, we just can't change the ibs; dnode_set_blksz()
would fail and receive_object would trigger an assertion, so we should
destroy and recreate the dnode from scratch.

Reviewed-by: Paul Dagnelie <[email protected]>
Signed-off-by:	Alexander Motin <[email protected]>
Sponsored by:	iXsystems, Inc.
Closes openzfs#15039
We would see zed assert on one of our systems if we powered off a
slot.  Further examination showed zfs_retire_recv() was reporting
a GUID of 0, which in turn would return a NULL nvlist.  Add
in a check for a zero GUID.

Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Tony Hutter <[email protected]>
Closes openzfs#15084
For large JBODs the log message "zfs_iter_vdev: no match" can
account for the bulk of the log messages (over 70%).  Since this
message is purely informational and not that useful, remove it.

Reviewed-by: Olaf Faaland <[email protected]>
Reviewed-by: Brian Atkinson <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes openzfs#15086
Closes openzfs#15094
Update the META file to reflect compatibility with the 6.4 kernel.

Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Ryan Lahfa <[email protected]>
Closes openzfs#15125
On Linux, sometimes, when ZFS goes to unmount an automounted snap,
it fails a VERIFY check on debug builds, because taskq_cancel_id
returned ENOENT after not finding the taskq it was trying to cancel.

This presumably happens when it already died for some reason; in this
case, we don't really mind it already being dead, since we're just
going to dispatch a new task to unmount it right after.

So we just ignore it if we get back ENOENT trying to cancel here,
retry a couple times if we get back the only other possible condition
(EBUSY), and log to dbgmsg if we got anything but ENOENT or success.

(We also add some locking around taskqid, to avoid one or two cases
of two instances of trying to cancel something at once.)

Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Tony Nguyen <[email protected]>
Signed-off-by: Rich Ercolani <[email protected]>
Closes openzfs#11632
Closes openzfs#12670
Unfortunately, even after e79b680, I still, much more rarely,
tripped asserts when playing with many ctldir mounts at once.

Since this appears to happen if we dispatched twice too fast, just
ignore it. We don't actually need to do anything if someone already
started doing it for us.

Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Rich Ercolani <[email protected]>
Closes openzfs#14462
When a kmem cache is exhausted and needs to be expanded a new
slab is allocated.  KM_SLEEP callers can block and wait for the
allocation, but KM_NOSLEEP callers were incorrectly allowed to
block as well.

Resolve this by attempting an emergency allocation as a best
effort.  This may fail but that's fine since any KM_NOSLEEP
consumer is required to handle an allocation failure.

Signed-off-by: Brian Behlendorf <[email protected]>
Reviewed-by: Adam Moss <[email protected]>
Reviewed-by: Brian Atkinson <[email protected]>
Reviewed-by: Richard Yao <[email protected]>
Reviewed-by: Tony Hutter <[email protected]>
This change was introduced in Linux commit
7ba150834b840f6f5cdd07ca69a4ccf39df59a66

Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Alexander Motin <[email protected]>
Signed-off-by: Coleman Kane <[email protected]>
Closes openzfs#15059
The disk_check_media_change() function was added, replacing
bdev_check_media_change().  This change was introduced in 6.5-rc1
444aa2c58cb3b6cfe3b7cc7db6c294d73393a894, and the new function takes a
gendisk* as its argument, no longer a block_device*. Thus, bdev->bd_disk
is now used to pass the expected data.

Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Coleman Kane <[email protected]>
Closes openzfs#15060
Additionally, the .child element of ctl_table has been removed in 6.5.
This change adds a new test for the pre-6.5 register_sysctl_table()
function, and uses the old code in that case. If it isn't found, then
the parentage entries in the tables are removed, and the register_sysctl
call is provided the paths of "kernel/spl", "kernel/spl/kmem", and
"kernel/spl/kstat" directly, to populate each subdirectory over three
calls, as is the new API.

Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Coleman Kane <[email protected]>
Closes openzfs#15098
When disk_check_media_change() exists, then define
zfs_check_media_change() to simply call disk_check_media_change() on
the bd_disk member of its argument. Since disk_check_media_change()
is newer than when revalidate_disk was present in bops, we should
be able to safely do this via a macro, instead of recreating a new
implementation of the inline function that forces revalidation.

Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Brian Atkinson <[email protected]>
Signed-off-by: Coleman Kane <[email protected]>
Closes openzfs#15101
Multiple changes to the blkdev API were introduced in Linux 6.5. This
includes passing (void* holder) to blkdev_put, adding a new
blk_holder_ops* arg to blkdev_get_by_path, adding a new blk_mode_t type
that replaces uses of fmode_t, and removing an argument from the release
handler on block_device_operations that we weren't using. The open
function definition has also changed to take gendisk* and blk_mode_t, so
update it accordingly, too.

Implement local wrappers for blkdev_get_by_path() and
vdev_blkdev_put() so that the in-line calls are cleaner, and place the
conditionally-compiled implementation details inside of both of these
local wrappers. Both calls are exclusively used within vdev_disk.c, at
this time.

Add blk_mode_is_open_write() to test FMODE_WRITE / BLK_OPEN_WRITE.
This wrapper is now used to test, via the method appropriate for the
kernel, whether the open mode is writable.

Also emphasize that the fmode_t arg in zvol_release() is not used.

Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Coleman Kane <[email protected]>
Closes openzfs#15099
The iov_iter->iov member is now iov_iter->__iov and must be accessed via
the accessor function iter_iov(). Create a wrapper that is conditionally
compiled to use the access method appropriate for the target kernel
version.

Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Brian Atkinson <[email protected]>
Signed-off-by: Coleman Kane <[email protected]>
Closes openzfs#15100
An iov_iter_type() function to access the "type" member of the struct
iov_iter was added at one point. Move the conditional logic to decide
which method to use for accessing it into a macro and simplify the
zpl_uio_init code.

Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Brian Atkinson <[email protected]>
Signed-off-by: Coleman Kane <[email protected]>
Closes openzfs#15100
This reverts commit b35374f as there
are error messages when loading the SPL module. The errors seemed to
be tied to a duplicate sysctl entry.

Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Brian Atkinson <[email protected]>
Closes openzfs#15134
Additionally, the .child element of ctl_table has been removed in 6.5.
This change adds a new test for the pre-6.5 register_sysctl_table()
function, and uses the old code in that case. If it isn't found, then
the parentage entries in the tables are removed, and the register_sysctl
call is provided the paths of "kernel/spl", "kernel/spl/kmem", and
"kernel/spl/kstat" directly, to populate each subdirectory over three
calls, as is the new API.

Reviewed-by: Brian Atkinson <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Coleman Kane <[email protected]>
Closes openzfs#15138
…e_read

The generic_file_splice_read function was removed in Linux 6.5 in favor
of filemap_splice_read. Add an autoconf test for filemap_splice_read and
use it if it is found as the handler for .splice_read in the
file_operations struct. Additionally, ITER_PIPE was removed in 6.5. This
change removes the ITER_* macros that OpenZFS doesn't use from being
tested in config/kernel-vfs-iov_iter.m4. The removal of ITER_PIPE was
causing the test to fail, which also affected the code responsible for
setting the .splice_read handler, above. That behavior caused run-time
panics on Linux 6.5.

Reviewed-by: Brian Atkinson <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Coleman Kane <[email protected]>
Closes openzfs#15155
Using the filemap_splice_read function for the splice_read handler was
leading to occasional data corruption under certain circumstances. Favor
using copy_splice_read instead, which does not demonstrate the same
erroneous behavior under the tested failure cases.

Reviewed-by: Brian Atkinson <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Coleman Kane <[email protected]>
Closes openzfs#15164
If we fail to create a proc entry in spl_proc_init() we may end up
calling unregister_sysctl_table() twice: one in the failure path of
spl_proc_init() and another time during spl_proc_fini().

Avoid the double call to unregister_sysctl_table() and while at it
refactor the code a bit to reduce code duplication.

This was accidentally introduced when the spl code was
updated for Linux 6.5 compatibility.

Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Ameer Hamza <[email protected]>
Signed-off-by: Andrea Righi <[email protected]>
Closes openzfs#15234 
Closes openzfs#15235
When register_sysctl_table() is unavailable we fail to properly
unregister sysctl entries under "kernel/spl".

This leads to errors like the following when spl is unloaded/reloaded,
making it impossible to properly reload the spl module:

[  746.995704] sysctl duplicate entry: /kernel/spl/kmem/slab_kvmem_total

Fix by cleaning up all the sub-entries inside "kernel/spl" when the
spl module is unloaded.

Reviewed-by: Alexander Motin <[email protected]>
Reviewed-by: Brian Atkinson <[email protected]>
Signed-off-by: Andrea Righi <[email protected]>
Closes openzfs#15239
sdimitro and others added 2 commits September 12, 2023 09:22
Reviewed-by: Don Brady <[email protected]>
Reviewed-by: Tony Hutter <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Serapheim Dimitropoulos <[email protected]>
Closes openzfs#15220
Update the META file to reflect compatibility with the 6.5
kernel.

Signed-off-by: Tony Hutter <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
@ipha

ipha commented Sep 19, 2023

I built this against 6.5.3 on arch and got a crash on boot during import-cache

9月 18 16:59:15 ginger systemd[1]: Starting Import ZFS pools by cache file...
 9月 18 16:59:15 ginger (zpool)[871]: zfs-import-cache.service: Referenced but unset environment variable evaluates to an empty string: ZPOOL_IMPORT_OPTS
 9月 18 16:59:15 ginger kernel: detected buffer overflow in strcpy
 9月 18 16:59:15 ginger kernel: ------------[ cut here ]------------
 9月 18 16:59:15 ginger kernel: kernel BUG at lib/string_helpers.c:1031!
 9月 18 16:59:15 ginger kernel: invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
 9月 18 16:59:15 ginger kernel: CPU: 5 PID: 871 Comm: zpool Tainted: P           OE      6.5.3-arch1-1 #1 ed5b3b894d0aeb37298a77837232ca9b353cc27d
 9月 18 16:59:15 ginger kernel: Hardware name: Gigabyte Technology Co., Ltd. X570 AORUS ELITE/X570 AORUS ELITE, BIOS F38f 08/09/2023
 9月 18 16:59:15 ginger kernel: RIP: 0010:fortify_panic+0x13/0x20
 9月 18 16:59:15 ginger kernel: Code: 41 5d e9 a0 8c 76 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 48 89 fe 48 c7 c7 40 1d 08 b5 e8 6d ac b1 ff <0f> 0b 66 66 2e 0f 1f 84 00 00 00 00 00 90 90 90 90 90 90 90 90 90
 9月 18 16:59:15 ginger kernel: RSP: 0018:ffffad4ec0c1bc38 EFLAGS: 00010246
 9月 18 16:59:15 ginger kernel: RAX: 0000000000000022 RBX: ffff9d524d29e540 RCX: 0000000000000000
 9月 18 16:59:15 ginger kernel: RDX: 0000000000000000 RSI: ffff9d595eb616c0 RDI: ffff9d595eb616c0
 9月 18 16:59:15 ginger kernel: RBP: ffff9d525cd67400 R08: 0000000000000000 R09: ffffad4ec0c1bae0
 9月 18 16:59:15 ginger kernel: R10: 0000000000000003 R11: ffff9d597f32bc28 R12: 0000000000000060
 9月 18 16:59:15 ginger kernel: R13: ffffffffc0dc0835 R14: ffffffffc0d9ea08 R15: 0000000000000000
 9月 18 16:59:15 ginger kernel: FS:  00007f987cd45800(0000) GS:ffff9d595eb40000(0000) knlGS:0000000000000000
 9月 18 16:59:15 ginger kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 9月 18 16:59:15 ginger kernel: CR2: 00007f987cd02000 CR3: 0000000129310000 CR4: 0000000000350ee0
 9月 18 16:59:15 ginger kernel: Call Trace:
 9月 18 16:59:15 ginger kernel:  <TASK>
 9月 18 16:59:15 ginger kernel:  ? die+0x36/0x90
 9月 18 16:59:15 ginger kernel:  ? do_trap+0xda/0x100
 9月 18 16:59:15 ginger kernel:  ? fortify_panic+0x13/0x20
 9月 18 16:59:15 ginger kernel:  ? do_error_trap+0x6a/0x90
 9月 18 16:59:15 ginger kernel:  ? fortify_panic+0x13/0x20
 9月 18 16:59:15 ginger kernel:  ? exc_invalid_op+0x50/0x70
 9月 18 16:59:15 ginger kernel:  ? fortify_panic+0x13/0x20
 9月 18 16:59:15 ginger kernel:  ? asm_exc_invalid_op+0x1a/0x20
 9月 18 16:59:15 ginger kernel:  ? fortify_panic+0x13/0x20
 9月 18 16:59:15 ginger kernel:  ? fortify_panic+0x13/0x20
 9月 18 16:59:15 ginger kernel:  __zfs_dbgmsg+0x11a/0x120 [zfs c1136432d04db160a431e112952a09e3d85161f8]
 9月 18 16:59:15 ginger kernel:  __dprintf+0xf4/0x180 [zfs c1136432d04db160a431e112952a09e3d85161f8]
 9月 18 16:59:15 ginger kernel:  ? srso_return_thunk+0x5/0x10
 9月 18 16:59:15 ginger kernel:  spa_tryimport+0x3dc/0x430 [zfs c1136432d04db160a431e112952a09e3d85161f8]
 9月 18 16:59:15 ginger kernel:  zfs_ioc_pool_tryimport+0x79/0xd0 [zfs c1136432d04db160a431e112952a09e3d85161f8]
 9月 18 16:59:15 ginger kernel:  zfsdev_ioctl_common+0x86d/0x9a0 [zfs c1136432d04db160a431e112952a09e3d85161f8]
 9月 18 16:59:15 ginger kernel:  ? srso_return_thunk+0x5/0x10
 9月 18 16:59:15 ginger kernel:  ? __kmalloc_node+0xc3/0x150
 9月 18 16:59:15 ginger kernel:  ? srso_return_thunk+0x5/0x10
 9月 18 16:59:15 ginger kernel:  zfsdev_ioctl+0x53/0xe0 [zfs c1136432d04db160a431e112952a09e3d85161f8]
 9月 18 16:59:15 ginger kernel:  __x64_sys_ioctl+0x97/0xd0
 9月 18 16:59:15 ginger kernel:  do_syscall_64+0x60/0x90
 9月 18 16:59:15 ginger kernel:  ? handle_mm_fault+0x9e/0x350
 9月 18 16:59:15 ginger kernel:  ? srso_return_thunk+0x5/0x10
 9月 18 16:59:15 ginger kernel:  ? do_user_addr_fault+0x179/0x640
 9月 18 16:59:15 ginger kernel:  ? srso_return_thunk+0x5/0x10
 9月 18 16:59:15 ginger kernel:  ? exc_page_fault+0x7f/0x180
 9月 18 16:59:15 ginger kernel:  entry_SYSCALL_64_after_hwframe+0x6e/0xd8
 9月 18 16:59:15 ginger kernel: RIP: 0033:0x7f987cf0c9df
 9月 18 16:59:15 ginger kernel: Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24 10 00 00 00 48 89 44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 00 00 00 0f 05 <89> c2 3d 00 f0 ff ff 77 18 48 8b 44 24 18 64 48 2b 04 25 28 00 00
 9月 18 16:59:15 ginger kernel: RSP: 002b:00007fff666ea0e0 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
 9月 18 16:59:15 ginger kernel: RAX: ffffffffffffffda RBX: 00005627e6186610 RCX: 00007f987cf0c9df
 9月 18 16:59:15 ginger kernel: RDX: 00007fff666ea150 RSI: 0000000000005a06 RDI: 0000000000000003
 9月 18 16:59:15 ginger kernel: RBP: 00007fff666ed730 R08: 00000000ffffffff R09: 0000000000000000
 9月 18 16:59:15 ginger kernel: R10: 0000000000000022 R11: 0000000000000246 R12: 00007fff666ea150
 9月 18 16:59:15 ginger kernel: R13: 00005627e618e390 R14: 00005627e618e460 R15: b0ed72e83db09eec
 9月 18 16:59:15 ginger kernel:  </TASK>
 9月 18 16:59:15 ginger kernel: Modules linked in: hid_logitech_hidpp joydev btusb btrtl btbcm btintel btmtk mousedev hid_logitech_dj bluetooth ecdh_generic igb rfkill crc16 dca amdgpu intel_rapl_msr intel_rapl_common edac_mce_amd kvm_amd snd_hda_codec_realtek vfat amdxcp fat snd_hda_codec_generic drm_buddy snd_hda_codec_hdmi ledtrig_audio kvm snd_usb_audio gpu_sched snd_usbmidi_lib snd_hda_intel snd_intel_dspcfg irqbypass i2c_algo_bit snd_ump crct10dif_pclmul snd_intel_sdw_acpi snd_rawmidi crc32_pclmul polyval_clmulni drm_suballoc_helper snd_seq_device snd_hda_codec drm_ttm_helper polyval_generic gf128mul ttm snd_hda_core mc ghash_clmulni_intel sha512_ssse3 snd_hwdep drm_display_helper aesni_intel crypto_simd snd_pcm cryptd cec gigabyte_wmi wmi_bmof rapl snd_timer video sp5100_tco acpi_cpufreq snd pcspkr ccp k10temp i2c_piix4 soundcore wmi mac_hid zfs(POE) zunicode(POE) zzstd(OE) zlua(OE) zavl(POE) icp(POE) zcommon(POE) znvpair(POE) spl(OE) dm_multipath crypto_user fuse dm_mod loop ip_tables x_tables xxhash_generic xhci_pci
 9月 18 16:59:15 ginger kernel:  xhci_pci_renesas usbhid usb_storage nvme nvme_core nvme_common btrfs blake2b_generic libcrc32c crc32c_generic crc32c_intel xor raid6_pq
 9月 18 16:59:15 ginger kernel: ---[ end trace 0000000000000000 ]---
 9月 18 16:59:15 ginger kernel: RIP: 0010:fortify_panic+0x13/0x20
 9月 18 16:59:15 ginger kernel: Code: 41 5d e9 a0 8c 76 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 48 89 fe 48 c7 c7 40 1d 08 b5 e8 6d ac b1 ff <0f> 0b 66 66 2e 0f 1f 84 00 00 00 00 00 90 90 90 90 90 90 90 90 90
 9月 18 16:59:15 ginger kernel: RSP: 0018:ffffad4ec0c1bc38 EFLAGS: 00010246
 9月 18 16:59:15 ginger kernel: RAX: 0000000000000022 RBX: ffff9d524d29e540 RCX: 0000000000000000
 9月 18 16:59:15 ginger kernel: RDX: 0000000000000000 RSI: ffff9d595eb616c0 RDI: ffff9d595eb616c0
 9月 18 16:59:15 ginger kernel: RBP: ffff9d525cd67400 R08: 0000000000000000 R09: ffffad4ec0c1bae0
 9月 18 16:59:15 ginger kernel: R10: 0000000000000003 R11: ffff9d597f32bc28 R12: 0000000000000060
 9月 18 16:59:15 ginger kernel: R13: ffffffffc0dc0835 R14: ffffffffc0d9ea08 R15: 0000000000000000
 9月 18 16:59:15 ginger kernel: FS:  00007f987cd45800(0000) GS:ffff9d595ed40000(0000) knlGS:0000000000000000
 9月 18 16:59:15 ginger kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 9月 18 16:59:15 ginger kernel: CR2: 000055f000dfa958 CR3: 0000000129310000 CR4: 0000000000350ee0
 9月 18 16:59:15 ginger systemd[1]: zfs-import-cache.service: Main process exited, code=killed, status=11/SEGV
 9月 18 16:59:15 ginger systemd[1]: zfs-import-cache.service: Failed with result 'signal'.
 9月 18 16:59:15 ginger systemd[1]: Failed to start Import ZFS pools by cache file.
 9月 18 16:59:15 ginger systemd[1]: Reached target ZFS pool import target.
 9月 18 16:59:15 ginger systemd[1]: Reached target ZFS startup target.
 9月 18 16:59:15 ginger systemd[1]: Starting Mount ZFS filesystems...
 9月 18 16:59:19 ginger systemd[1]: systemd-rfkill.service: Deactivated successfully.

@mabod

mabod commented Sep 19, 2023

I built this against 6.5.3 on arch and got a crash on boot during import-cache

I got a similar crash today while testing with kernel 6.5.4. I do not have logs; the core dump was only on the screen before the journal started.

@tonyhutter
Contributor Author

@ipha thanks for reporting this - I'm able to reproduce it on Fedora 38 as well.

[  200.851685] ------------[ cut here ]------------
[  200.851952] kernel BUG at lib/string_helpers.c:1031!
[  200.852248] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
[  200.852550] CPU: 8 PID: 50601 Comm: zpool Tainted: P           OE      6.5.3-0.rc1.250.vanilla.fc38.x86_64 #1
[  200.853164] Hardware name: Red Hat KVM/RHEL-AV, BIOS 1.16.0-3.module+el8.8.0+16781+9f4724c2 04/01/2014
[  200.853830] RIP: 0010:fortify_panic+0x13/0x20
[  200.854096] Code: 41 5d c3 cc cc cc cc 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 48 89 fe 48 c7 c7 f0 a5 94 9e e8 0d 3c 9a ff <0f> 0b 66 66 2e 0f 1f 84 00 00 00 00 00 90 90 90 90 90 90 90 90 90
[  200.855220] RSP: 0018:ffffb2320167bcc8 EFLAGS: 00010246
[  200.855546] RAX: 0000000000000022 RBX: ffff8a2a8a1f4b80 RCX: 0000000000000000
[  200.856065] RDX: 0000000000000000 RSI: ffff8a2bb7c21540 RDI: ffff8a2bb7c21540
[  200.856522] RBP: 0000000000000061 R08: 0000000000000000 R09: ffffb2320167bb70
[  200.856933] R10: 0000000000000003 R11: ffffffff9f345d28 R12: ffff8a2a8a0e6000
[  200.857345] R13: ffffffffc1339712 R14: ffffffffc12baa28 R15: 0000000000000000
[  200.857752] FS:  00007ffb0982c8c0(0000) GS:ffff8a2bb7c00000(0000) knlGS:0000000000000000
[  200.858212] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  200.858544] CR2: 0000000001a49000 CR3: 00000001e361a005 CR4: 0000000000770ee0
[  200.858953] PKRU: 55555554
[  200.859113] Call Trace:
[  200.859342]  <TASK>
[  200.859469]  ? die+0x36/0x90
[  200.859680]  ? do_trap+0xda/0x100
[  200.859880]  ? fortify_panic+0x13/0x20
[  200.860241]  ? do_error_trap+0x6a/0x90
[  200.860472]  ? fortify_panic+0x13/0x20
[  200.860741]  ? exc_invalid_op+0x50/0x70
[  200.860995]  ? fortify_panic+0x13/0x20
[  200.861212]  ? asm_exc_invalid_op+0x1a/0x20
[  200.861461]  ? fortify_panic+0x13/0x20
[  200.861678]  ? fortify_panic+0x13/0x20
[  200.861894]  __zfs_dbgmsg+0x103/0x110 [zfs]
[  200.862267]  __dprintf+0xf4/0x180 [zfs]
[  200.862599]  spa_tryimport+0x30d/0x470 [zfs]
[  200.862961]  zfs_ioc_pool_tryimport+0x79/0xd0 [zfs]
[  200.863363]  zfsdev_ioctl_common+0x663/0x7a0 [zfs]
(gdb) list *(spa_tryimport+0x30d)
0xe732d is in spa_tryimport (~/release/zfs/module/zfs/spa.c:6254).
6249			spa->spa_load_max_txg = policy.zlp_txg;
6250			spa->spa_extreme_rewind = B_TRUE;
6251			zfs_dbgmsg("spa_tryimport: importing %s, max_txg=%lld",
6252			    poolname, (longlong_t)policy.zlp_txg);
6253		} else {
6254			zfs_dbgmsg("spa_tryimport: importing %s", poolname);
6255		}
6256	
6257		if (nvlist_lookup_string(tryconfig, ZPOOL_CONFIG_CACHEFILE, &cachefile)

I'm looking into it further

@tonyhutter
Contributor Author

Note that the crash happened while typing zpool import with no pool name, so it was searching for pools.

@tonyhutter
Contributor Author

tonyhutter commented Sep 20, 2023

Ok, this part of 9c8fabf fixes the crash:

diff --git a/module/os/linux/zfs/zfs_debug.c b/module/os/linux/zfs/zfs_debug.c
index 98c9923d5..595806373 100644
--- a/module/os/linux/zfs/zfs_debug.c
+++ b/module/os/linux/zfs/zfs_debug.c
@@ -30,7 +30,7 @@ typedef struct zfs_dbgmsg {
        procfs_list_node_t      zdm_node;
        uint64_t                zdm_timestamp;
        int                     zdm_size;
-       char                    zdm_msg[1]; /* variable length allocation */
+       char                    zdm_msg[]; /* variable length allocation */
 } zfs_dbgmsg_t;
 
 procfs_list_t zfs_dbgmsgs;
@@ -134,7 +134,7 @@ __set_error(const char *file, const char *func, int line, int err)
 void
 __zfs_dbgmsg(char *buf)
 {
-       int size = sizeof (zfs_dbgmsg_t) + strlen(buf);
+       int size = sizeof (zfs_dbgmsg_t) + strlen(buf) + 1;
        zfs_dbgmsg_t *zdm = kmem_zalloc(size, KM_SLEEP);
        zdm->zdm_size = size;
        zdm->zdm_timestamp = gethrestime_sec();

Let me see about pulling that patch in

The Linux 5.16.14 kernel's coccicheck caught this. The semantic
patch that caught it was:

./scripts/coccinelle/misc/flexible_array.cocci

However, unlike the cases where the GNU zero length array extension had
been used, coccicheck would not suggest patches for the older style
single member arrays. That was good because blindly changing them would
break size calculations in most cases.

Therefore, this required care to make sure that we did not break size
calculations. In the case of `indirect_split_t`, we use
`offsetof(indirect_split_t, is_child[is->is_children])` to calculate
size. This might be subtly wrong according to an old mailing list
thread:

https://inbox.sourceware.org/gcc-prs/[email protected]/T/

That is because the C99 specification should consider the flexible array
members to start at the end of a structure, but compilers prefer to put
padding at the end. A suggestion was made to allow compilers to allocate
padding after the VLA like compilers already did:

http://std.dkuug.dk/JTC1/SC22/WG14/www/docs/n983.htm

However, upon thinking about it, whether or not we allocate end of
structure padding does not matter, so using offsetof() to calculate the
size of the structure is fine, so long as we do not mix it with sizeof()
on structures with no array members.

In the case that we mix them and padding causes offsetof(struct_t,
vla_member[0]) to differ from sizeof(struct_t), we would be doing unsafe
operations if we underallocate via `offsetof()` and then overcopy via
sizeof().

Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Richard Yao <[email protected]>
Closes openzfs#14372
@tonyhutter
Contributor Author

9c8fabf is pulled in now, so the kernel bug should be gone.

@behlendorf
Contributor

@tonyhutter you can also pull 9192ab7 to resolve the checkstyle warnings.

@mabod

mabod commented Sep 21, 2023

I cannot apply this patchset anymore. It always fails on the latest patch, the one to the META file. I am applying this PR to the latest 2.1.12 tarball: https://github.com/zfsonlinux/zfs/releases/download/zfs-2.1.12/zfs-2.1.12.tar.gz

patching file module/zfs/vdev_indirect.c
patching file module/zstd/Makefile.in
patching file META
Hunk #1 FAILED at 1.
1 out of 1 hunk FAILED -- saving rejects to file META.rej
╰─# cat META 
Meta:          1
Name:          zfs
Branch:        1.0
Version:       2.1.12
Release:       1
Release-Tags:  relext
License:       CDDL
Author:        OpenZFS
Linux-Maximum: 6.5
Linux-Minimum: 3.10
╰─# cat META.rej 
--- META
+++ META
@@ -1,7 +1,7 @@
 Meta:          1
 Name:          zfs
 Branch:        1.0
-Version:       2.1.12
+Version:       2.1.13
 Release:       1
 Release-Tags:  relext
 License:       CDDL

@tonyhutter
Contributor Author

@mabod I'm not really sure how you're patching, but if you want to test out this patchset I'd recommend you check it out with git:

git clone https://github.com/tonyhutter/zfs.git --branch zfs-2.1.13-hutter

META file and changelog updated.

Signed-off-by: Tony Hutter <[email protected]>
@mabod

mabod commented Sep 21, 2023

I'm not really sure how you're patching

I pulled this PR as a patch like this:

wget https://github.com/openzfs/zfs/pull/15268.patch

and applied it as part of the PKGBUILD with patch -p1 ...

I will do it now like you suggest
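
For what it's worth, the "Hunk #1 FAILED" above is just patch(1) rejecting a context mismatch: the hunk's old lines must match the target file byte for byte. A self-contained sketch (all paths and contents made up for illustration, not the real ZFS tree) reproduces the mechanism:

```shell
# Sketch: why "Hunk #1 FAILED" happens.  patch(1) requires the target
# file to match the hunk's old ("-") lines.  All paths and contents
# here are illustrative, not the real ZFS tree.
set -e
demo=/tmp/meta-demo
mkdir -p "$demo"

printf 'Version:       2.1.12\n' > "$demo/META.base"   # what the hunk expects
printf 'Version:       2.1.11\n' > "$demo/META.other"  # a mismatching base

cat > "$demo/meta.patch" <<'EOF'
--- META
+++ META
@@ -1 +1 @@
-Version:       2.1.12
+Version:       2.1.13
EOF

# Matching base: the hunk applies cleanly.
patch --dry-run --batch "$demo/META.base" "$demo/meta.patch"

# Mismatching base: "1 out of 1 hunk FAILED", just like in the report.
if ! patch --dry-run --batch "$demo/META.other" "$demo/meta.patch"; then
	echo "hunk failed as expected (context mismatch)"
fi
```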

@seanjnkns

seanjnkns commented Sep 27, 2023

Testing this on Fedora 39, kernel 6.5.5-300.fc39.x86_64, and getting the following. When it runs the same/similar test later in the loop, it doesn't crash:

[root@fedora zfs]# sudo ./scripts/zloop.sh
Setting core file pattern...
core dump directory (/var/tmp/zloop) does not exist, creating it.
09/27 10:09:30 /root/zfs/bin/ztest -G -VVVVV -K raidz -m 2 -r 0 -D 0 -S 0 -R 2 -v 3 -a 12 -C special=random -s 512m -f /var/tmp/zloop-run
*** ztest crash found - moving logs to /var/tmp/zloop/zloop-230927-101340

warning: Can't open file /tmp/ztest.data.6xckxB (deleted) during file-backed mapping note processing
Missing separate debuginfo for the main executable file.
The debuginfo package for this file is probably broken.

warning: Section `.reg-xstate/25552' in core file too small.

warning: Section `.reg-xstate/25552' in core file too small.
*** core @ /var/tmp/zloop/zloop-230927-101340/core.24995:
continuing...

Coredump available at https://drive.google.com/drive/folders/1l5fyl4_xefHxo0trJqBLyNMx6lU3EKsu?usp=sharing

EDIT: Correction, with enough iterations it will generate the same core dump again, but I'm only seeing it on the raidz runs, not the draid ones.

@tonyhutter
Contributor Author

@seanjnkns thanks for reporting this. I should mention that Fedora 39 is set to be released Oct 17th and hasn't had any testing against ZFS yet. zloop is also not 100% stable. We will definitely keep an eye out for zloop failures when we start Fedora 39 testing, though.

@seanjnkns

seanjnkns commented Sep 27, 2023

@seanjnkns thanks for reporting this. I should mention that Fedora 39 is set to be released Oct 17th and hasn't had any testing against ZFS yet. zloop is also not 100% stable. We will definitely keep an eye out for zloop failures when we start Fedora 39 testing, though.

@tonyhutter - yeah, I'm aware Fedora 39 isn't released yet; I just knew it ships the 6.5 kernel by default and figured I could see whether we're going to have issues once it's released. Also, Fedora 38 just officially released the 6.5.x kernel, so I'll be testing there as well and can provide feedback. My main setup uses Fedora 38, which is why I want to test both environments. ATM I'm running the test suite on Fedora 39, which I'll provide when it completes, then switch to the same series of testing on Fedora 38.

@tonyhutter tonyhutter merged commit eb62221 into openzfs:zfs-2.1-release Sep 27, 2023
4 of 5 checks passed
@darkbasic

Would it be possible to release a new rc for 2.2 with support for 6.5 as well?

Labels: Status: Code Review Needed (Ready for review and testing)