Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

zfs-2.1.15 patchset #15897

Merged
merged 45 commits into from
Feb 26, 2024
Merged

Conversation

tonyhutter
Copy link
Contributor

Motivation and Context

Proposed patch set for zfs-2.1.15

Description

Kernel compatibility and a few fixes and minor features.

How Has This Been Tested?

Buildbot will test

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Library ABI change (libzfs, libzfs_core, libnvpair, libuutil and libzfsbootenv)
  • Documentation (a change to man pages or other documentation)

Checklist:

AllKind and others added 30 commits December 5, 2023 11:04
Alien does not honour the %posttrans hook.
So move the dkms uninstall/install scripts to the
 %pre/%post hooks in case of package install/upgrade.
In case of package removal, handle that in %preun.
Add removal of all old dkms modules.
Add checking for broken 'dkms status'. Handle that as
good as possible and warn the user about it.
Also add more verbose messages about what we are doing.

Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Mart Frauenlob <[email protected]>
Closes openzfs#15415
(cherry picked from commit 9ce567c)
If all zfs dkms modules have been removed, a shell-init error message
may appear, because /var/lib/dkms/zfs does no longer exist.
Resolve this by leaving the directory earlier on.

Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Mart Frauenlob <[email protected]>
Closes openzfs#15576
Commit 8af1104 does not actually store the ashift of cache devices in
their label. However, in order to facilitate reporting the ashift
through zdb, we enable this in the present commit. We also document
how the retrieval of the ashift is done.

Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: George Amanakis <[email protected]>
Closes openzfs#15331
The io_uring test fails on CentOS 9 with the following fio error.
Disable the test for the benefit of the CI until this can be fully
investigated.  This basic test passes as expected on newer kernels.

Reviewed-by: Tony Hutter <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes openzfs#15636
Recognize when the host part of a sharenfs attribute is an ipv6
Literal and pass that through without modification.

Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Felix Dörre <[email protected]>
Closes: openzfs#11171
Closes openzfs#11939
Closes: openzfs#1894
We are finding that as customers get larger and faster machines
(hundreds of cores, large NVMe-backed pools) they keep hitting
relatively low performance ceilings. Our profiling work almost always
finds that they're running into bottlenecks on the SPA IO taskqs.
Unfortunately there's often little we can advise at that point, because
there's very few ways to change behaviour without patching.

This commit adds two load-time parameters `zio_taskq_read` and
`zio_taskq_write` that can configure the READ and WRITE IO taskqs
directly.

This achieves two goals: it gives operators (and those that support
them) a way to tune things without requiring a custom build of OpenZFS,
which is often not possible, and it lets us easily try different config
variations in a variety of environments to inform the development of
better defaults for these kind of systems.

Because tuning the IO taskqs really requires a fairly deep understanding
of how IO in ZFS works, and generally isn't needed without a pretty
serious workload and an ability to identify bottlenecks, only minimal
documentation is provided. Its expected that anyone using this is going
to have the source code there as well.

Signed-off-by: Rob Norris <[email protected]>
Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
This patch backports the change from openzfs#15711 (to ZFS 2.2.x) because
some OS vendors have back-ported this change from Linux 6.2+ to their
older kernels.

The original patch addressed these issues:

1. Linux 6.2+ on arm64 has exported them with `EXPORT_SYMBOL_GPL`, so
   license compatibility must be checked before use.
2. On both arm and arm64, the definitions of these symbols are guarded
   by `CONFIG_KERNEL_MODE_NEON`, but their declarations are still
   present. Checking in configuration phase only leads to MODPOST
   errors (undefined references).

Signed-off-by: Keith Gable <[email protected]>
Co-authored-by: Shengqi Chen <[email protected]>
sbuf_cpy() resets the sbuf state, which is wrong for sbufs allocated by
sbuf_new_for_sysctl().  In particular, this code triggers an assertion
failure in sbuf_clear().

Simplify by just using sysctl_handle_string() for both reading and
setting the tunable.

Fixes: 6930ecb ("spa: make read/write queues configurable")
Reviewed-by: Rob Norris <[email protected]>
Reviewed-by: Alexander Motin <[email protected]>
Reported-by: Peter Holm <[email protected]>
Signed-off-by: Mark Johnston <[email protected]>
Closes openzfs#15719
For FreeBSD sysctls, we don't want the extra newline, since the
sysctl(8) utility will format strings appropriately.

Reviewed-by: Rob Norris <[email protected]>
Reviewed-by: Alexander Motin <[email protected]>
Reported-by: Peter Holm <[email protected]>
Signed-off-by: Mark Johnston <[email protected]>
Closes openzfs#15719
Reviewed-by: Alexander Motin <[email protected]>
Signed-off-by: Mateusz Guzik <[email protected]>
Closes openzfs#15036
This gets around UBSAN errors when using arrays at the end of
structs.  It converts some zero-length arrays to variable length
arrays and disables UBSAN checking on certain modules.

It is based off of the patch from openzfs#15460.

Reviewed-by: Brian Behlendorf <[email protected]>
Tested-by: Thomas Lamprecht <[email protected]>
Co-authored-by: Thomas Lamprecht <[email protected]>
Signed-off-by: Tony Hutter <[email protected]>
Issue openzfs#15145
Closes openzfs#15510
In Linux commit 13bc24457850583a2e7203ded05b7209ab4bc5ef, direct access
to the i_ctime member of struct inode was removed. The new approach is
to use accessor methods that exclusively handle passing the timestamp
around by value. This change adds new tests for each of these functions
and introduces zpl_* equivalents in include/os/linux/zfs/sys/zpl.h. In
where the inode_get/set_ctime*() functions exist, these zpl_* calls will
be mapped to the new functions. On older kernels, these macros just wrap
direct-access calls. The code that operated on an address of ip->i_ctime
to call ZFS_TIME_DECODE() now will take a local copy using
zpl_inode_get_ctime(), and then pass the address of the local copy when
performing the ZFS_TIME_DECODE() call, in all cases, rather than
directly accessing the member.

Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Coleman Kane <[email protected]>
Closes openzfs#15263
Closes openzfs#15257
(cherry picked from commit fe9d409)
…t arg2

In commit 0d72b92883c651a11059d93335f33d65c6eb653b, a new u32 argument
for the request_mask was added to generic_fillattr. This is the same
request_mask for statx that's present in the most recent API implemented
by zpl_getattr_impl. This commit conditionally adds it to the
zpl_generic_fillattr(...) macro, as well as the zfs_getattr_fast(...)
implementation, when configure determines it's present in the kernel's
generic_fillattr(...).

Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Coleman Kane <[email protected]>
Closes openzfs#15263
(cherry picked from commit 21875dd)
…kdev()

In Linux commit 560e20e4bf6484a0c12f9f3c7a1aa55056948e1e, the
fsync_bdev() function was removed in favor of sync_blockdev() to do
(roughly) the same thing, given the same input. This change
conditionally attempts to call sync_blockdev() if fsync_bdev() isn't
discovered during configure.

Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Coleman Kane <[email protected]>
Closes openzfs#15263
(cherry picked from commit 3f67e01)
With Linux v6.6.x and clang 16, a configure step fails on a warning that
later results in an error while building, due to 'ts' being
uninitialized. Add a trivial initialization to silence the warning.

Signed-off-by: Jaron Kent-Dobias <[email protected]>
Reviewed-by: Tony Hutter <[email protected]>
(cherry picked from commit d813aa8)
6.7 changed the names of the time members in struct inode, so we can't
assign back to it because we don't know its name. In practice this
doesn't matter though - if we're missing current_time(), then we must be
on <4.9, and we know our fallback will need to return timespec.

Signed-off-by: Rob Norris <[email protected]>
Sponsored-by: https://github.com/sponsors/robn
(cherry picked from commit b3626f0)
6.6 made i_ctime inaccessible; 6.7 has done the same for i_atime and
i_mtime. This extends the method used for ctime in b37f293 to atime
and mtime as well.

Signed-off-by: Rob Norris <[email protected]>
Sponsored-by: https://github.com/sponsors/robn
(cherry picked from commit 3c13601)
In 6.7 the superblock shrinker member s_shrink has changed from being an
embedded struct to a pointer. Detect this, and don't take a reference if
it already is one.

Signed-off-by: Rob Norris <[email protected]>
Sponsored-by: https://github.com/sponsors/robn
(cherry picked from commit 18a9185)
6.7 changes the shrinker API such that shrinkers must be allocated
dynamically by the kernel. To accomodate this, this commit reworks
spl_register_shrinker() to do something similar against earlier kernels.

Signed-off-by: Rob Norris <[email protected]>
Sponsored-by: https://github.com/sponsors/robn
(cherry picked from commit 03b8409)
On some systems we already have blkdev_get_by_path() with 4 args
but still the old FMODE_EXCL and not BLK_OPEN_EXCL defined.
The vdev_bdev_mode() function was added to handle this case
but there was no generic way to specify exclusive access.

Reviewed-by: Brian Atkinson <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes openzfs#15692
(cherry picked from commit d530d5d)
In db4fc55 I messed up and changed this bit of code to set the inode
atime to an uninitialised value, when actually it was just supposed to
loading the atime from the inode to be stored in the SA. This changes it
to what it should have been.

Ensure times change by the right amount Previously, we only checked
if the times changed at all, which missed a bug where the atime was
being set to an undefined value.

Now ensure the times change by two seconds (or thereabouts), ensuring
we catch cases where we set the time to something bonkers

Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Rob Norris <[email protected]>
Sponsored-by: https://despairlabs.com/sponsor/
Closes openzfs#15762
Closes openzfs#15773
(cherry picked from commit 2ecc2df)
The kernel is now being compiled with -Wmissing-prototypes. Most of our
test stub functions had no prototype, and failed to compile. Since they
don't need to be visible anywhere else, just make them all static.

Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Rob Norris <[email protected]>
Sponsored-by: https://despairlabs.com/sponsor/
Closes openzfs#15805
(cherry picked from commit 64afc4e)
blkdev_get_by_path() and blkdev_put() have been replaced by
bdev_open_by_path() and bdev_release(), which return a "handle" object
with the bdev object itself inside.

This adds detection for the new functions, and macros to handle the old
and new forms consistently.

Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Rob Norris <[email protected]>
Sponsored-by: https://despairlabs.com/sponsor/
Closes openzfs#15805
(cherry picked from commit ce782d0)
Linux has removed strlcpy in favour of strscpy. This implements a
fallback implementation of strlcpy for this case.

Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Rob Norris <[email protected]>
Sponsored-by: https://despairlabs.com/sponsor/
Closes openzfs#15805
(cherry picked from commit 7466e09)
MAX_ORDER has been renamed to MAX_PAGE_ORDER. Rather than just
redefining it, instead define our own name and set it consistently from
the start.

Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Rob Norris <[email protected]>
Sponsored-by: https://despairlabs.com/sponsor/
Closes openzfs#15805
(cherry picked from commit 09e6724)
Have libzfs call a special `zfs_prepare_disk` script before a disk is
included into the pool.  The user can edit this script to add things
like a disk firmware update or a disk health check.  Use of the script
is totally optional. See the zfs_prepare_disk manpage for full details.

Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Tony Hutter <[email protected]>
Closes openzfs#15243
There have been rare cases where the VDEV_ENC_SYSFS_PATH value that zed
gets passed is stale.  To mitigate this, dynamically check the sysfs
path at the time of zed event processing, and use the dynamic value if
possible.  Note that there will be other times when we can not
dynamically detect the sysfs path (like if a disk disappears) and have
to rely on the old value for things like turning on the fault LED.  That
is to say, we can't just blindly use the dynamic path in every case.

Also:
	- Add enclosure sysfs entry when running 'zpool add'
	- Fix 'slot' and 'enc' zpool.d scripts for nvme

Reviewed-by: Don Brady <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Tony Hutter <[email protected]>
Closes openzfs#15462
Many tests are failing on AlmaLinux 9 because ZTS could not destroy the
pool in cleanup.  This was due to $PWD being set to '.' instead of the
expected full path.  This patch sets $PWD to the full path.

Signed-off-by: Tony Hutter <[email protected]>
Reviewed-by: Don Brady <[email protected]>
Add a test for the dirty dnode SEEK_HOLE/SEEK_DATA bug described in
openzfs#15526

The bug was fixed in openzfs#15571 and
was backported to 2.2.2 and 2.1.14.  This test case is just to
make sure it does not come back.

seekflood.c originally written by Rob Norris.

Reviewed-by: Graham Perrin <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Rob Norris <[email protected]>
Signed-off-by: Tony Hutter <[email protected]>
Closes openzfs#15608
mfoliveira and others added 14 commits February 13, 2024 14:22
Replace ENCLO_US_RE with ENCLO_SU_RE in the name of the variable.

Note this changes the user-visible string in zed.rc, thus might
break current users with the wrong string, but it's ~2 months
since zfs-2.2.0 tag is out, thus should not be widespread yet.

Mechanical change:

    $ grep -rl ZED_POWER_OFF_ENCLOUSRE_SLOT_ON_FAULT
    cmd/zed/zed.d/zed.rc
    cmd/zed/zed.d/statechange-slot_off.sh

    $ sed -i 's/ZED_POWER_OFF_ENCLOUSRE_SLOT_ON_FAULT/<linebreak>
                ZED_POWER_OFF_ENCLOSURE_SLOT_ON_FAULT/g' \
      cmd/zed/zed.d/zed.rc \
      cmd/zed/zed.d/statechange-slot_off.sh

    $ grep -rl ZED_POWER_OFF_ENCLOUSRE_SLOT_ON_FAULT
    $

Fixes 11fbcac
("zed: Add zedlet to power off slot when drive is faulted")

Reviewed-by: Tony Hutter <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Mauricio Faria de Oliveira <[email protected]>
Closes openzfs#15651
Add `zpool` flags to control the slot power to drives.  This assumes
your SAS or NVMe enclosure supports slot power control via sysfs.

The new `--power` flag is added to `zpool offline|online|clear`:

    zpool offline --power <pool> <device>    Turn off device slot power
    zpool online --power <pool> <device>     Turn on device slot power
    zpool clear --power <pool> [device]      Turn on device slot power

If the ZPOOL_AUTO_POWER_ON_SLOT env var is set, then the '--power'
option is automatically implied for `zpool online` and `zpool clear`
and does not need to be passed.

zpool status also gets a --power option to print the slot power status.

Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Mart Frauenlob <[email protected]>
Signed-off-by: Tony Hutter <[email protected]>
Closes openzfs#15662
When very large pools are present, it can be laborious to find
reasons for why a pool is degraded and/or where an unhealthy vdev
is. This option filters out vdevs that are ONLINE and with no errors
to make it easier to see where the issues are. Root and parents of
unhealthy vdevs will always be printed.

Testing:
ZFS errors and drive failures for multiple vdevs were simulated with
zinject.

Sample vdev listings with '-e' option
- All vdevs healthy
    NAME        STATE     READ WRITE CKSUM
    iron5       ONLINE       0     0     0

- ZFS errors
    NAME        STATE     READ WRITE CKSUM
    iron5       ONLINE       0     0     0
      raidz2-5  ONLINE       1     0     0
        L23     ONLINE       1     0     0
        L24     ONLINE       1     0     0
        L37     ONLINE       1     0     0

- Vdev faulted
    NAME        STATE     READ WRITE CKSUM
    iron5       DEGRADED     0     0     0
      raidz2-6  DEGRADED     0     0     0
        L67     FAULTED      0     0     0  too many errors

- Vdev faults and data errors
    NAME        STATE     READ WRITE CKSUM
    iron5       DEGRADED     0     0     0
      raidz2-1  DEGRADED     0     0     0
        L2      FAULTED      0     0     0  too many errors
      raidz2-5  ONLINE       1     0     0
        L23     ONLINE       1     0     0
        L24     ONLINE       1     0     0
        L37     ONLINE       1     0     0
      raidz2-6  DEGRADED     0     0     0
        L67     FAULTED      0     0     0  too many errors

- Vdev missing
    NAME        STATE     READ WRITE CKSUM
    iron5       DEGRADED     0     0     0
      raidz2-6  DEGRADED     0     0     0
        L67     UNAVAIL      3     1     0

- Slow devices when -s provided with -e
    NAME        STATE     READ WRITE CKSUM  SLOW
    iron5       DEGRADED     0     0     0     -
      raidz2-5  DEGRADED     0     0     0     -
        L10     FAULTED      0     0     0     0  external device fault
        L51     ONLINE       0     0     0    14

Reviewed-by: Tony Hutter <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Cameron Harr <[email protected]>
Closes openzfs#15769
CVE-2020-24370 is a security vulnerability in lua. Although the CVE
description in CVE-2020-24370 said that this CVE only affected lua
5.4.0, according to lua this CVE actually existed since lua 5.2. The
root cause of this CVE is the negation overflow that occurs when you
try to take the negative of 0x80000000. Thus, this CVE also exists in
openzfs. Try to backport the fix to the lua in openzfs since the
original fix is for 5.4 and several functions have been changed.

GHSA-gfr4-c37g-mm3v
https://nvd.nist.gov/vuln/detail/CVE-2020-24370
https://www.lua.org/bugs.html#5.4.0-11
lua/lua@a585eae

Reviewed-by: Tony Hutter <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: ChenHao Lu <[email protected]>
Closes openzfs#15847
The zfs_load-key tests were failing on F39 due to their use of the
deprecated ssl.wrap_socket function.  This commit updates the test to
instead use ssl.SSLContext() as described in
https://stackoverflow.com/a/65194957.

Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Tony Hutter <[email protected]>
Closes openzfs#15534
Closes openzfs#15550
Update ABI files from latest CI run results (for zfs-2.1.15)

Signed-off-by: Tony Hutter <[email protected]>
- Mark some parameters to zpool_power*() as unused.
- Add a stub zpool_disk_wait().

Fixes: a9520e6 ("zpool: Add slot power control, print power status")

Signed-off-by: Mark Johnston <[email protected]>
Reviewed-by: Alexander Motin <[email protected]>
Reviewed-by: Tony Hutter <[email protected]>
This thoroughly destroys logapi and races to the log files horribly

Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: John Kennedy <[email protected]>
Reviewed-by: Ryan Moeller <[email protected]>
Signed-off-by: Ahelenia Ziemiańska <[email protected]>
Closes openzfs#13259
Update zfs_share_concurrent_shares test case to wait a few seconds
and recheck that the filesystem isn't shared.  The intent here is
determine the nature of the error and if it may be a race.

Reviewed-by: Tony Hutter <[email protected]>
Reviewed-by:  Umer Saleem <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes openzfs#15379
Update the META file to reflect compatibility with the 6.6 kernel.

Reviewed-by: George Melikov <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Umer Saleem <[email protected]>
Signed-off-by: Tony Hutter <[email protected]>
Closes openzfs#15520
Update the META file to reflect compatibility with the 6.7 kernel.

Reviewed-by: Tony Hutter <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes openzfs#15833
* Add missing dgettext calls.
* Avoid unnecessary line wraps.
* Factor out duplicated parsable check.

Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Ryan Moeller <[email protected]>
Closes openzfs#12967
Squelch false positives reported by GCC 12 with UBSan.

Reviewed-by: Richard Yao <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: szubersk <[email protected]>
Closes openzfs#14150
First the function `memset(&key, 0, ...)` but
any call to "goto error;" would call zio_crypt_key_destroy(key) which
calls `rw_destroy()`. The `rw_init()` is moved up to be right after the
memset. This way the rwlock can be released.

The ctx does allocate memory, but that is handled by the memset to 0
and icp skips NULL ptrs.

Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Richard Yao <[email protected]>
Signed-off-by: Jorgen Lundman <[email protected]>
Closes openzfs#13976
META file and changelog updated.

Signed-off-by: Tony Hutter <[email protected]>
@tonyhutter tonyhutter merged commit fb6d532 into openzfs:zfs-2.1-release Feb 26, 2024
10 of 12 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.