
systemd-networkd persists through switchroot and breaks lots of stuff #68

Closed
wtogami opened this issue Jun 15, 2023 · 8 comments

@wtogami

wtogami commented Jun 15, 2023

Fedora 38
systemd-networkd-253.5-1.fc38.x86_64
dracut-059-3.fc38.x86_64
dracut-sshd-0.6.5-1
NetworkManager-1.42.6-1.fc38.x86_64

If you use /etc/systemd/network/20-wired.network as suggested in the documentation it can create serious problems.

DHCP conflict, IP address doubling

  1. systemd-networkd does DHCP on the ethernet interface and allows you to ssh in to type the LUKS passphrase. Unfortunately, it does DHCP with a different DUID (and possibly other different parameters, I'm not sure), so the DHCP server can too easily give it a different IP address even though the MAC address is the same. I reproduced this against two different DHCP servers: Google Fiber's proprietary gateway and the open source dnsmasq-2.89-5.fc38.x86_64. At this point I see one IPv4 and one IPv6 address on the interface during initramfs (see ip addr).
  2. But it gets worse. The ethernet interface remains active with that address into the switchroot. The OS will then likely DHCP again with its original rootfs DUID (the one it used before dracut-sshd). It can then be assigned an IP address different from the ones assigned during initramfs, because the DUID is different and/or the previous IP address is seen as active on the local network. At this point I see two IPv4 addresses and two IPv6 addresses on the interface.
  3. But it gets worse. The IPv4 and IPv6 addresses assigned during initramfs remain on the interface alongside the rootfs IPv4 and IPv6 addresses. Then, at a random time hours later, the initramfs IPv6 address disappears, possibly because no daemon is managing its lease (?). I'm not sure why, but it happens. This can too easily confuse other software and break stuff.
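How the server tells clients apart is what makes step 1 possible. Under RFC 2131, a DHCPv4 server identifies a client by the client identifier option when the client sends one, and falls back to the hardware address (chaddr) only when it doesn't; DHCPv6 identifies clients by DUID. The toy Python model below is a hypothetical illustration, not the logic of dnsmasq or of any gateway: one MAC presenting two different DUIDs ends up with two leases. The DUID layouts follow RFC 8415 (DUID-LL is type 3, DUID-EN is type 2); the enterprise number 32473 is the documentation-only value reserved by RFC 5612, and the identifier bytes are made up.

```python
import ipaddress
from typing import Optional


def duid_ll(mac: str) -> bytes:
    """DUID-LL (RFC 8415, type 3): 2-byte type, 2-byte hardware type (1 = Ethernet), then the MAC."""
    return (3).to_bytes(2, "big") + (1).to_bytes(2, "big") + bytes.fromhex(mac.replace(":", ""))


def duid_en(enterprise_number: int, identifier: bytes) -> bytes:
    """DUID-EN (RFC 8415, type 2): 2-byte type, 4-byte IANA enterprise number, opaque identifier."""
    return (2).to_bytes(2, "big") + enterprise_number.to_bytes(4, "big") + identifier


class ToyLeaseTable:
    """Hypothetical DHCP lease table: no expiry, no conflict probing, no persistence."""

    def __init__(self, pool: str) -> None:
        self._free = list(ipaddress.ip_network(pool).hosts())
        self._leases: dict = {}

    def request(self, mac: str, client_id: Optional[bytes] = None) -> ipaddress.IPv4Address:
        # RFC 2131: a client identifier, if present, takes precedence over the MAC.
        key = ("client-id", client_id) if client_id is not None else ("chaddr", mac.encode())
        if key not in self._leases:
            # Unknown client: hand out the next free address from the pool.
            self._leases[key] = self._free.pop(0)
        return self._leases[key]


if __name__ == "__main__":
    mac = "52:54:00:12:34:56"
    server = ToyLeaseTable("192.168.1.0/24")
    # initramfs client: DUID-EN with a made-up identifier
    initramfs_ip = server.request(mac, duid_en(32473, b"initramfs"))
    # rootfs client: DUID-LL built from the same MAC
    rootfs_ip = server.request(mac, duid_ll(mac))
    print(initramfs_ip, rootfs_ip)  # → 192.168.1.1 192.168.1.2
```

A real server also tracks lease expiry and may check whether an address is still in use before reusing it, which this sketch ignores; the point is only that the lease key is the client identifier, not the MAC.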

Other Conflicts

Because systemd-networkd does not cleanly tear down the interface prior to switchroot, it creates other problems. For example, my buddy has been using dracut-sshd for a few years. He had to add hacky post-boot scripts to force the persisting interface to reset before his NetworkManager adds it to a virtual bridge device, and that virtual bridge device is what runs DHCP on the LAN.

It's possible systemd-networkd could be configured to do the same as these more complicated NetworkManager scenarios. But NetworkManager is the default on many distros and we shouldn't force people to stop using it. dracut-sshd can become nicely compatible with NetworkManager.

networkd could be mitigated

You could add a hook to tear down the ethernet interface prior to switchroot. I am not sure where or how. I confirmed manually in the rd.break shell that networkctl down DEVICENAME deactivated the interface in such a way that rootfs NetworkManager later had no trouble.

This would be an improvement but it doesn't escape the fact that it is a different DHCP client that can too easily get a different address.

NetworkManager for dracut-sshd

See #69 for the simplest way to activate NetworkManager during initramfs for dracut-sshd. It even tears down the network cleanly prior to switchroot.

wtogami added a commit to wtogami/dracut-sshd that referenced this issue Jun 15, 2023
  99sshd-auto-networkmanager adjusts nm-initrd.service to run for dracut-sshd.
- If config is lacking, auto DHCP ethernet in the same manner as rootfs NetworkManager.
- Clean network teardown prior to switchroot avoids conflicts and gives OS full control.
- Settings could be overridden by copying ifcfg or nmconnection settings into the initrd.

Fixes: Issues gsauthof#63 gsauthof#68
Signed-off-by: Warren Togami <[email protected]>
@gsauthof
Owner

Well, if systemd-networkd doesn't work for you, if you prefer NetworkManager, or if your distribution simply doesn't provide it, you are free to use something else.

Yes, the README recommends it in the Install Section, which is written like a quickstart tutorial, like this:

However,
the author of this README strongly recommends to use Networkd
instead of NetworkManager on servers and server-like systems.

It also mentions alternatives and refers to the later Network Section for more details:

Alternatively, early boot network connectivity can be configured
by other means (i.e. kernel parameters, see below).

So it's really a recommendation with a restriction and nuance.

IOW, dracut-sshd really is agnostic to how network connectivity is provided during early boot.


If you are suspecting bugs when networkd runs in early boot via the networkd dracut module you should report them to the networkd people.

If you think that networkd shouldn't keep the interfaces active during switch-root, should do DHCP differently, etc. then report some issues in the networkd issue tracker.

Or if you don't care about networkd, just ignore it and use NetworkManager instead, e.g. by adding rd.neednet=1 ip=dhcp to the kernel command line etc. - as described in the Network Section of the README - or as documented in many other places.


FWIW, with IPv6, a common way to get an IP-address is SLAAC. There are variants of how the address is generated, e.g. networkd uses eui-64 by default while NetworkManager defaults to some unstable/privacy address generation scheme. So if you are using networkd during early boot and NM later this should explain why you get multiple IPv6 addresses. However, with IPv6, it's pretty normal to use multiple stable and unstable addresses all the time, the unstable ones for privacy reasons (cf. privacy extensions). If you need a stable and deterministic IPv6 address (e.g. because you are running server services), the standard thing to do is to configure eui-64 (if it isn't the default) and/or configure (additional) static IPv6 addresses.
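For reference, the modified EUI-64 scheme mentioned above (RFC 4291, Appendix A) builds the 64-bit interface identifier directly from the MAC: insert ff:fe between the third and fourth octets and flip the universal/local bit of the first octet. Because the result depends only on the MAC and the /64 prefix, it is predictable across boots, unlike privacy-extension (RFC 4941) or stable-privacy (RFC 7217) identifiers, which cannot be derived from the MAC alone. A minimal Python sketch; the MAC and the 2001:db8::/64 documentation prefix are example values, not taken from the thread:

```python
import ipaddress


def eui64_interface_id(mac: str) -> int:
    """Modified EUI-64 interface identifier (RFC 4291, Appendix A) for a 48-bit MAC."""
    octets = bytearray.fromhex(mac.replace(":", ""))
    if len(octets) != 6:
        raise ValueError(f"expected a 48-bit MAC, got {mac!r}")
    octets[0] ^= 0x02  # flip the universal/local bit
    eui64 = octets[:3] + b"\xff\xfe" + octets[3:]  # insert FF:FE in the middle
    return int.from_bytes(eui64, "big")


def slaac_eui64_address(prefix: str, mac: str) -> ipaddress.IPv6Address:
    """Combine a /64 prefix (e.g. from a router advertisement) with the EUI-64 interface ID."""
    net = ipaddress.IPv6Network(prefix)
    if net.prefixlen != 64:
        raise ValueError("SLAAC with EUI-64 expects a /64 prefix")
    return ipaddress.IPv6Address(int(net.network_address) | eui64_interface_id(mac))


if __name__ == "__main__":
    mac = "52:54:00:12:34:56"
    print(slaac_eui64_address("2001:db8::/64", mac))  # → 2001:db8::5054:ff:fe12:3456
    print(slaac_eui64_address("fe80::/64", mac))      # → fe80::5054:ff:fe12:3456
```

Given the prefix advertised on the LAN and the machine's MAC, this is enough to know in advance which global IPv6 address to ssh into, provided the client is actually configured for EUI-64.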


FWIW, I'm running dracut-sshd in combination with networkd (for early and late boot) on multiple servers/server-like systems, with and without DHCP and with IPv6 on all of them, and I haven't run into any issues similar to what you are describing.

@wtogami
Author

wtogami commented Jun 16, 2023

I'm only telling my story and that of my friend. In my case it was subtly broken in these ways; in the past I gave up on using dracut-sshd, and only now have I had time to look deeper. In the case of my friend, he's been using it for years with ugly hacks that he applied to deal with the '!' shadow, the DHCP conflict, and the lack of teardown prior to switchroot. It works nicely with the default NetworkManager of EL8, EL9, and Fedora 38 with the changes in #69. No thinking is required for most users unless they need to copy specific network config into the initramfs.

Unfortunately ip=dhcp has the drawback of not cleaning up after itself before switchroot.

Yes, likely most of the same is possible if you switch rootfs to networkd. We shouldn't expect that of everyone though. I hope that you would accept these changes as they don't affect other users?

@wtogami
Author

wtogami commented Jun 16, 2023

gsauthof wrote:

If you are suspecting bugs when networkd runs in early boot via the networkd dracut module you should report them to the networkd people.

Any change to networkd might take years to get in the distros people are using today. The enterprise distros supported until 2029 or 2032 might never pick up that fix.

The initramfs and rootfs already have working NetworkManager. Please merge so those users can use it if they want. It does not affect the networkd users who don't want it.

@gsauthof
Owner

wtogami wrote:

Unfortunately ip=dhcp has the drawback of not cleaning up after itself before switchroot.

As I've commented in the pull request, I find it wild that network manager even needs the interfaces to be down.

If that is really the case and can't be fixed, perhaps you can convince the network manager module owner to add something like ip=dhcp-and-then-tear-down ...

wtogami wrote:

Yes, likely most of the same is possible if you switch rootfs to networkd.

I really don't want to force people using networkd for everything.
As I've said, dracut-sshd is network-service agnostic.
But if you have decided to use NetworkManager and run into issues, and they are real issues, at the end of the day, those have to be solved outside of dracut-sshd.

wtogami wrote:

I hope that you would accept these changes as they don't affect other users?

Even if that would be true, it isn't the only criterion for accepting pull-requests.

Any extra code increases my maintenance burden and makes the project harder to review.
This is true for all projects, but more so for projects with a focus on security.


FWIW, I just configured a Fedora 38 system with networkd for early boot and NetworkManager and I didn't run into any problems.

So it seems that your proposed change only helps for some specific NetworkManager setups.

@psgreco

psgreco commented Jun 17, 2023

I understand that we all wanna change the world and make all the software perfect, but at the end of the day we just wanna make it work, and cases like this make it work for more people. I can offer to review parts of the code that are a burden to you, in the hope of making this package work OOTB in more scenarios, because it really is an awesome tool.

psgreco pushed a commit to psgreco/dracut-sshd that referenced this issue Jun 17, 2023
  99sshd-auto-networkmanager adjusts nm-initrd.service to run for dracut-sshd.
- If config is lacking, auto DHCP ethernet in the same manner as rootfs NetworkManager.
- Clean network teardown prior to switchroot avoids conflicts and gives OS full control.
- Settings could be overridden by copying ifcfg or nmconnection settings into the initrd.

Fixes: Issues gsauthof#63 gsauthof#68
Signed-off-by: Warren Togami <[email protected]>
@wtogami
Author

wtogami commented Jun 17, 2023

It needs to be stated. dracut-sshd is fine work. Nice job!

I'm sorry we have a stark difference of opinion on some matters. It was frustrating for me to see dismissed tickets of commonly encountered issues that could have an easily automatable fix. You accused me of not arguing honestly. Honestly, we're shocked by the responses here, and locking #69 to prevent responses makes it hard to want to collaborate.

But I am not giving up on this project because IT IS FINE WORK. Arguing further about personal slights is not going to help our interpersonal relationship so I'm sticking to only technical responses below.

We have been close to Fedora and EL development for decades. We know how exceedingly difficult it is to convince them to include changes in the old enterprise distros. Meanwhile I was surprised to find how easy it was to use NetworkManager's config with minor changes applied by dracut to get exactly the desired behavior. More surprising is that such minimal config changes worked on RHEL8, RHEL9, and Fedora 38. It is wonderful this is possible without distro changes.

The normal way in which we'd integrate such a package in Fedora is to make things as automatic as possible, that is unless something is potentially dangerous then requiring manual configuration would be prudent. The changes I proposed are 1) safe 2) do not affect other users who don't need it 3) would make things work more automatically with Fedora defaults making distro inclusion easier later.

It is valid to declare distro-specific configs to not belong in upstream dracut-sshd. Fine. It is normal for distro packages to include integration and configuration customizations.

gsauthof wrote:

As I've commented in the pull request, I find it wild that network manager even needs the interfaces to be down.

The expectation is for initramfs to clean up after itself then get entirely out of the way. rootfs expects particular state of things when it starts.

gsauthof wrote:

If that is really the case and can't be fixed, perhaps you can convince the network manager module owner to add something like ip=dhcp-and-then-tear-down

NetworkManager already does that right now on Fedora 38.

This approach does more than ip=dhcp. It goes into automatic mode (DHCP) if blank/missing, or whatever you tell it to do in provided config files, ifcfg compat files, or nmconnection files.

I know that it's by design. But then your special bridge setup doesn't work anymore with NetworkManager, no?

NM can do the desired rootfs bridge configuration in two possible ways.

  1. No config in initramfs allows NM to DHCP only for LUKS unlock. It cleans up after itself, then lets rootfs decide what to do.
  2. The config files for the bridge, and for adding the ethernet interface to it, could be copied into the initramfs. It could be done, but there's no benefit to doing this during initramfs.

My personal opinion is that the goal of the dracut-sshd initramfs is merely to unlock LUKS. It's OK for that to be distinct from the rootfs network configuration method.

psgreco wrote:

Well, I think it's part of any package to adapt to make the user experience better, and what you call random hacks, for me are workarounds that make the package work ootb. So yeah, I think it is the right place

I disagree that it is a hack. It is actually just deleting a "disable NetworkManager" config file. This then allows NetworkManager to do what it is designed to do: auto-configure if there is no config or follow config files. It was a pleasant surprise that it also cleans up after itself afterward.

The bug you cite is about stopping the networkd service.

OK, you know it better than me. If that's the case, it doesn't meet the goal of cleaning up after itself. You might not agree with that goal, but in the case of @psgreco, he had to add post-boot hacks for years to cope with this problem. This lack of deactivation is also the cause of the double IP problem.

gsauthof wrote:

If you need a stable and deterministic IPv6 address (e.g. because you are running server services), the standard thing to do is to configure eui-64

Many of these users don't have control over their LAN. It was unexpected to get a different IP address in the initramfs than in the rootfs. It was doubly unexpected that both the IPv4 and IPv6 initramfs addresses persisted into rootfs runtime. The userspace in the rootfs doesn't know how to deal with that. The initramfs IPv6 address randomly disappears later, and the initramfs IPv4 address could disappear on a service restart. It's confusing and unexpected behavior that currently happens.
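A quick way to spot the leftover-lease situation described here is to look for an interface holding more than one global IPv4 address. The Python sketch below is a hypothetical helper, not part of dracut-sshd, NetworkManager, or networkd; it parses JSON in the shape produced by iproute2's `ip -json addr show` (the sample is made up and trimmed to the keys used). It deliberately ignores IPv6, since several global IPv6 addresses per interface are normal, and two IPv4 addresses may also be intentional on some hosts.

```python
import json

# Hypothetical sample shaped like `ip -json addr show` output, trimmed to the
# keys used below. Addresses are made up.
SAMPLE = """[
  {"ifname": "lo", "addr_info": [
    {"family": "inet", "local": "127.0.0.1", "prefixlen": 8, "scope": "host"}]},
  {"ifname": "eth0", "addr_info": [
    {"family": "inet", "local": "192.168.1.50", "prefixlen": 24, "scope": "global"},
    {"family": "inet", "local": "192.168.1.51", "prefixlen": 24, "scope": "global"},
    {"family": "inet6", "local": "fe80::5054:ff:fe12:3456", "prefixlen": 64, "scope": "link"}]}
]"""


def extra_ipv4_addresses(links: list) -> dict:
    """Map ifname -> global IPv4 addresses, for interfaces that hold more than one.

    Only IPv4 is checked: several global IPv6 addresses on one interface are normal
    (e.g. privacy extensions), so counting them would flag healthy setups.
    """
    flagged = {}
    for link in links:
        addrs = [
            f"{a['local']}/{a['prefixlen']}"
            for a in link.get("addr_info", [])
            if a.get("family") == "inet" and a.get("scope") == "global"
        ]
        if len(addrs) > 1:
            flagged[link["ifname"]] = addrs
    return flagged


if __name__ == "__main__":
    print(extra_ipv4_addresses(json.loads(SAMPLE)))
    # → {'eth0': ['192.168.1.50/24', '192.168.1.51/24']}
```

On a live system, the input would be `json.loads` of the output of `ip -json addr show` run after boot.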

gsauthof wrote:

it's pretty normal to use multiple stable and unstable addresses all the time,

It might be normal to get unstable IPv6 addresses. But that isn't helpful when the IPv6 address is the one you might be trying to ssh into to unlock LUKS. Unlike an IPv4 subnet, where the address space is small enough to ping sweep, it's much harder to find a randomly assigned, unstable IPv6 address the same way.
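To put rough numbers on that: a /24 IPv4 LAN has 256 addresses, while a /64 IPv6 subnet (the usual SLAAC prefix length) has 2^64. A minimal Python sketch, assuming a hypothetical probe rate of 1,000 addresses per second; only the ratio matters, not the exact rate:

```python
import ipaddress

PROBES_PER_SECOND = 1_000  # hypothetical scan rate, for illustration only
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def sweep_seconds(cidr: str, rate: float = PROBES_PER_SECOND) -> float:
    """Worst-case time to probe every address in a network at a fixed rate."""
    return ipaddress.ip_network(cidr).num_addresses / rate


if __name__ == "__main__":
    v4 = sweep_seconds("192.168.1.0/24")
    v6 = sweep_seconds("2001:db8::/64")
    print(f"IPv4 /24: {v4:.3f} s")                          # → IPv4 /24: 0.256 s
    print(f"IPv6 /64: {v6 / SECONDS_PER_YEAR:.3g} years")   # → IPv6 /64: 5.85e+08 years
```

Even a scan many orders of magnitude faster leaves the /64 case out of reach, which is why a predictable (e.g. EUI-64 or static) address matters when the IPv6 address is the one used to reach the initramfs.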

gsauthof wrote:

FWIW, I just configured a Fedora 38 system with networkd for early boot and NetworkManager and I didn't run into any problems.
So it seems that your proposed change only helps for some specific NetworkManager setups.

It's dependent on LAN and gateway factors that may not exist on your networks. It gives me double IP's every time here.

However, your addon would only be of interest to a minority of NetworkManager users,
as dracut-sshd already works fine with NetworkManager, in general, today.

We are in disagreement because the issues are real. If someone uses dracut-sshd in the documented way problems already happen with rootfs NetworkManager afterward. For example see #63 where two other people report on doubled IP addresses issue.

It's OK we can apply the config in a Fedora/EL specific package. It's OK to declare distro-specific changes as outside the scope of the upstream project.

This is the last I'm writing on this topic. Moving on.

@wtogami
Author

wtogami commented Jun 17, 2023

networkd could be mitigated
You could add a hook to tear down the ethernet interface prior to switchroot. I am not sure where or how. I confirmed manually in the rd.break shell that networkctl down DEVICENAME deactivated the interface in such a way that rootfs NetworkManager later had no trouble.

Would you accept a patch that does this for networkd?

@gsauthof
Owner

wtogami wrote:

It was frustrating for me to see dismissed tickets [..]

Perhaps you should manage your expectations.
Not everybody follows all of your arguments.
Some of your arguments may be flawed.
If you open a pull request in an open source project, you should be prepared for it to be rejected for (very good) reasons, and you should respect that.

wtogami wrote:

Arguing further about personal slights is not going to help our interpersonal relationship so I'm sticking to only technical responses below.

How generous of you pledging to stop posting personal slights to the dracut-sshd issue tracker.

wtogami wrote:

Many of these users don't have control over their LAN. It was unexpected to get a different IP address in the initramfs than in the rootfs. It was doubly unexpected that both the IPv4 and IPv6 initramfs addresses persisted into rootfs runtime. The userspace in the rootfs doesn't know how to deal with that. The initramfs IPv6 address randomly disappears later, and the initramfs IPv4 address could disappear on a service restart. It's confusing and unexpected behavior that currently happens.

It might currently happen for your very specific setup.
You don't have to have full control over your LAN to configure stable IPv6 addresses.
Even without control of your DHCP server, there are standard ways to configure stable addresses.
What you describe can all be fixed by proper client configuration, with networkd and NetworkManager.
Nothing specific to dracut-sshd, nothing that should require integrating such a hack with dracut-sshd.


wtogami wrote:

You could add a hook to tear down the ethernet interface prior to switchroot. I am not sure where or how. I confirmed manually in the rd.break shell that networkctl down DEVICENAME deactivated the interface in such a way that rootfs NetworkManager later had no trouble.

Would you accept a patch that does this for networkd?

Probably not. For reasons similar to what I've stated previously.
If it's generally accepted to be a good idea, then it should be done in the networkd dracut module.

However, perhaps KeepConfiguration=no already is sufficient for your use case.

Repository owner locked as resolved and limited conversation to collaborators Jun 18, 2023

3 participants