dhcp fails to obtain an ip address #70

Open
jack206 opened this issue Nov 16, 2023 · 12 comments

Comments

@jack206

jack206 commented Nov 16, 2023

When I change the IP configuration from static to DHCP, it sometimes fails to obtain an IP address.
There are two DHCP servers on the network.
The packet capture looks the same whether or not an IP address is obtained.
Here is the packet capture:
[image: cb40b1414cb6cc0b372669749be3786]

@jack206
Author

jack206 commented Nov 16, 2023

Running udhcpc -v eth0 does get an IP address.

@mdmillerii

phosphor-networkd configures systemd-networkd, which is the actual DHCP client. Yocto is currently at systemd v254.4: https://github.com/systemd/systemd-stable/blob/2e7504449a51fb38db9cd2da391c6434f82def51/src/network/networkd-dhcp4.c

You can see that there are several debug traces, which can be enabled by adding a conf file for the unit in its drop-in directory.

@mdmillerii

You can find the current version of systemd from the bitbake recipe and its include (.inc) in https://github.com/openbmc/openbmc/tree/master/poky/meta/recipes-core/systemd

@jack206
Author

jack206 commented Nov 17, 2023

> Phosphor networkd configures systemd networkd which is the actual DHCP client. Currently yocto is at v254.4 https://github.com/systemd/systemd-stable/blob/2e7504449a51fb38db9cd2da391c6434f82def51/src/network/networkd-dhcp4.c
>
> You can see there are several debug traces that can be configured by adding a conf file for the unit in the drop in directory.

Which commit is involved in this part? I'd appreciate it if you could elaborate

@mdmillerii

I'm not sure what commit you are referring to. I was referring to the concept that most unit settings can be overridden by a conf file. Searching for "systemd conf drop-in" will find several tutorials on the concept, in addition to the general systemd help: https://www.freedesktop.org/software/systemd/man/latest/systemd.syntax.html#

I expect that if you turn on systemd-networkd debug logging, you can gain insight and further debug information.
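
For readers following along, a drop-in of the kind described above might look like the fragment below. This is a sketch under assumptions, not something posted in the thread: the file name `10-debug.conf` is illustrative, and the `/etc/systemd/system/systemd-networkd.service.d/` location assumes the usual systemd drop-in layout is writable on the BMC image. `SYSTEMD_LOG_LEVEL` is the environment variable that systemd components read for their log level.

```ini
# /etc/systemd/system/systemd-networkd.service.d/10-debug.conf
# (illustrative file name; any *.conf in this directory is read)
[Service]
Environment=SYSTEMD_LOG_LEVEL=debug
```

After `systemctl daemon-reload` and `systemctl restart systemd-networkd`, the additional DHCP client messages should appear in `journalctl -u systemd-networkd`.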

@jack206
Author

jack206 commented Nov 20, 2023

> I'm not sure what commit you are referring to. I was referring to the concept that most unit settings can be overridden by a conf file. Search for systemd conf drop in will find several tutorials on the concept in addition to the general systemd help https://www.freedesktop.org/software/systemd/man/latest/systemd.syntax.html#
>
> I expect if you turn on systemd networkd debug logging you can gain insight and further debug information.

I did not find any config that resolves it.

@jack206
Author

jack206 commented Nov 20, 2023

I see that https://github.com/systemd/systemd-stable/blob/1b5bbbf7b48b36f5c6b6fec75093eb0f3f87aac2/src/libsystemd-network/sd-dhcp-client.c#L1756 restarts the DHCP client when it receives a NAK. Does anyone know why? Can we add a retry?

@mdmillerii

RFC 2131 is quite explicit about both client and server behavior, including when a server MUST remain silent and not reply with a DHCPNAK.

> If the client receives a DHCPNAK message, the client restarts the
> configuration process.

If one server is sending a DHCPNAK in response to the other server's DHCPOFFER on the same segment, their configuration is broken. Who wins is a race (but a DHCPACK inherently involves a commit to stable storage while a DHCPNAK does not, so it is NOT a fair race).

Your original packet capture does not display the Server Identifier, which is defined in RFC 2132 and whose usage is specified in RFC 2131. However, a casual reading implies that it will be processed by the systemd client. This may or may not affect the response of the second server.

That said, the max delay on NAK in systemd-networkd seems comparable to an outage, while the delay in udhcpc seems to be much more aggressive (not scalable). I'd guess both will retry until they select an offer from the server that responds with a DHCPACK before the DHCPNAK from the competing server arrives. (According to the RFC, the client retries start at 8 s with exponential backoff to 64 s and a smooth ±1 s dither, although this is not directly related to DHCPNAK, as noted in the 2016 commit.)
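
The retry timing mentioned in the parenthesis above can be illustrated with a short Python sketch. It is not taken from systemd or BusyBox; the function name `retransmission_delays` and its parameters are hypothetical. As far as I read RFC 2131 section 4.1, the suggested first delay on Ethernet is about 4 seconds, doubling up to 64 seconds, each randomized by ±1 second, so `first` defaults to 4.0 here and can be set to 8.0 to match the figure quoted above.

```python
import random


def retransmission_delays(attempts, first=4.0, cap=64.0, jitter=1.0, rng=random):
    """Yield the wait (in seconds) before each retransmission.

    The base delay doubles after every attempt until it reaches `cap`;
    a uniform random offset in [-jitter, +jitter] is added each time.
    """
    base = first
    for _ in range(attempts):
        yield base + rng.uniform(-jitter, jitter)
        base = min(base * 2, cap)


if __name__ == "__main__":
    for attempt, delay in enumerate(retransmission_delays(6), start=1):
        print(f"retry {attempt}: wait about {delay:.2f} s")
```

The point of the sketch is only that a client which restarts from scratch after every DHCPNAK can spend a long time in these waits when a second server keeps answering first.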

The OpenBMC policy is to work issues with the upstream code, so if you come up with a proposal you will need to discuss with systemd.

I suggest fixing your servers.

@jack206
Author

jack206 commented Nov 28, 2023

> RFC2131 is quite explicit for both client and server behavior including when a server MUST remain silent and not reply with a DHCPNAK.
>
> If the client receives a DHCPNAK message, the client restarts the
> configuration process.
>
> If one server is sending a DHCPNAK to the other servers DHCPOFFER on the same segment their configuration is broken. It is a race who wins (but DHCPACK inherently involves a commit to stable storage while a DHCPNAK does not, so it is NOT a fair race).
>
> Your original packet capture does not display the Server Identifier defined in rfc2132 and whose usage specified in rfc2131. However, a casual reading implies it will be processed by the systemd client. This may or may not affect the response of the second server.
>
> That said, the max delay on NAK in systemd-networkd seems comparable to an outage, while the delay in udhcpc seems to be much more aggressive (not scalable). I'd guess both will retry until they select an offer from the server that responds with an DHCPACK before the DHCPNAK from the competing server. (According to the RFC, the client retries start at 8s with exponential backoff to 64s and a smooth +-1s dither, although this is not directly related to DHCPNAK, as noted in the 2016 commit).
>
> The OpenBMC policy is to work issues with the upstream code, so if you come up with a proposal you will need to discuss with systemd.
>
> I suggest fixing your servers.

Yeah, I read the udhcpc source code (https://git.busybox.net/busybox/tree/networking/udhcp/dhcpc.c?h=1_28_stable), which handles multiple DHCP servers, so it can get an IP address quickly. However, after receiving a NAK, systemd-networkd directly restarts the client. This seems normal.
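
The difference being discussed can be sketched roughly as follows. This is a minimal Python illustration, not the BusyBox or systemd code: the `Reply` class and `handle_reply` function are hypothetical names, and the rule shown (ignore a DHCPNAK whose Server Identifier, option 54, differs from the server the client selected) is an assumption about how a client could tolerate a second server on the same segment. The message type values are the option 53 codes from RFC 2132.

```python
from dataclasses import dataclass
from typing import Optional

# DHCP message types carried in option 53 (RFC 2132)
DHCPOFFER, DHCPACK, DHCPNAK = 2, 5, 6


@dataclass
class Reply:
    msg_type: int
    server_id: Optional[str]  # option 54, Server Identifier, if present


def handle_reply(selected_server: Optional[str], reply: Reply) -> str:
    """Decide what a requesting client does with a reply.

    Returns "bound", "restart", or "ignore".
    """
    if reply.msg_type == DHCPACK:
        # Accept the lease only from the server the request was sent to.
        return "bound" if reply.server_id == selected_server else "ignore"
    if reply.msg_type == DHCPNAK:
        # A NAK from some other server is dropped instead of restarting.
        if selected_server is not None and reply.server_id != selected_server:
            return "ignore"
        return "restart"
    return "ignore"


if __name__ == "__main__":
    print(handle_reply("192.0.2.1", Reply(DHCPNAK, "192.0.2.2")))
    print(handle_reply("192.0.2.1", Reply(DHCPACK, "192.0.2.1")))
```

A client that returns "restart" for every DHCPNAK, regardless of sender, corresponds to the restart behavior described in this comment.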

@mdmillerii

Oh, I missed that. I happened upon the old 0.60 source and didn't see https://git.busybox.net/busybox/commit/networking/udhcp/dhcpc.c?h=1_28_stable&id=e2318bbad786d6f9ebff704490246bfe52e588c0, which is a refactor from https://git.busybox.net/busybox/commit/networking/udhcp/dhcpc.c?id=e2318bbad786d6f9ebff704490246bfe52e588c0, so open an issue with systemd for a similar feature...

@jack206
Author

jack206 commented Nov 30, 2023

> Oh I missed that I happened on old 0.60 source and didn't see https://git.busybox.net/busybox/commit/networking/udhcp/dhcpc.c?h=1_28_stable&id=e2318bbad786d6f9ebff704490246bfe52e588c0 which is a refactor from https://git.busybox.net/busybox/commit/networking/udhcp/dhcpc.c?id=e2318bbad786d6f9ebff704490246bfe52e588c0 so open an issue with systemd for a similar feature...

Can this revision also be ported to systemd-networkd? I got lost while trying to follow the patch. Do you have any ideas?

@mdmillerii

BusyBox and systemd are unrelated code bases. You can request consideration of feature parity, but this behavior is not discussed in the specification, and arguably the specification explicitly makes the server the errant party. I already referenced the issue tracker, but a proposed patch would likely get faster attention.

eddiejames pushed a commit to eddiejames/phosphor-networkd that referenced this issue Mar 1, 2024
It is in reference to openbmc/openbmc#2342

Currently, when a user tries to delete a non-static ip address, it throws an error as "the operation failed internally"

With the update, when an attempt is made to delete the link-local address of the bmc it throws "The operation is not allowed" error

Tested by:

Current behaviour:

busctl call xyz.openbmc_project.Network /xyz/openbmc_project/network/eth1/_66e80_3a_3aa94_3aefff_3afe81_3ad629_2f64 xyz.openbmc_project.Object.Delete Delete

response: Call failed: The operation failed internally.

Modified behaviour:

busctl call xyz.openbmc_project.Network /xyz/openbmc_project/network/eth1/_66e80_3a_3aa94_3aefff_3afe81_3ad62d_2f64 xyz.openbmc_project.Object.Delete Delete

response: Call failed: The operation is not allowed

Change-Id: Ib5df33b8ad356a868aecc508ad6531ed15390a1d

Signed-off-by: Jishnu CM <[email protected]>
Co-authored-by: Jishnu CM <[email protected]>