Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug] virtio-net: Guest can't NACK LRO #3905

Closed
3 tasks done
cperciva opened this issue Jun 30, 2023 · 7 comments
Closed
3 tasks done

[Bug] virtio-net: Guest can't NACK LRO #3905

cperciva opened this issue Jun 30, 2023 · 7 comments
Assignees
Labels
Priority: Medium Indicates than an issue or pull request should be resolved ahead of issues or pull requests labelled

Comments

@cperciva
Copy link
Contributor

Describe the bug

When FreeBSD boots in Firecracker, it disables LRO features because it doesn't want to allocate 64 kB buffers. (Maybe this should change in FreeBSD, but that's how our virtio-net driver currently does things.)

Firecracker computes the acked_features value but seems to completely ignore it(?); in any case, Firecracker assumes that the guest can handle >1536 byte segments despite FreeBSD disabling the relevant options. This results in the device hanging since Firecracker attempts (and fails) repeatedly to write a packet which doesn't fit onto the guest I/O ring.

To Reproduce

  1. Boot FreeBSD in Firecracker.
  2. Attempt to git clone a repository from github.

I'm not sure why, but github always seems to trigger this while I've never seen it with other networking. Possibly because github is very close to my test instance in us-east-1?

Expected behaviour

Firecracker should pay attention to the features accepted by the guest, and not send large segments if the guest has said it doesn't want them.

Firecracker should probably also drop packets rather than entering an infinite loop trying to write them into a buffer which they don't fit into. Obviously this isn't a trivial change since buffer space might open up later; I don't know enough about these bits to suggest the right solution here.

Environment

Firecracker @ 1ca6f8c plus PVH support patches. Linux 5.15 host; FreeBSD 14.0 guest; amd64 architecture.

Additional context

Checks

  • Have you searched the Firecracker Issues database for similar problems?
  • Have you read the existing relevant Firecracker documentation?
  • Are you certain the bug being reported is a Firecracker issue?
@roypat
Copy link
Contributor

roypat commented Jul 25, 2023

Related: #2433

@xmarcalx
Copy link
Contributor

xmarcalx commented Aug 7, 2023

Hi,
Thanks for reporting this issue.
This issue affect at the moment only FreeBSD guestOS. Linux mainline currently is not impacted but this does not exclude the possibility that will be impacted in the future.
This request is valid, as next steps we need to design a solution applicable to Firecracker and working also with our supported OSes.

@xmarcalx xmarcalx added the Priority: Medium Indicates than an issue or pull request should be resolved ahead of issues or pull requests labelled label Aug 7, 2023
@acj
Copy link
Contributor

acj commented Jan 3, 2024

I've been experimenting with a FreeBSD guest for a GitHub Actions CI use case and ran into this as well. In case it helps anyone, here are two workarounds:

  1. Patch Firecracker so that MAX_BUFFER_SIZE is in the range of MTU values that can be used with FreeBSD's virtio-net driver (<= 16k, I think). In /etc/rc.conf, append mtu 16384 (or whatever max buffer size you used) to the ifconfig_vtnet0 line.

    With this option, I can get good throughput (10+ MB/s) on my home cable connection from sites like GitHub. Curiously, though, throughput from FreeBSD package mirrors is extremely low, often <200 KB/s. Downloading the same .pkg files from the host is much faster. Similar results in GitHub Actions runners.

    Throughput improves somewhat when the guest's interface is configured with a larger MTU, perhaps because FreeBSD's driver is scaling its receive buffer sizes based on the MTU value.

  2. After the guest has booted up, disable TCP segmentation offload (TSO) on the host with something like ethtool -K tap0 tso off. This needs to be done after each launch because it gets reenabled, even if I disable TSO4 in Firecracker's available features list.

    I haven't tested this option thoroughly yet, but throughput seems good when downloading from all hosts.


@bchalios I saw your post about this in slack:

We definitely assume that UFO & TSO4 will be accepted, so we assume that frames will always fit in the buffers will be provided by the guest. We can fix that

Would you accept a PR for this? I'm willing to work on a proper fix but wanted to ask before digging in.

@bchalios
Copy link
Contributor

bchalios commented Jan 5, 2024

Hey @acj, we had already started working on this. Should be able to post a PR in a week or so. I can ping you once it's out so you'll let me know what you think. Does that work for you?

@pkit
Copy link

pkit commented Apr 27, 2024

This issue affect at the moment only FreeBSD guestOS. Linux mainline currently is not impacted but this does not exclude the possibility that will be impacted in the future.

I'm not really sure that it's not impacted. eBPF support (which is currently all the rage on Linux) seems to fail miserably because of this bug (and some others).

@bchalios
Copy link
Contributor

Hi @pkit. Could you please provide a reproducer for the issue. We do have some work in progress to fix this, just got de-prioritized due to other work. One thing missing was an integration test for making sure our fix works in Linux systems.

bchalios pushed a commit to bchalios/firecracker that referenced this issue Jul 15, 2024
Currently, we assume that the guest virtio-net driver acknowledges all
the offload features we offer to it and then subsequently set the
corresponding TAP features.

This might be the case for the Linux guest kernel drivers we are
currently using, but it is not necessarily the case. FreeBSD for
example, might try to NACK some of them in some cases:
firecracker-microvm#3905.

This commit, changes the Firecracker implementation of VirtIO device to
only setup the TAP offload features that correspond to the ones that the
guest driver acknowledged.

Signed-off-by: Nikita Zakirov <[email protected]>
Signed-off-by: Babis Chalios <[email protected]>
bchalios pushed a commit to bchalios/firecracker that referenced this issue Jul 15, 2024
Currently, we assume that the guest virtio-net driver acknowledges all
the offload features we offer to it and then subsequently set the
corresponding TAP features.

This might be the case for the Linux guest kernel drivers we are
currently using, but it is not necessarily the case. FreeBSD for
example, might try to NACK some of them in some cases:
firecracker-microvm#3905.

This commit, changes the Firecracker implementation of VirtIO device to
only setup the TAP offload features that correspond to the ones that the
guest driver acknowledged.

Signed-off-by: Nikita Zakirov <[email protected]>
Signed-off-by: Babis Chalios <[email protected]>
bchalios pushed a commit to bchalios/firecracker that referenced this issue Jul 15, 2024
Currently, we assume that the guest virtio-net driver acknowledges all
the offload features we offer to it and then subsequently set the
corresponding TAP features.

This might be the case for the Linux guest kernel drivers we are
currently using, but it is not necessarily the case. FreeBSD for
example, might try to NACK some of them in some cases:
firecracker-microvm#3905.

This commit, changes the Firecracker implementation of VirtIO device to
only setup the TAP offload features that correspond to the ones that the
guest driver acknowledged.

Signed-off-by: Nikita Zakirov <[email protected]>
Signed-off-by: Babis Chalios <[email protected]>
bchalios pushed a commit that referenced this issue Jul 17, 2024
Currently, we assume that the guest virtio-net driver acknowledges all
the offload features we offer to it and then subsequently set the
corresponding TAP features.

This might be the case for the Linux guest kernel drivers we are
currently using, but it is not necessarily the case. FreeBSD for
example, might try to NACK some of them in some cases:
#3905.

This commit, changes the Firecracker implementation of VirtIO device to
only setup the TAP offload features that correspond to the ones that the
guest driver acknowledged.

Signed-off-by: Nikita Zakirov <[email protected]>
Signed-off-by: Babis Chalios <[email protected]>
@bchalios
Copy link
Contributor

#4680 changes Firecracker to properly negotiate features with the guest. I will close this now, but please let us know if this problem persists.

ghost pushed a commit to JamesC1305/firecracker that referenced this issue Aug 14, 2024
Currently, we assume that the guest virtio-net driver acknowledges all
the offload features we offer to it and then subsequently set the
corresponding TAP features.

This might be the case for the Linux guest kernel drivers we are
currently using, but it is not necessarily the case. FreeBSD for
example, might try to NACK some of them in some cases:
firecracker-microvm#3905.

This commit, changes the Firecracker implementation of VirtIO device to
only setup the TAP offload features that correspond to the ones that the
guest driver acknowledged.

Signed-off-by: Nikita Zakirov <[email protected]>
Signed-off-by: Babis Chalios <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Priority: Medium Indicates than an issue or pull request should be resolved ahead of issues or pull requests labelled
Projects
None yet
Development

No branches or pull requests

6 participants