-
Notifications
You must be signed in to change notification settings - Fork 10
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
tokio 1.39/mio 1.0 appears to be dropping waker notifications on illumos #169
Comments
In tokio-rs/mio@1133ed0 it looks like the waker impl on illumos (unintentionally?) got switched from pipe-based to eventfd. |
PR to switch the impl back to being pipe-based: tokio-rs/mio#1824 For the eventfd impls, there might be some bug-incompatibility between Linux and illumos here -- might be worth figuring out what it is. |
I looked at the eventfd impl, as well as the Linux and illumos man pages, and to be honest the code is really simple and looks like it should work. Suggests possibly a bug in the illumos eventfd impl. Too tired to continue but at least we have a good workaround :) |
Repro:
@rmustacc has expressed an interest in looking at the issue. |
Workaround is to set |
This appears to be a similar problem to https://www.illumos.org/issues/13436 in that the epoll wakeup appears to occur only on an eventfd transition from zero to non-zero. Resetting to zero before waking up results in the test passing. diff --git a/src/sys/unix/waker/eventfd.rs b/src/sys/unix/waker/eventfd.rs
index c0086fc..3a1e6bf 100644
--- a/src/sys/unix/waker/eventfd.rs
+++ b/src/sys/unix/waker/eventfd.rs
@@ -42,6 +42,11 @@ impl Waker {
#[allow(clippy::unused_io_amount)] // Don't care about partial writes.
pub(crate) fn wake(&self) -> io::Result<()> {
+ // The epoll emulation on some illumos systems currently requires the
+ // eventfd to transition from zero for an edge-triggered wakeup.
+ #[cfg(target_os = "illumos")]
+ self.reset()?;
+
let buf: [u8; 8] = 1u64.to_ne_bytes();
match (&self.fd).write(&buf) {
Ok(_) => Ok(()),
bloody:mio:HEAD% cargo test --all-features --test waker -- waker_multiple_wakeups_different_thread
Finished `test` profile [unoptimized + debuginfo] target(s) in 0.08s
Running tests/waker.rs (target/debug/deps/waker-1f848cdbed8b9273)
running 1 test
test waker_multiple_wakeups_different_thread ... ok
test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 6 filtered out; finished in 0.05s And in fact all tests pass with this change
|
This behaviour seems to be intentional in the illumos implementation: static int
eventfd_write(dev_t dev, struct uio *uio, cred_t *credp)
{
...
/*
* Notify pollers as well if the eventfd is now readable.
*/
if (oval == 0) {
pollwakeup(&state->efd_pollhd, POLLRDNORM | POLLIN);
}
return (0);
} |
I have opened https://www.illumos.org/issues/16700 upstream for this, and also opened illumos/epoll-test-suite#2 to add a test to the illumos epoll test suite. |
Tokio 1.39 updated its mio dependency to 1.0, which changed the waker impl on illumos from a self-pipe to eventfd. That has caused several issues already: * oxidecomputer/helios#169 * oxidecomputer/helios#171 Based on these and the potential for other lurking issues, we're making a policy decision to roll back to 1.38 (mio 0.8) for r10. We can't be off of the train forever so we're aiming to land the 1.39 update early in the r11 cycle. This backs out commit d7d4bea.
Tokio 1.39/mio 1.0 switches out the illumos impl to being eventfd based. For release 10 we decided that that was too risky, so we switched back to Tokio 1.38. Now that the r10 branch has been cut, we can go back and update Tokio to 1.39.3. We'd like to land this early in the cycle to get as much soak time as possible. See: * #6356 * #6249 * oxidecomputer/helios#169 * oxidecomputer/helios#171 * #6391
The issue was resolved in upstream mio (in a way that was not quite what we wanted -- see mio 1824 linked above), and the epoll bug has been resolved in illumos. mio will need to carry their workaround for a while most likely, since many illumos systems in the wild are going to take a while to upgrade, and this code is quite sensitive. |
(filing this here for now just to put down notes)
Tokio 1.39.2's test suite consistently hangs on illumos while 1.38.0 doesn't. Via a git bisect we traced it down to this being an issue in mio 1.0, with (in the mio repo)
cargo nextest run --all-features --test waker waker_multiple_wakeups_different_thread
:Via another bisect we found that this pair of commits is to blame, and the pair looks relevant:
The second commit is a child of the first, and the first commit doesn't build on illumos, so consider the pair of commits as a unit.
Combined diff of the two commits in case it's helpful: https://gist.github.com/sunshowers/c573ae448d2c1eb028216c3f3d644719
cc @jclulow who first noticed the issue
The text was updated successfully, but these errors were encountered: