-
Notifications
You must be signed in to change notification settings - Fork 6.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Socket Service: Restart Flag Not Reset and Accidentally Cleared by Offloaded Polling Events #81813
Comments
Hi @illysky! We appreciate you submitting your first issue for our open-source project. 🌟 Even though I'm a bot, I can assure you that the whole community is genuinely grateful for your time and effort. 🤖💙 |
Thanks for impressive bug analysis. it must have been difficult debugging session as the interaction of different components is complicated. About the simplelink driver, I don't fully understand the interactions here. Both the driver and the socket service have their own fds array so how the driver could affect the service here. Or is this related to recent change where the poll implementation was moved to zvfs in commit 881dc1f |
No problem. Happy to help! Debugging was eye watering! 😅 Yeah I think that socket service uses the global fdtable now, but it makes its own copy on thread restart, and passes that to zsock_poll. When simplelinks zsock_poll implementation is called, it assumes all the fds belong to it and starts clearing revents. Element 0 in ctx->events is used by socket service so that gets cleared. It will be also problematic when I start using sockets from other offloaded interfaces. I’ve tested it my proposal, it’s all working fine now, I put back in the read call as you said. Let me know if you need me to do anything. 🙏 |
Relocating the code to restart polling looks correct. Work should probably get done before restarting so that events are not lost. I agree with Jukka as well though, that we should still read from the eventfd to clear the value from it. @illysky - would you be able to make a pull request with your fix? I suspect that this issue would affect not only offloaded sockets but also centrally managed ones. With that, you could create a ZTest to demonstrate the problem in isolation with intentionally injected conditions. The test can be run with e.g. qemu, or native_sim. The test would be expected to fail when run without the fix. With the fix would demonstrate the test passing. That's important for maintainers reviewing new code submissions, but it also ensures we don't encounter the same problem in the future. |
I’m working with Zephyr v4.0.0 and trying to get to grips with the socket service. The process has been somewhat challenging, as I’m also using a combination of offloaded sockets alongside native sockets, with plans to eventually incorporate native DNS and TLS. There’s a good chance that some of the issues I’m experiencing are due to a misunderstanding on my part, so apologies in advance.
The DNS Resolve module registers its sockets using
net_socket_service_register
, as does one of my own modules for HTTP. Each time a new socket is registered, the socket service thread is instructed to restart to reload the updated events. This behavior is triggered by the following code insocket_service.c
:And handled as:
However, I can’t see where or how the restart flag (i.e.,
ctx.events[0].fd = 0
) is cleared once the restart is complete. It doesn’t appear thatzvfs_eventfd_read(ctx.events[0].fd, &value)
clears it, and frankly, I’m not sure what the purpose of that call is anyway.After digging into this further, it seems the restart flag might be cleared indirectly—or even randomly—by modules such as offloaded socket drivers. For example, I’ve implemented a variant of the in-tree SimpleLink Wi-Fi driver to support the NWP-only CC3120. The original driver doesn’t appear to be designed with the socket service in mind, as it modifies file descriptors that don’t belong to it. This creates problems, as the socket service reserves the first element of
ctx.events
for its internal control.Specifically, the SimpleLink driver clears all
revents
fields before checking ownership, which can inadvertently prevent the socket service from restarting. Additionally, when the driver detects it doesn’t own a file descriptor, it returns an error instead of gracefully ignoring it and leaving other interfaces unaffected. Here’s the relevant snippet fromsimplelink_sockets.c
:Since my implementation of the driver no longer interacts with file descriptors that don’t belong to it, the restart flag isn’t always cleared as expected. This results in the socket service repeatedly restarting when returning from a blocked state in
zsock_poll
, causing socket data processing to be skipped.Have I understood this issue correctly? If so, who is responsible for clearing the restart flag in
socket_service.c
(i.e.,ctx.events[0].fd = 0
)? I assume the service itself should handle this. Additionally, I’m wondering if the restart condition should be checked and addressed after processing the results ofzsock_poll
.To address this, my proposed update to
socket_service.c
is as follows:Apologies if I’ve misunderstood any part of this—I've only been working on it for a week. Thanks in advance for your insights!
The text was updated successfully, but these errors were encountered: