-
-
Notifications
You must be signed in to change notification settings - Fork 5.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
linux: use threads for console IO #6770
Conversation
3d74ee8
to
a92d859
Compare
Signed-off-by: Timofey Titovets <[email protected]>
Signed-off-by: Timofey Titovets <[email protected]>
a92d859
to
3c4f73d
Compare
Thanks for working on this. Does this change improve overall IO performance (eg, steps per second) and if so by how much? It seems odd to me that replacing Same comment as before on ringbuf.c - it needs to have a copyright statement if it is locally developed or it should be moved to lib/ if it is an external file. For what it is worth though, I'm not sure why a ringbuf is needed. If there is a background thread for the console, it can read the buffer, parse the messages, signal the main thread when commands are ready to run, collect any output from the main thread, and write that output back to the console. The PRU code already uses a separate thread for console reading (the code is in src/pru/pru0.c , but admittedly the PRU is so goofy it's hard to follow that code). Cheers, |
I don't think so. The original PR replaced the whole logic that works with GPIO to map/DMA, which will make it platform-specific but also provide huge performance benefits (no more syscalls for GPIO access). I tried to import the idea of separating IO threads there, like if the main thread will not do IO, it should be basically a bit faster. I tested the step rate, the same setup, and the same RPI5 with fixed 2 GHz (to be not too fast, there are a lot of people with slower hosts).
714k originally (210 ticks), 701K with patches (214 ticks).
It was just a simple handy solution, I have no strong opinion here, I don't think it really matters. I think if there is a goal, for improvement of the performance of Linux MCU, I think there are only 2 solutions:
Timers on Linux MCUs are not the same as on bare-metal MCUs, where they perform actual work. Below are my speculations. BTW, So, maybe there are fundamental kernel limits on frequency here. With nanosleep(1000)
gpio calls:
|
There is a subtle race with I'm not sure if the goal was to remove jitter from Actually, thinking about it further, one might be able to also get rid of the Whatever the source of the issue, it'll be hard to make changes here if we can't benchmark an improvement. It might be necessary to implement memory mapped gpio first so that the jitter from syscalls becomes more apparent in tests. Not sure. Cheers, |
FYI, I tried prototyping code using Oh well. |
I reimplemented PRU-like logic with threads, but it performed even worse to my surprise. I expected no real difference to just threaded IO. Looks like RTT time just goes bad, so console.py can't initialize in 2 seconds and throw timeout error. And it nothing to do with scheduling jitter right now, so I give up on that.
I tried to move timers to the separeted thread, and drop system timer and it works (with some locking), but does not provide any benefits (step rate is lower ~667K) and is buggy (ex. Looks like shutdown() from timers brake locks). My a little naive solution:
That was another interesting dive into MCU code, but it really looks like it hard to improve it further. Anyway, thanks for your time. |
Ah, if you're curious, I uploaded what I attempted yesterday (again, it's broken code): https://github.com/KevinOConnor/klipper-dev/tree/work-linux-20250102 -Kevin |
This is the porting of changes from: #6514
There is a threaded console IO, with a simplified ring buffer.
There are no functional changes, but this will introduce some latency for e2e messaging.
Tested on RPi5 with fixed frequency to 2GHz.
I tried to use signaling to wake up the main thread if it sleeps and if the reader receives something.
The same applies to the writer, who sleeps or wakes up on the signal.
But seems like any other syscalls in the main thread or additional signals introduce pauses which leads to a "Rescheduled timer in the past" error.
So, there are pauses.
In the perfect scenario, I think, the main thread can use clock_nanosleep to sleep until the next SIGALRM or till a signal from the reader arrives, everything to save CPU cycles.
Mostly mitigated.
Sleep times are arbitrary values here, mostly tradeoffs between CPU usage and latency.