Question: long-running WASM + Erlang scheduler #15

Open
munjalpatel opened this issue Jun 25, 2024 · 10 comments
@munjalpatel

munjalpatel commented Jun 25, 2024

Hey @bhelx amazing work!

The Erlang scheduler generally doesn't like long-running processes.
For native Erlang / Elixir processes, it uses reductions to preempt them so that other processes get a fair share of CPU time.

How would the scheduler react to WASM running through Rustler?
What advice do you have for someone who needs to run potentially long-running WASM modules?

@bhelx
Contributor

bhelx commented Jun 25, 2024

Thanks @munjalpatel! Could you link to any specific docs about what you are referring to? I haven't had this problem but maybe I'm misinterpreting your question.

@munjalpatel
Author

munjalpatel commented Jun 25, 2024

@bhelx here are some references:

Why no long running nifs:
https://youtu.be/nw2eIB6bTxY?t=350
https://youtu.be/tBAM_N9qPno?t=2074

Here's the demo and potential solutions with Rustler
https://youtu.be/BREqrlzfQUo?t=1078

Here's the general info on how the Erlang Scheduler works
https://youtu.be/JvBT4XBdoUE?t=411
https://www.youtube.com/watch?v=tBAM_N9qPno

Quote from: https://blog.appsignal.com/2024/04/23/deep-diving-into-the-erlang-scheduler.html

To promote fairness among processes, Erlang's preemptive scheduling relies on reductions rather than time slices. If a process exhausts its allocated reductions, it can be preempted, even if its execution isn't complete. This approach prevents a single process from monopolizing the CPU for an extended period, fostering fairness among concurrent processes. By using reductions as the foundation for preemption, Erlang mitigates the risk of processes starving for CPU time. This design ensures that every process, irrespective of its workload, is periodically allowed to execute.

Essentially, when we run WASM as a NIF, the reduction count won't get updated. Hence, the scheduler cannot preempt the process while the NIF runs, and that process ends up with significantly more CPU time than the others. We have to somehow indicate to the scheduler the progress made inside the NIF, expressed in terms of reductions, as explained in the Rustler video ( https://youtu.be/BREqrlzfQUo?t=1078 ). @scrogson am I thinking about this correctly?
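
To make the concern concrete, here is a toy model in plain Rust. It is only a sketch of the idea and not how the BEAM is implemented: `BUDGET`, `run_slice` and `Outcome` are names I made up for illustration, and the budget of 2000 is an arbitrary number. A process that counts reductions is stopped once its budget is used up, while one that never reports progress runs to completion in a single slice.

```rust
// Toy model only: this is NOT how the BEAM scheduler is implemented.
// Hypothetical number of reductions a process may use per slice.
const BUDGET: u32 = 2000;

#[derive(Debug, PartialEq)]
enum Outcome {
    /// The budget ran out, so the scheduler can switch to another process.
    Preempted,
    /// All work was completed within this slice.
    Finished,
}

/// Runs one slice of a process that still has `work_left` steps to do.
/// When `counts_reductions` is false (our stand-in for an opaque NIF),
/// the budget check never triggers and the slice runs until the work is done.
/// Returns the number of steps executed in this slice and how it ended.
fn run_slice(work_left: &mut u32, counts_reductions: bool) -> (u32, Outcome) {
    let mut reductions = 0;
    let mut steps = 0;
    while *work_left > 0 {
        if counts_reductions && reductions >= BUDGET {
            return (steps, Outcome::Preempted);
        }
        *work_left -= 1;
        steps += 1;
        if counts_reductions {
            reductions += 1;
        }
    }
    (steps, Outcome::Finished)
}
```

Calling `run_slice(&mut work, true)` with 10,000 steps of work stops after 2000 steps, while the same call with `false` runs all 10,000 steps before returning.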

@tessi is handling it by running the NIF in an OS thread ( tessi/wasmex#6 and tessi/wasmex#7 ). That is obviously a lot more heavyweight than a NIF executing in the Erlang process itself, but still better than having a blocking process.
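
As I understand the OS-thread approach, its general shape looks roughly like the sketch below. This is plain Rust with a channel standing in for the message that would go back to the calling Erlang process. It is not wasmex's actual code, and `long_running_call` and `call_async` are hypothetical names.

```rust
use std::sync::mpsc;
use std::thread;

/// Hypothetical stand-in for a long-running WASM call.
/// A real binding would invoke the WASM runtime here.
fn long_running_call(input: u64) -> u64 {
    (0..input).fold(0u64, |acc, n| acc.wrapping_add(n))
}

/// Starts the work on its own OS thread and returns immediately.
/// The caller picks up the result later from the returned receiver.
/// In a NIF, the thread would send a message to the calling process instead.
fn call_async(input: u64) -> mpsc::Receiver<u64> {
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        let result = long_running_call(input);
        // Ignore the error if the receiver has already been dropped.
        let _ = tx.send(result);
    });
    rx
}
```

The function that spawns the thread returns quickly, so the scheduler thread is not held for the duration of the WASM call. The price is one OS thread per in-flight call.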

I believe we have access to WASM's linear memory. I am not sure if there is a way to track instruction execution of a WASM module and run the module with an arbitrary instruction as a starting point. If there is, it might be possible to pause/resume a WASM module every 2ms while yielding 2000 reductions. This might be a very far-fetched idea though!!
@tessi has a much better idea here: tessi/wasmex#394
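
For what it's worth, the pause/resume idea can be sketched with a toy interpreter. This assumes we control the instruction loop, which may not hold for a real WASM runtime. `Instr`, `Vm` and `Step` are made up for illustration and have nothing to do with real WASM bytecode. The program counter lives in the VM state, so a run that hits its instruction budget can return and be resumed later from the same instruction.

```rust
/// Hypothetical mini instruction set, standing in for WASM bytecode.
#[derive(Clone, Copy)]
enum Instr {
    AddAcc(i64),
    DecCounter,
    /// Jump to the given index while the counter is still positive.
    JumpIfCounterPositive(usize),
}

/// All execution state lives here, so a paused run can be resumed.
struct Vm {
    pc: usize,
    acc: i64,
    counter: i64,
}

#[derive(Debug, PartialEq)]
enum Step {
    /// The budget was used up; call `run` again to continue.
    Yielded,
    /// The program finished with this accumulator value.
    Done(i64),
}

impl Vm {
    /// Executes at most `budget` instructions, then returns.
    fn run(&mut self, program: &[Instr], budget: u32) -> Step {
        let mut used = 0;
        while self.pc < program.len() {
            if used == budget {
                return Step::Yielded;
            }
            match program[self.pc] {
                Instr::AddAcc(n) => {
                    self.acc += n;
                    self.pc += 1;
                }
                Instr::DecCounter => {
                    self.counter -= 1;
                    self.pc += 1;
                }
                Instr::JumpIfCounterPositive(target) => {
                    self.pc = if self.counter > 0 { target } else { self.pc + 1 };
                }
            }
            used += 1;
        }
        Step::Done(self.acc)
    }
}
```

Between two calls to `run`, the caller is free to yield to the scheduler and report the instructions used as reductions.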

@scrogson

@munjalpatel correct. Given that this library can't predict how long each WASM call will take to execute, it should at the very least mark the call as a DirtyCpu NIF so that it runs on the dirty schedulers.
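
A minimal sketch of where that annotation goes. The attribute line is commented out so the snippet compiles without the `rustler` crate, and `call_plugin` is a hypothetical function, not this library's actual export.

```rust
// Assumes the `rustler` crate. With it in scope, uncomment the attribute
// to export the function as a NIF that runs on a dirty CPU scheduler:
// #[rustler::nif(schedule = "DirtyCpu")]
fn call_plugin(iterations: u64) -> u64 {
    // Hypothetical stand-in for the real, potentially slow WASM invocation.
    (0..iterations).fold(0u64, |acc, n| acc.wrapping_add(n))
}
```

The body of the function stays the same. Only the scheduling hint changes.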

@bhelx
Contributor

bhelx commented Jun 26, 2024

Thanks for all this info @munjalpatel and @scrogson ! I hadn't considered that, but it makes sense. Our bindings using rustler are fairly naive.

@munjalpatel were you interested in contributing this? If not, I can start by looking into this "DirtyCpu NIF" that @scrogson mentioned.

Another option could be to rewrite the underlying code to use wasmex and get rid of the custom NIF. I have explored this but have not started on it or done a proof of concept: #3

@munjalpatel
Author

munjalpatel commented Jun 26, 2024

@bhelx happy to help -- I am quite familiar with Elixir but have never worked in Rust before. But can certainly figure things out with guidance :)

Utilizing DirtyCpu should be fairly straightforward. If I recall, it's just an annotation on the exported function.

However, I would much rather use wasmex instead of going the DirtyCpu route for the following reasons:

So by using wasmex, we will inherit all these + future improvements :)

@bhelx
Contributor

bhelx commented Jun 26, 2024

@munjalpatel that would be awesome! No pressure of course :) @tessi has done a great job on wasmex and is way ahead of my bindings. I only haven't switched over due to lack of time. If you come find me in our Discord in the #elixir-sdk channel, I can give you some real-time advice or even do some pair programming if you feel like it would help.

@munjalpatel
Author

> @munjalpatel that would be awesome! No pressure of course :) @tessi has done a great job on wasmex and is way ahead of my bindings. I only haven't switched over due to lack of time. If you come find me in our Discord in the #elixir-sdk channel, I can give you some real-time advice or even do some pair programming if you feel like it would help.

I'm a bit busy this week, but let's get some time on our calendars for next week and we can figure out the plan ahead.
What timezone are you in?

@bhelx
Contributor

bhelx commented Jun 26, 2024

No problem, I can always find a way to squeeze in some time. I'm in the US in Central Time. In terms of UTC, I'm generally active from 11:00 UTC to 01:00 UTC. Just ping me on Discord, same username as GitHub.

@tessi

tessi commented Jun 27, 2024

hey 👋 you mentioned me often enough to appear for a short comment :D

wasmex is pretty stable right now, but feature development is slow. The reason is mostly me focusing on family in my free time (kids eat one's free time for breakfast). Regarding wasmex, most work is in updating dependencies. We use wasmtime as the underlying wasm executor, which still sees some significant development and API changes. Usually good improvements, but it's still work to keep up. :)
That being said, I'd love to see people migrate to wasmex and am happy to onboard additional contributors to it. Together we have more time and dedication than us alone.

> I only haven't switched over due to lack of time

I feel you! :D If we can find a way to share some work or save some time in maintaining this, I'm all in. But no pressure, if we stay separate it's also cool!

@munjalpatel thanks for your efforts! 💛

@bhelx
Contributor

bhelx commented Jun 27, 2024

@tessi thanks for the update on the project. No pressure for you to help of course. I think we should be able to mostly do it on our own. I think either way, we're faced with the decision to build all the stuff you have built ourselves, or join forces. The latter is the obvious choice.

> Usually good improvements, but it's still work to keep up. :)

Yeah we are downstream of wasmtime for our standard fallback libextism dependency, so we know the project well!

> That being said, I'd love to see people migrate to wasmex and am happy to onboard additional contributors to it. together we have more time and dedication than us alone.

Agreed, there's no reason you should need to shoulder all the burden. Happy to contribute to wasmex where we can if we can pull this off. Perhaps we can also attract some more contributors to help you out. We have a few users of this library who might be able and willing to help.
