-
Notifications
You must be signed in to change notification settings - Fork 12
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement multi-threading using OhMyThreads and make it differentiable #70
Conversation
@lkdvos I could use some help on this one... I still find writing rrules very confusing sometimes, so could you maybe take a look at the rrule for Other than that we have to think about how to pass along the threading kwargs to the I'll also add some rrule tests in the end. |
I think tforeach is going to be a bit hard, because it has no outputs, and is thus necessarily in-place... I have no clue how the zygote buffer magic works, so I can't really say I know how to deal with that either. I should have some more time to think this through next week though! For the global variables, I would maybe suggest ScopedVariables.jl instead, this is a little more flexible and shouldn't incur too much runtime costs |
How about we just stick to
Wasn't aware of ScopedValues.jl yet, that looks like a great solution. But I don't quite understand the necessity of a scoped value here since we never need to access the threading settings inside a multi-threaded map, right? In any case, we can probably just have a global scoped Anyways, I will give these things a go and then we can review next week, when you have time :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I should have elaborated a bit more on what I had in mind with the scoped values: I would keep the threading strategies in scoped values, such that they can always be accessed as if they are global values. (this prevents them from bloating all of our algorithms)
The benefit of having them as a scoped value however means that users could still change them by calling the peps function from within a scope with a modified scoped value, thus changing the scheduler.
…Defaults docstring
I was trying to fix the Zygote error but with no luck, I really don't know how to handle the
|
I'm honestly not so sure how no one has ever run into this, but I seem to have been able to circumvent the issue by just not differentiating through the |
cbc1fe7
to
f3d7192
Compare
f3d7192
to
cab6c69
Compare
Codecov ReportAttention: Patch coverage is
|
I noticed that the default Other than that this seems mergeable, thanks a lot for pinpointing and circumventing the error! It is really weird how this Zygote problem never came up before... |
Whoops somehow the tests are starting to fail which was not the case before the merge, need to investigate what's happening... So somehow there are some segmentation faults appearing in the Julia LTS tests probably related to the multi-threaded reverse rule. This doesn't seem to be a problem on the latest Julia channel. What is weird is that even when I run the tests locally on Julia 1.10 they won't fail on my machine. I find this quite confusing. @lkdvos What do you think is the best way to proceed here? Should we restrict the number of threads for the LTS tests to one? |
I'm definitely not a big fan of fixing problems by just turning off the tests, so let's maybe try an see if we can reproduce this anyways? Given that it happens on all platforms, presumably it is a Julia specific problem, but I would still love to either know what causes it, so we can avoid it, or just fix it in the first place... I guess the reason it only started turning up now is that we were not testing on julia 1.10 before, and instead had 1.9 an 1.11 |
I definitely agree that turning off tests is not the way to go. However, I haven't managed to reproduce it in any way on Ubuntu with Julia 1.10.6 (and also other Julia versions) - the tests just run through without erroring or causing a segmentation fault, regardless of the number of threads.
Weirdly, the examples run through on the LTS on macOS (whereas they do not on Ubuntu and Windows). So there might also be some interaction with the OS and multi-threading? Unfortunately, I don't have access to macOS or Windows machines so I don't really know how to approach this or narrow this down. Also, I don't have that much more time to spend on this (and probably neither do you), so that's why I was looking for a quick solution :-) Anyways, if you come up with ideas to try out, let me know and I can revisit this next week. |
Just a quick thought: Could it be that the OhMyThreads multi-threading is interfering with OpenBLAS multi-threading? If the CI runners have only 4 cores in total then choosing |
This is unlikely to be the case (but of course without an actual solution I can't exclude anything), because of Julia's task-based parallelism, and this basically being handled by the operating system. In principle there is nothing stopping you from running 20 tasks on a single threaded setup, the OS will switch between these tasks as usual, and everything should still work the same. Of course, there will only ever be one task running in that case, while you still pay the overhead of the tasks and context switching, so typically this hinders performance a bit, but it shouldn't break anything. I'll try to spend some more time to investigate this week, I definitely don't feel comfortable releasing something that breaks on the LTS on all platforms, so if we can't figure this out we might have to restrict to v1.11, which I'd rather avoid |
Oh well, you're right. I just tried to come up with some reason why the tests run through on all my Ubuntu machines on 1.10.6 and why they don't on the CI runners. Also the fact that the macOS examples do run through on 1.10.6 is very weird. This seems to happen deterministically (I reran the CI earlier) but it is somehow platform-specific. Are the tests running through for you locally on an LTS release? I'm currently stumped on this one. |
If this runs through, I'll re-enable the tests for all OS, and then it is definitely my fault. I think I might have screwed up something with the caching of precompiled data, which might have not included the julia version in the hash, thus possibly leading to these weird kinds of errors. (Just guessing, still need to check) |
Alright, I see no more segmentation faults, thanks for taking a look and fixing it! This looks mergeable now :-) |
Let's just wait a second until the tests actually all turn green (seems like they are not running now for some reason that I don't understand) |
9cebff2
to
a9c0b82
Compare
a9c0b82
to
3ea4833
Compare
46edd69
to
4844ea0
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If everything now turns green, good to go for me
Here we'll replace the
@fwdthreads
macro withtmap
andforeach
calls. Additionally, we will code up reverse rules such that the backwards pass also runs in parallel.