Feedback on v3/docs/THREADING.md #94
Thank you for that - very informative. I will push an update soon.
@jimmo I have pushed a provisional update but I would like clarification on the following:
Item 3 strikes me as quite significant and one I should emphasise in the doc.
Thanks Pete, the update looks good. (Just one comment below)
Yes you're probably right. I think the change you've made solves the issue I was trying to highlight.
Yes, with one exception -- hard ISRs. Regarding what you write in your updated version: this is unfortunately not the case -- the hard ISR inherits the GIL state from the code that it interrupted. So this means that, for example, the main thread (or just the main code, if threads aren't enabled) could trigger a re-hash of a dictionary, and if the hard ISR then tried to read from that dictionary, it would see invalid state. A particularly scary variant of this is if the hard ISR races a re-hash of the globals dict. For example:

```python
def foo():
    global x
    x = 1  # assuming that adding x to globals happens to cause a re-hash

def hard_isr():
    a = x

def main():
    ...
    # ... some time later, call foo for the first time
    foo()
```

So if the hard ISR is triggered just as foo is called, and on the off chance that creating the 'x' global triggers a re-hash, then that would be bad. Very unlikely, but you can imagine variants of this that would make it more likely. In general, any sort of mutation of a global data structure shared with hard ISRs must be protected with machine.disable_irq. A simpler example is having a hard ISR consume a list that the main program is appending to. (Or the reverse, but that would generally be impossible because the hard ISR cannot allocate.) It's important to re-iterate, though, that this only applies to mutation of the data structure itself. Individual elements are fine (which is why it's OK to have a global flag). Also worth noting that a ring buffer is not necessarily hard-ISR-safe, because for example inserting the item could race with incrementing the indices, so again disable_irq might be necessary.
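The disable_irq pattern described above can be sketched as follows. This assumes MicroPython's `machine.disable_irq`/`enable_irq` API; the no-op stubs are only so the sketch also runs on desktop Python, and `samples`/`record` are illustrative names, not from the guide:

```python
# Sketch: guarding mutation of a shared structure against a hard ISR.
# machine.disable_irq/enable_irq exist on MicroPython ports; the
# fallback stubs below are only so this sketch runs on desktop Python.
try:
    from machine import disable_irq, enable_irq
except ImportError:  # not running on MicroPython
    def disable_irq():
        return None
    def enable_irq(state):
        pass

samples = []  # shared with a hard ISR that reads (but never mutates) it

def record(value):
    # append() may trigger a reallocation; a hard ISR firing
    # mid-reallocation could observe invalid state, so mask IRQs
    # around the mutation.
    state = disable_irq()
    try:
        samples.append(value)
    finally:
        enable_irq(state)

record(42)
```

Reading a single element (or a simple flag) from the ISR side needs no such guard; only structural mutation does.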
Much of what I just wrote for hard ISRs above also applies here, except with _thread.allocate_lock rather than machine.disable_irq. Yep, it's pretty hard to write "regular Python" on the rp2 and not accidentally run into some pretty scary concurrency edge cases. (Honestly, I'm not convinced that GIL-free concurrency was necessarily a good idea; maybe instead we should have had a way to offload self-contained work onto the second core, e.g. DSP routines or similar. An independent VM would probably be infeasible due to the overheads, and MicroPython currently isn't designed for that.)
Thanks for that, I've updated the doc. The issue with globals is something new to me!
Are you sure? I've seen a lot of code that relies on the notion that a ringbuf with a single producer and a single consumer is thread safe. I'm pretty sure there's some in the MP source... The producer controls the write pointer and the consumer controls the read pointer. The producer updates the buffer entry and then increments the write pointer. The consumer reads an entry and then increments the read pointer. The only conflict is when the producer checks for the full state or the consumer checks for the empty state, and these fail safe because of the post-increment (assuming that a pointer increment is atomic). Please could you explain how a race can occur?
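The SPSC argument above can be sketched as a minimal ring buffer. This is a desktop-Python illustration, not MicroPython source; `RingBuf` and its field names are ours:

```python
# Minimal single-producer / single-consumer ring buffer sketch.
# The safety argument: each index has exactly one writer, the entry is
# stored *before* the write index is advanced, and the full/empty
# checks fail safe. Index updates are assumed atomic.
class RingBuf:
    def __init__(self, size):
        self.buf = [None] * size  # pre-allocated: put() never allocates
        self.size = size
        self.wi = 0  # written only by the producer
        self.ri = 0  # written only by the consumer

    def put(self, item):              # producer side only
        nxt = (self.wi + 1) % self.size
        if nxt == self.ri:            # full check: may be stale, fails safe
            return False
        self.buf[self.wi] = item      # store the entry first...
        self.wi = nxt                 # ...then publish by advancing the index
        return True

    def get(self):                    # consumer side only
        if self.ri == self.wi:        # empty check: may be stale, fails safe
            return None
        item = self.buf[self.ri]
        self.ri = (self.ri + 1) % self.size
        return item

rb = RingBuf(4)
rb.put("a")
rb.put("b")
```

Note that one slot is sacrificed to distinguish full from empty, so a `RingBuf(n)` holds at most `n - 1` items.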
@jimmo I feel like I've been told that De Morgan's Theorem is wrong. The thread-safety of a ringbuf* is axiomatic; if not I'm responsible for a heap of broken code. Please put me out of my misery and explain the above! :)
Thanks for writing this guide @peterhinch !
1. Introduction:
I think it's worth adding micropython.schedule as another context, or possibly more accurately splitting (1) into:
I've seen a few examples of asyncio code that was running into issues due to use of micropython.schedule (and one example where essentially the entire program ran in scheduler context). But the main reason is that Hard and Soft/Schedule have very different implications for synchronisation.
It's of course an extra challenge that there are two "schedulers" at play here, the asyncio scheduler, and micropython.schedule.
1.1 Interrupt Service Routines
"The ISR and the main program share a common Python virtual machine (VM)." -- It might be important to clarify what this means. I'll come back to this in (1.3).
"Consequently a line of code being executed when the interrupt occurs will run to completion before the ISR runs." -- The VM has basically no concept of "line of code", rather it only deals with bytecode instructions. Technically a soft ISR (or scheduled task) can only run at certain bytecode boundaries, but these do not necessarily align with lines of code.
In the case of a hard ISR, the ISR can occur anywhere, even in the middle of a single bytecode instruction. For example, if you wrote `a + b` where a and b are lists, then an ISR in the middle of the op could modify both `a` and `b`, but the result could be `old_a + new_b`.
"ISR code should not call blocking routines and should not wait on locks." -- In general, an ISR must not wait on a lock because it's impossible for anything else to run and unlock it. The exceptions would be waiting on a lock with zero timeout (i.e. optimistically trying to acquire the lock and aborting otherwise), or maybe in a soft/scheduler context when the lock is set by a hard context.
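The zero-timeout exception can be sketched with `_thread`, which both CPython and MicroPython provide. `isr_style_handler` is an illustrative name, not a real ISR:

```python
# Optimistic, zero-timeout lock acquisition: the only lock pattern
# reasonable from an ISR-like context, since blocking would deadlock
# (nothing else can run to release the lock).
import _thread

lock = _thread.allocate_lock()
shared = []

def isr_style_handler(value):
    # Try the lock without blocking; give up immediately if it is busy.
    if lock.acquire(0):  # waitflag=0: do not block
        try:
            shared.append(value)
            return True
        finally:
            lock.release()
    return False  # lock busy: drop or defer the work

ok = isr_style_handler(1)
```

If the handler returns False the work must be dropped or retried later; the one thing it must never do is spin or block waiting for the lock.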
1.2 Threaded code on one core
It might be worth clarifying that the GIL protects built-in data structures that are atomic at the bytecode level. However, any user-defined data structure gets no benefit from the GIL. (And ultimately, this is why it's not safe to call any asyncio methods like create_task from another context).
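One way to see why the GIL only helps at bytecode granularity is to disassemble a read-modify-write. This is a desktop-Python sketch using the `dis` module (MicroPython has no `dis`, but it compiles to comparable bytecode); `bump` is an illustrative name:

```python
# Even with a GIL, only individual bytecode operations are atomic.
# A read-modify-write like `x += 1` spans several instructions, so a
# thread switch (or soft ISR) can land between the load and the store.
import dis

x = 0

def bump():
    global x
    x += 1  # compiles to separate load, add, and store instructions

ops = [ins.opname for ins in dis.get_instructions(bump)]
# The load and the store are distinct instructions; the gap between
# them is exactly where another thread's update can be lost.
has_load = any(op.startswith("LOAD") for op in ops)
has_store = any(op.startswith("STORE") for op in ops)
```

This is why a user-defined counter, queue, or any multi-step update needs an explicit lock even on a GIL-enabled port.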
1.3 Threaded code on multiple cores
"There is no common VM hence no common GIL. " -- I guess it depends what you mean by "common VM" here, but I'd argue that there is a common VM, it just doesn't have a GIL.
The issue here is that in all circumstances the VM is common (i.e. the cores share global state), but there can be multiple concurrent Python stacks. Hard interrupts are as if the VM called your Python function in the middle of running a bytecode instruction; soft interrupts are as if the VM called your Python function between instructions (note that in both cases the interrupt shares the stack of the code it interrupted).
"In the code sample there is a risk of the uasyncio task reading the dict at the same moment as it is being written". This particular example may actually be completely fine, depending on the scenario.
If what you need is for the three read_data calls and the update of the dict to be atomic, then yes, you need a lock. i.e. if a reader must always see d["z"] that matches the same read cycle as d["x"].
But if you're happy to treat the three slots in the dictionary (and their corresponding values) as independent (maybe you're reading three temperature sensors, and the code just needs to know the latest), then there's no issue with this code. No locking required. This is because concurrently updating a dictionary entry is safe.
Where this falls apart is if you have concurrent threads that modify the dictionary in a way that isn't safe. e.g. one thread adding/removing items. (Simple example: one thread triggers a resize&rehash of the dictionary).
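The distinction drawn above can be sketched as follows, with illustrative names (`readings`, `writer_update`), assuming readers treat the keys independently:

```python
# Replacing the value under an *existing* key only swaps one pointer,
# which is safe for concurrent readers. Adding or deleting keys can
# resize and re-hash the dict, which is not safe.
readings = {"x": 0, "y": 0, "z": 0}  # all keys created up front

def writer_update(x, y, z):
    # Each assignment overwrites an existing slot: no resize possible.
    # Safe without a lock *if* readers treat the keys independently.
    readings["x"] = x
    readings["y"] = y
    readings["z"] = z

def unsafe_grow(key, value):
    # Inserting a new key may trigger a re-hash; a concurrent reader
    # (or hard ISR) could observe the dict mid-re-hash.
    readings[key] = value

writer_update(1, 2, 3)
```

Pre-creating every key at startup is therefore a simple way to keep a shared dict in the "safe" regime.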
The same argument applies here to modification of global variables (which are just dictionaries with nice syntax). Two concurrent threads can modify global variables safely, as long as no thread does `del x` or creates a new global.
2.1 A pool
"It is possible to share an int or bool because at machine code level writing an int is "atomic": it cannot be interrupted. In the multi core case anything more complex must be protected to ensure that concurrent access cannot take place. The consequences even of reading an object while it is being written can be unpredictable. "
I think what you're saying here is exactly correct, but I think the emphasis on the object type might be confusing for people who don't understand the details.
So, for example, I can have a global variable (or dict entry) that is a complex object, and a concurrent writer that is updating that entry. It doesn't matter that it's a complex object: the "update" operation just updates a pointer, which is atomic.
What matters is that it's not safe for me to write code that reads multiple fields from the object, i.e. `d["x"].a` followed by `d["x"].b`: if someone concurrently modifies `d["x"]`, then I'll read a and b from different objects.
What the locking in this example ensures is that the consumer always sees x, y, z together from the same "cycle". (And I believe this is exactly the point you're trying to make; this is a very difficult guide to write because there are so many independent concepts going on here.)
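The multi-field hazard and its fix can be sketched as follows (`Point` and the field names `a`/`b` are illustrative, not from the guide):

```python
# Snapshot pattern for the multi-field read described above: bind the
# shared entry to a local once, then read fields from the local. Even
# if a writer swaps d["x"] mid-way, .a and .b come from the same object.
class Point:
    def __init__(self, a, b):
        self.a = a
        self.b = b

d = {"x": Point(1, 2)}

def racy_read():
    # Two separate dict lookups: a writer can replace d["x"] between them,
    # yielding a from the old object and b from the new one.
    return d["x"].a, d["x"].b

def snapshot_read():
    p = d["x"]  # one lookup: p is one consistent object from here on
    return p.a, p.b

pair = snapshot_read()
```

The writer must correspondingly replace the whole object (`d["x"] = Point(...)`) rather than mutate its fields in place; then the single pointer swap is the atomic publish step.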
"As stated above, if the producer is an ISR no lock is needed or advised." -- However, in that case a consumer must disable IRQs in order to achieve the consistency that the lock provided; otherwise the producer could modify the state mid-read.
Sorry, have to stop there for now, but will come back to it soon.