I was running a workflow using Afar on Coiled, and I noticed that in the Afar version a worker stopped receiving tasks at some point. In the task stream of the performance reports, the last thread of the Afar version stops getting tasks, while in the non-Afar version this doesn't happen. Is this the expected behavior? What is actually happening here?
Note: the data is public so this should work as a reproducible example.
This does look suspicious--thanks for the report. I don't yet have a plausible explanation for why some threads would stall. The simplest "I have no idea what's going on, but let's take a stab at a solution anyway" thing to try would be to change the Lock to RLock in afar/_printing.py.
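To illustrate why that swap might matter (a minimal sketch, not afar's actual `_printing.py` code — `patched_print` and `inner` are hypothetical names): a plain `Lock` deadlocks if the thread that holds it tries to acquire it again, which can happen when a patched `print` ends up re-entering the locked region; an `RLock` lets the same thread re-acquire freely.

```python
import threading

# With threading.Lock() instead, the nested acquire below would hang forever.
lock = threading.RLock()

def patched_print(*args):
    # Stand-in for a print wrapper that serializes output with a lock.
    with lock:
        inner()  # something inside the locked region re-enters the lock
        return " ".join(map(str, args))

def inner():
    with lock:  # re-entrant acquisition: fine with RLock, deadlock with Lock
        pass

print(patched_print("hello"))  # completes only because RLock is re-entrant
```

If a worker thread ever hit that re-entrant path with a plain `Lock`, it would block silently, which would look exactly like a thread that stops receiving tasks.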
I have ruled out some of the weird bits of afar, such as using locks around updating builtins.print and using a custom channel to send messages from the worker to the client. It's really not clear to me what else in afar would even be relevant in causing any issues.
run_afar is a long-running task that runs other tasks. This could be the source of the issue. Aha! To test this, I just reproduced the issue by running the following (w/o afar):
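The original reproducer snippet is omitted above, but the pattern it describes can be sketched with a plain thread pool (this is an analogy, not dask code; `inner` and `long_running` are hypothetical names): an outer task occupies one pool thread for its entire duration while it waits on the sub-tasks it submitted, so that thread is unavailable for other work — the same way `run_afar` can hold a worker thread while its inner tasks run.

```python
from concurrent.futures import ThreadPoolExecutor

def inner(x):
    return x + 1

def long_running(pool, xs):
    # This function holds one pool thread while it blocks on the
    # sub-tasks it submitted to the same pool.
    futures = [pool.submit(inner, x) for x in xs]
    return [f.result() for f in futures]

with ThreadPoolExecutor(max_workers=2) as pool:
    outer = pool.submit(long_running, pool, [1, 2, 3])
    print(outer.result())  # [2, 3, 4]
```

With `max_workers=1` the same code deadlocks outright: the only thread is stuck in `long_running`, so the sub-tasks can never run — a thread-starvation effect consistent with one thread in the task stream going idle. (Dask's `distributed.worker_client` exists to avoid this by seceding the calling thread from the pool.)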
Workflow without afar:
Link to performance report
Workflow with afar:
Link to performance report