[BUG]: CommClosedError on heartbeat during Dask Client shutdown #2026
Labels
bug
Version
24.10
Which installation method(s) does this occur on?
Source
Describe the bug.
Original issue: #1990
Related Dask issue: dask/distributed#7891
When shutting down a Dask Client in Morpheus, an intermittent CommClosedError occurs, specifically during the heartbeat process. This error happens because a coroutine initiates a heartbeat communication after the scheduler has already been closed.
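The race described above can be illustrated with a minimal asyncio sketch. This is not code from dask.distributed or Morpheus: `FakeComm`, `CommClosed`, `heartbeat_once`, and `guarded_heartbeat_once` are hypothetical stand-ins, and the guard is only one possible way to tolerate the error during shutdown.

```python
import asyncio


class CommClosed(Exception):
    """Hypothetical stand-in for a closed-connection error."""


class FakeComm:
    """Hypothetical connection whose send() fails once it is closed."""

    def __init__(self):
        self.closed = False

    async def send(self, msg):
        if self.closed:
            raise CommClosed(f"cannot send {msg!r}: comm is closed")
        return "ok"


async def heartbeat_once(comm, delay):
    # The closed-check and the send are separated by an await, so the
    # comm can be closed in between. That gap is the race window.
    if comm.closed:
        return "skipped"
    await asyncio.sleep(delay)
    return await comm.send("heartbeat")


async def guarded_heartbeat_once(comm, delay, shutting_down):
    # Assumed mitigation: a closed comm is expected while shutting down,
    # so swallow the error in that case and re-raise it otherwise.
    try:
        return await heartbeat_once(comm, delay)
    except CommClosed:
        if shutting_down.is_set():
            return "ignored"
        raise


async def demo(guarded):
    comm = FakeComm()
    shutting_down = asyncio.Event()
    if guarded:
        coro = guarded_heartbeat_once(comm, 0.01, shutting_down)
    else:
        coro = heartbeat_once(comm, 0.01)
    task = asyncio.create_task(coro)
    await asyncio.sleep(0)  # let the heartbeat pass its closed-check
    shutting_down.set()
    comm.closed = True      # shutdown lands inside the race window
    try:
        return await task
    except CommClosed as exc:
        return f"error: {exc}"
```

Running `asyncio.run(demo(False))` reproduces the failure mode in miniature, while `asyncio.run(demo(True))` shows the heartbeat error being tolerated because shutdown is already in progress.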
Expected Behavior:
The Dask Client should close gracefully without any errors indicating communication failures.
Observed Behavior:
An error is logged due to a StreamClosedError in tornado, which dask relies on for communication. This error appears intermittently during shutdown of stages that use dask.
Root Cause:
This issue appears to be a race condition in dask.distributed where the coroutine initiating a heartbeat is not aware that the scheduler has already closed, causing it to attempt communication and fail.
Proposed Solutions:
Minimum reproducible example
Run the ransomware_detection pipeline.
Relevant log output
Click here to see error details
Full env printout
Click here to see environment details
Other/Misc.
No response