Thank you so much for this deep investigation!
I’ll repost parts of the main question in the Distributed channel, since it seems to belong there.
Regarding the configuration changes:
- As far as I know, the `distributed.worker.memory.target` configuration relates to the spilling process, which I’ve intentionally disabled (`da.config.set({'distributed.worker.memory.spill': False})`). My goal for the entire experiment is to see how Dask manages the in-memory stream of data.
- The other parameter, `da.config.set({'distributed.worker.memory.pause': 0.95})`, is a nice catch, although I suspect it only postpones the stall rather than removing the problem. (See the sketch below for how I’d combine these settings.)
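For reference, here is a minimal sketch of how those thresholds fit together when set before the cluster starts. Assumptions: `da` above refers to the `dask` module, and the `target: False` value and the `LocalCluster` sizes are illustrative rather than my exact setup.

```python
import dask
from dask.distributed import Client, LocalCluster

# Set the thresholds before any workers exist, since workers pick them up at startup.
dask.config.set({
    "distributed.worker.memory.spill": False,   # spilling to disk disabled, as in my experiment
    "distributed.worker.memory.target": False,  # assumption: the target threshold is disabled as well
    "distributed.worker.memory.pause": 0.95,    # pause worker threads at 95% instead of the 0.80 default
})

# Illustrative local cluster; the sizes are placeholders, not my real configuration.
cluster = LocalCluster(n_workers=2, threads_per_worker=1, memory_limit="4GB")
client = Client(cluster)
```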
I was also wondering whether the problem could be due to unfreed memory; that’s why I suspected some memory-management global lock.
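If unfreed memory is the culprit, one rough way I could check (assuming `psutil` is installed on the workers and `client` is the client from the sketch above) is to compare each worker’s process RSS against the managed memory reported on the dashboard:

```python
import os
import psutil

def rss_mb():
    # Resident set size of the current worker process, in MB.
    return psutil.Process(os.getpid()).memory_info().rss / 1e6

# client.run executes the function on every worker and returns {worker_address: result};
# a large gap between this and the dashboard's "managed" bytes would point to unfreed memory.
print(client.run(rss_mb))
```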
Thanks again; I’ll post an update here if I find a resolution to the issue.
(I’ve posted the question to the Distributed channel: Worker blocking on memory limit, despite the streaming-friendly pipeline process)