Search for deadlock cause | freeze upon data read; distributed; other?

I used to have the default worker-ttl: "600s" in the distributed config. When these worker failures occurred, the dashboard showed a freeze near the end of a batch of tasks, as in the screenshot below, with 86/91 completed. Everything froze for 600s, then the worker was restarted (by check_worker_ttl) and things continued.

This freeze always occurred at about 95% completion, with only a few tasks left.

This symptom, a deadlock at the very end of a task batch, has been reported in at least two other instances: here and here.

I now have worker-ttl: "60s". When worker failures occur, I believe a single worker failure no longer causes a general freeze (the true deadlocks still occur, as reported in the original post).
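
For reference, a minimal sketch of where this setting sits, assuming the standard nested layout of the distributed YAML config, where the key is `distributed.scheduler.worker-ttl`. Only the `worker-ttl` value is taken from my setup; any other keys in the real file are omitted here.

```yaml
# Sketch of the relevant fragment of the distributed config.
# Assumption: the setting is read from the usual nested location.
distributed:
  scheduler:
    worker-ttl: "60s"   # previously "600s"; the scheduler treats a worker as lost after this long without a heartbeat
```

A shorter TTL should only shorten how long a single lost worker stalls the batch; I would not expect it to address the true deadlocks.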

  1. Could the deadlock be caused by dask/distributed#8616? If so, I could perhaps simply upgrade from my current py3.11 to py3.12.