Dask scheduler lost connection to high workload worker

I have a delayed function that uses all the available threads on a worker. The workload is heavy and won’t finish for hours. While it is running, the scheduler keeps logging errors like the one below, and nothing I print() shows up in the worker log until the whole function finishes.
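One possible reason the print() output only appears at the end (an assumption, not confirmed in this thread): when a process's stdout is redirected to a file or pipe rather than a terminal, Python block-buffers it, so output can sit in the buffer until it fills or the process flushes. A minimal standard-library sketch of reporting progress through `logging`, whose `StreamHandler` flushes after every record, with `print(..., flush=True)` as the alternative. The function name `heavy_job`, the step count, and the placeholder workload are hypothetical.

```python
import io
import logging


def heavy_job(n_steps: int, logger: logging.Logger) -> int:
    """Hypothetical stand-in for the long-running delayed function."""
    total = 0
    for step in range(n_steps):
        total += sum(i * i for i in range(10_000))  # placeholder CPU work
        # logging.StreamHandler flushes after each record, so progress appears
        # in the log as it happens instead of when the function returns.
        logger.info("step %d/%d done", step + 1, n_steps)
        # Alternative with plain print: print(f"step {step + 1} done", flush=True)
    return total


if __name__ == "__main__":
    stream = io.StringIO()  # stands in for the worker's log file
    logger = logging.getLogger("heavy_job")
    logger.setLevel(logging.INFO)
    logger.addHandler(logging.StreamHandler(stream))
    heavy_job(3, logger)
    print(stream.getvalue())
```

Inside a real worker the handler would point at whatever stream or file the worker log is written to; the StringIO here only makes the sketch self-contained.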

I’m wondering if there’s a way to have a dedicated thread on each worker that handles basic heartbeats and other work (like tasks added with client.submit())?
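The idea of keeping spare capacity for light work can be sketched with the standard library: a "worker" with four threads split into a three-thread pool for heavy pieces and a one-thread pool reserved for small jobs, so a small job is answered right away even while heavy work is queued. This is only an analogy, not Dask's mechanism; in Dask the comparable options would be to start workers with more threads (`--nthreads`) than the heavy function occupies, or to tag tasks with worker `resources` so the scheduler limits how many heavy tasks run on a worker at once. Extra threads also only help if the heavy code releases the GIL; if it holds the GIL for long stretches, the worker's own communication may stall regardless, which could be what the timeout below reflects. The names `heavy_piece`, `light_task`, `run_demo` and the pool sizes are hypothetical.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Tuple


def heavy_piece(i: int) -> int:
    time.sleep(0.2)  # placeholder for a long computation that releases the GIL
    return i * i


def light_task() -> str:
    # Stands in for a heartbeat reply or a small client.submit() job.
    return "alive"


def run_demo(n_heavy: int = 12) -> Tuple[str, float, int]:
    # Hypothetical split of a 4-thread "worker": 3 threads for heavy work,
    # 1 thread reserved so light work is never stuck behind heavy pieces.
    with ThreadPoolExecutor(max_workers=3, thread_name_prefix="heavy") as heavy_pool:
        with ThreadPoolExecutor(max_workers=1, thread_name_prefix="light") as light_pool:
            heavy_futures = [heavy_pool.submit(heavy_piece, i) for i in range(n_heavy)]
            start = time.perf_counter()
            reply = light_pool.submit(light_task).result()  # not queued behind heavy work
            waited = time.perf_counter() - start
            total = sum(f.result() for f in heavy_futures)
    return reply, waited, total


if __name__ == "__main__":
    reply, waited, total = run_demo()
    print(reply, f"answered after {waited:.3f}s", "heavy total:", total)
```

With a single shared pool of four threads, the light task would instead wait in the same FIFO queue as the heavy pieces, which is the situation described above.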

2022-03-19 17:48:06,849+0000 ERROR [MainThread] distributed.core: Exception while handling op broadcast
Traceback (most recent call last):
  File "/dependencies/lib/python3.8/site-packages/distributed/comm/core.py", line 284, in connect
    comm = await asyncio.wait_for(
  File "/usr/local/lib/python3.8/asyncio/tasks.py", line 501, in wait_for
    raise exceptions.TimeoutError()
asyncio.exceptions.TimeoutError

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/dependencies/lib/python3.8/site-packages/distributed/core.py", line 521, in handle_comm
    result = await result
  File "/dependencies/lib/python3.8/site-packages/distributed/scheduler.py", line 6020, in broadcast
    results = await All(
  File "/dependencies/lib/python3.8/site-packages/distributed/utils.py", line 208, in All
    result = await tasks.next()
  File "/dependencies/lib/python3.8/site-packages/distributed/scheduler.py", line 6012, in send_message
    comm = await self.rpc.connect(addr)
  File "/dependencies/lib/python3.8/site-packages/distributed/core.py", line 1071, in connect
    raise exc
  File "/dependencies/lib/python3.8/site-packages/distributed/core.py", line 1055, in connect
    comm = await fut
  File "/dependencies/lib/python3.8/site-packages/distributed/comm/core.py", line 308, in connect
    raise OSError(
OSError: Timed out trying to connect to tcp://172.27.0.19:37745 after 30 s

@ubw218 Thanks for your question! Could you please share a minimal, reproducible example? It’ll allow us to help you better.