Cluster hangs when scheduler and a worker run on the same machine

giorgostheo · June 4, 2022, 5:02pm

Hey team! I have a quick question about some weird behaviour when I create a distributed cluster over some machines on a LAN. When I start a worker alongside the scheduler on one machine (the master in this instance), workers tend to time out often and I can't get anything to compute. When I killed that worker (the scheduler is now running solo on the master), everything works perfectly… Is that normal? Is it mentioned somewhere and I missed it? If it is a bug I will look into posting on GitHub (I just wanted to make sure that it is not something too trivial). Thanks a ton!

pavithraes · June 7, 2022, 12:31pm

@giorgostheo Welcome to Discourse!

Is that normal? Is it mentioned somewhere and I missed it?

That does seem odd, and we’d need a little more information to say what’s going on. Would you be able to share the timeout error traceback, and describe how you’re setting up your distributed cluster?

giorgostheo · June 8, 2022, 1:20pm

Sadly I don't have any extra info, since I took the cluster down due to the many errors I encountered… Dask was probably not at fault; I suspect the problem is with the cloud provider (it's a local academic one and you cannot imagine how terribly it's maintained).

All I know is that when I took down the worker that was running alongside the scheduler, the cluster stopped hanging… sorry I can't give more detail. Before that, there were constant timeouts from random workers that lost connection when a computation started.

Reproducing should be easy though if someone wants to take a look: just spin up a scheduler and a worker on the same machine (I used the command-line tools dask-scheduler and dask-worker), as well as some workers on other machines on the LAN.
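
For anyone who wants to try, here is a rough sketch of that setup as a small Python helper that only prints the commands to run on each machine. It assumes the scheduler listens on its default port 8786; the address 192.168.1.10 and the helper name `repro_commands` are made up for illustration and are not taken from my actual cluster.

```python
import shlex

# Assumption: the scheduler uses dask-scheduler's default port.
SCHEDULER_PORT = 8786


def repro_commands(master_ip, port=SCHEDULER_PORT):
    """Return the command to run in each role (hypothetical helper)."""
    address = f"tcp://{master_ip}:{port}"
    return {
        # Scheduler on the master machine.
        "master (scheduler)": ["dask-scheduler", "--port", str(port)],
        # Worker sharing the master machine with the scheduler.
        "master (co-located worker)": ["dask-worker", address],
        # Same worker command on every other machine in the LAN.
        "every other LAN machine": ["dask-worker", address],
    }


if __name__ == "__main__":
    # 192.168.1.10 is a placeholder for the master's LAN address.
    for where, cmd in repro_commands("192.168.1.10").items():
        print(f"{where}: {shlex.join(cmd)}")
```

Leaving out the co-located worker command gives the configuration that worked for me, so comparing the two runs should show whether that worker is the trigger.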

Again, I am 99% sure it’s not Dask’s fault, so maybe further investigation is unnecessary.

Thanks!
