Number of nodes/workers limit

Hi, I am still using Dask's SGECluster to distribute my workload. It looks like Dask can only scale to ~400 nodes out of the 500 requested (the resources are available). Is there a scaling limit on the number of nodes for Dask? Thanks.

Hi,

There is no theoretical or configuration limit on the number of Dask workers. Dask has already been used with a few thousand nodes. However, scaling that far is not always easy.

In your case, what makes you say Dask is limiting the scaling? Are all the jobs that start workers actually running on your cluster?

How does your SGECluster configuration compare to the resources of the nodes you have?

@guillaumeeb Thanks for looking into this.

After checking the printout files, I found that all 500 jobs actually started, but some ended prematurely due to a timeout. Setting `death_timeout=1200` seems to do the trick.
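
For anyone hitting the same issue, here is a minimal sketch of how `death_timeout` can be passed to `SGECluster`; the queue name, cores, and memory values are placeholders, not the configuration from this thread:

```python
from dask_jobqueue import SGECluster
from dask.distributed import Client

cluster = SGECluster(
    queue="all.q",        # placeholder SGE queue name
    cores=8,              # cores per job (illustrative)
    memory="32GB",        # memory per job (illustrative)
    death_timeout=1200,   # seconds a worker waits for the scheduler before shutting down
)
cluster.scale(jobs=500)   # request 500 jobs/workers

client = Client(cluster)
```

Raising `death_timeout` gives workers more time to connect to the scheduler when many jobs start at once, instead of exiting after the default timeout.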
