How to efficiently distribute load with worker nodes?

For example, I want to generate a random dask array like

import dask.array as da

def daskCustom():
    return da.random.random((10000, 10000, 10000), chunks=(100, 100, 100))

client.submit(daskCustom)

Is it possible to distribute the load efficiently? In my case, I can see that one of the worker nodes is being used heavily.

Please suggest best practices to follow.

Let me know if more details are required.

Dask should distribute load automatically. It's possible that work stealing is interfering; it has been known to make poor scheduling choices like this, as described in dask/distributed issue #6573 on GitHub, "Root-ish tasks all schedule onto one worker".

You could try disabling work stealing via the distributed.scheduler.work-stealing config.
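
For reference, here is a minimal sketch of setting that option from Python. It assumes the scheduler is started afterwards in the same process (for example a local cluster created by `Client()`), since as far as I understand the scheduler reads this value when it starts up. If your scheduler runs elsewhere, the setting would need to go into that process's configuration instead, for example a Dask YAML config file or, I believe, the environment variable `DASK_DISTRIBUTED__SCHEDULER__WORK_STEALING=False`.

```python
import dask

# Disable work stealing. This changes the in-process Dask configuration,
# so it should run before the scheduler / local cluster is created.
dask.config.set({"distributed.scheduler.work-stealing": False})

# Confirm the value that a scheduler started from this process would see.
work_stealing_enabled = dask.config.get("distributed.scheduler.work-stealing")
print("work stealing enabled:", work_stealing_enabled)
```

After this you would create the `Client` as usual and rerun the workload to see whether the tasks spread more evenly across workers.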
