For example, I want to generate a large random Dask array like this:
da.random.random((10000, 10000, 10000), chunks=(100, 100, 100))
Is it possible to distribute the load efficiently? In my case, I can see that one of the worker nodes is being used much more heavily than the others.
Please suggest best practices to follow.
Let me know if more details are required.
Dask should distribute load automatically. It’s possible that work stealing is interfering; it has been known to make poor scheduling choices like this, as described in dask/distributed GitHub issue #6573, "Root-ish tasks all schedule onto one worker".
You could try disabling work stealing via the
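
As a minimal sketch of turning work stealing off, assuming the `distributed.scheduler.work-stealing` configuration key is the setting in question and that it is applied before the scheduler starts (it may not take effect on a cluster that is already running):

```python
import dask

# Assumption: the scheduler reads this key when it starts, so set it
# before creating the cluster/Client in the same process (or put the
# equivalent entry in the Dask config file on the scheduler machine).
dask.config.set({"distributed.scheduler.work-stealing": False})

# Read the value back to confirm what the current process will use.
work_stealing_enabled = dask.config.get("distributed.scheduler.work-stealing")
print("work stealing enabled:", work_stealing_enabled)
```

After this, starting the cluster and re-running the `da.random.random(...)` computation while watching the dashboard should show whether the load is spread more evenly; if nothing changes, work stealing was probably not the cause.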