Scheduler not saturating workers

Hi @guillaumeeb,

Thanks for the quick response! I cant seem to get a minimum reproducer, but this looks like it roughly does the same thing: Root-ish tasks all schedule onto one worker · Issue #6573 · dask/distributed · GitHub but with long running tasks (roughly 3-45 minutes).

I tried

with dask.config.set({"distributed.scheduler.worker-saturation": "inf"}):

But the issue still persists.

Something that might be helpful is a little more information about how I am using the workers. I am starting N-workers on each machine each with one thread and 4GB of memory. The workers themselves never run into memory issues but I think that is because each of these simulations is called as a subprocess from within the worker, so it does not seem to be aware of the memory use of the .exe it is calling. Might this be partially to blame for why backlogged work isnt moving to idle workers?

Another thought I had was that maybe I could force queuing for all tasks greater than the number of workers*threads, i.e. 3 workers, 1 thread each on 5 machines → 15 workers – start tasks 1-15 and queue tasks 15-M. Does that seem possible?