Concurrent futures, slurm, and adapt

I have a setup where a Python script kicks off a SLURM cluster of Dask tasks. It uses the concurrent.futures interface. I am using adapt(minimum=1, maximum=500), but it never scales up past 1 Dask node. When I use the scale interface it scales fine. I do have long-running tasks. Does Dask need to wait until the first concurrent task returns before scaling up more? What would cause Dask not to scale up the cluster/Dask nodes?
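For context, a minimal sketch of this kind of setup (the cluster parameters, function name, and task count here are illustrative assumptions, not the original script; it needs a SLURM cluster and dask-jobqueue to actually run):

```python
from dask_jobqueue import SLURMCluster
from distributed import Client

# Hypothetical resources per SLURM job -- adjust to your site
cluster = SLURMCluster(cores=1, processes=1, memory="4GB", walltime="04:00:00")

# Let Dask decide how many SLURM jobs to request, between 1 and 500
cluster.adapt(minimum=1, maximum=500)

client = Client(cluster)

def my_function(x):          # placeholder for a long-running task
    ...

# concurrent.futures-style interface: submit returns Dask Futures
futures = [client.submit(my_function, i) for i in range(1000)]
results = client.gather(futures)
```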

Hi @M1Sports20, welcome to Dask community!

By default, yes: the adaptive scaler estimates how long your tasks take before deciding how many workers to request, so it waits until it has seen at least one task complete.

AFAIK, there are two solutions to this:

  • Specify roughly the duration of a given function you're submitting in the config:

    ```python
    import dask
    # Tell the scheduler that my_function typically takes about an hour
    dask.config.set({'distributed.scheduler.default-task-durations.my_function': '1h'})
    ```
  • Specify a long duration for all unknown tasks:

    ```python
    import dask
    # Assume any task with no known duration will take about an hour
    dask.config.set({'distributed.scheduler.unknown-task-duration': '1h'})
    ```

Both should work; report back if they don't.

Also, be careful with dask-jobqueue and autoscaling: it works better with only one worker process per job!
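Putting the two pieces together, a hedged sketch of what that could look like (resource values are assumptions; this needs dask-jobqueue and a SLURM cluster to run):

```python
import dask
from dask_jobqueue import SLURMCluster

# Give the scheduler a duration estimate so adapt() scales up
# before the first long-running task finishes
dask.config.set({'distributed.scheduler.unknown-task-duration': '1h'})

# processes=1 keeps one worker process per SLURM job, which plays
# more nicely with adaptive scaling in dask-jobqueue
cluster = SLURMCluster(cores=1, processes=1, memory="4GB", walltime="04:00:00")
cluster.adapt(minimum=1, maximum=500)
```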

That seemed to work. I did notice issues with autoscaling and dask-jobqueue, and I also set it to one worker process per job a few days ago.

I will say it still takes a little time to ramp up and submit more SLURM jobs, but at least it works.

Thanks!
