I have a setup where a Python script kicks off a Slurm cluster of Dask tasks. It is using the concurrent futures interface. I am using adapt(minimum=1, maximum=500), but it never scales up past 1 Dask node. When I use the scale interface it scales fine. I do have long-running tasks. Does Dask need to wait until the first concurrent task returns before scaling up more? What would cause Dask to not scale up the cluster/Dask nodes?
Hi @M1Sports20, welcome to Dask community!
By default, yes: it needs to know how long a single task will last before deciding whether to autoscale.
AFAIK, there are two solutions to this:
- Specify roughly the duration of a given function you're submitting in the config:

```python
import dask
dask.config.set({'distributed.scheduler.default-task-durations.my_function': '1h'})
```
- Specify a long duration for all unknown tasks:

```python
import dask
dask.config.set({'distributed.scheduler.unknown-task-duration': '1h'})
```
Both should work; report back if they don't.
Also be careful with dask-jobqueue and autoscaling, it works better with only one worker process per job!
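To illustrate the one-worker-process-per-job advice, here's a minimal sketch of a dask-jobqueue `SLURMCluster` set up for adaptive scaling. The resource values (cores, memory, walltime) are illustrative placeholders, not taken from this thread:

```python
# Sketch: adaptive SLURMCluster with one worker process per Slurm job.
# Resource values below are placeholders; adjust for your cluster.
from dask_jobqueue import SLURMCluster
from distributed import Client

cluster = SLURMCluster(
    cores=1,            # threads per job (placeholder)
    processes=1,        # one worker process per job: plays better with autoscaling
    memory="4GB",       # placeholder
    walltime="02:00:00",
)
cluster.adapt(minimum=1, maximum=500)  # scheduler submits/cancels Slurm jobs as needed
client = Client(cluster)
```

With `processes=1`, each Slurm job maps to exactly one Dask worker, so the adaptive scheduler's decisions about adding or removing workers translate cleanly into submitting or cancelling jobs.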
That seemed to work. I did notice issues with autoscaling and dask-jobqueue, and also set it to one worker process per job a few days ago.
I will say it does still take a little time for it to ramp up and submit more slurm jobs. But it at least works.
Thanks!