Tasks forgotten waiting for new workers to be allocated

Second, I'm not sure why I only see nine results and not ten in your output log, and also why the "result = " print appears only once per value.

Not sure either.

Finally, this use case is really on the edge, couldn’t you overcome this somehow? Launching one job per task is not a good design for a Dask cluster on top of a job queuing system. You should always have some room, and several tasks per Worker/job.

I ended up using submitit based on your advice here: Ensuring Each Dask Task Starts on a New SLURM Job with a Limit of 5 Concurrent Jobs - #2 by guillaumeeb

It fits our use case better because we need to precisely control the number of tasks per job. Each task takes a “long” time to run and we cannot afford to have tasks run into the SLURM walltime. A rough sketch of that setup is below.
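For reference, this is roughly what the submitit side looks like: a minimal sketch, assuming a hypothetical `run_task` function and partition name, where the key piece is `slurm_array_parallelism`, which caps how many array jobs run at once (5 in our case). Since each input gets its own SLURM job, the walltime only has to cover a single task.

```python
# Minimal sketch (assumptions noted inline): one SLURM job per task via submitit,
# with at most 5 jobs of the array running concurrently.
import submitit

def run_task(x):
    # placeholder for our actual long-running task
    return x * x

executor = submitit.AutoExecutor(folder="submitit_logs")  # logs, stdout/stderr, pickled results
executor.update_parameters(
    timeout_min=4 * 60,          # walltime only needs to cover one task
    slurm_partition="normal",    # assumption: adjust to your cluster
    cpus_per_task=4,
    mem_gb=16,
    slurm_array_parallelism=5,   # cap: at most 5 array jobs run at the same time
)

inputs = list(range(10))
jobs = executor.map_array(run_task, inputs)   # one SLURM array element per input
results = [job.result() for job in jobs]      # blocks until every job finishes
print(results)
```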

I didn’t try to reproduce it yet; there might be a problem, but I’m not entirely sure.

The behavior I described in the OP looks like a bug to me, but I can understand not wanting to devote resources to fixing an issue that only occurs in use cases dask.distributed/dask-jobqueue may not really be designed for.

Perhaps the documentation could state more clearly (unless I missed it?) that Dask is designed to scale to large numbers of small tasks, rather than a small or medium number of large tasks.