Tasks forgotten waiting for new workers to be allocated

Second, I'm not sure why I only see nine results and not ten in your output log, and also why the "result = " print appears only once per value.

Not sure either.

Finally, this use case is really on the edge, couldn’t you overcome this somehow? Launching one job per task is not a good design for a Dask cluster on top of a job queuing system. You should always have some room, and several tasks per Worker/job.

I ended up using submitit based on your advice here: Ensuring Each Dask Task Starts on a New SLURM Job with a Limit of 5 Concurrent Jobs - #2 by guillaumeeb

It fits our use case better because we need to precisely control the number of tasks per job. Each task takes a “long” time to run and we cannot afford to have tasks run into the SLURM walltime. A rough sketch of that setup is below.
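For reference, this is roughly what the submitit side looks like: a minimal sketch, assuming a hypothetical `run_task` function and partition name, where the key piece is `slurm_array_parallelism`, which caps how many array jobs run at once (5 in our case). Since each input gets its own SLURM job, the walltime only has to cover a single task.

```python
# Minimal sketch (assumptions noted inline): one SLURM job per task via submitit,
# with at most 5 jobs of the array running concurrently.
import submitit

def run_task(x):
    # placeholder for our actual long-running task
    return x * x

executor = submitit.AutoExecutor(folder="submitit_logs")  # logs, stdout/stderr, pickled results
executor.update_parameters(
    timeout_min=4 * 60,          # walltime only needs to cover one task
    slurm_partition="normal",    # assumption: adjust to your cluster
    cpus_per_task=4,
    mem_gb=16,
    slurm_array_parallelism=5,   # cap: at most 5 array jobs run at the same time
)

inputs = list(range(10))
jobs = executor.map_array(run_task, inputs)   # one SLURM array element per input
results = [job.result() for job in jobs]      # blocks until every job finishes
print(results)
```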

I didn’t try to reproduce it yet; there might be a problem, but I’m not entirely sure.

The behavior I described in the OP looks like a bug to me, but I can understand not wanting to devote resources to fixing an issue that only occurs in use cases dask.distributed/dask-jobqueue may not really be designed for.

Perhaps the documentation could state more clearly (unless I missed it?) that Dask is designed to scale to large numbers of small tasks, rather than a small or medium number of large tasks.