New workers are idle while tasks are pending

In our workflow, an HTTP server receives a request containing a payload for processing, starts a Dask cluster, splits the payload into batches, and then submits each batch to the cluster using fire_and_forget.
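Roughly, the submission path looks like this (the scheduler address, batch size and `process_batch` are placeholders, not our real names):

```python
# Rough sketch of the submission path; names and values are placeholders.
from dask.distributed import Client, fire_and_forget

client = Client("tcp://scheduler:8786")  # placeholder scheduler address

def process_batch(batch):
    ...  # actual processing logic lives here

def handle_request(payload, batch_size=100):
    # split the payload into batches and submit each one independently
    batches = [payload[i:i + batch_size] for i in range(0, len(payload), batch_size)]
    for batch in batches:
        future = client.submit(process_batch, batch)
        fire_and_forget(future)  # let the task run without holding on to the future
```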

The workers run on EC2 (via Kubernetes), so naturally they don’t start up instantly: an instance needs to be provisioned, the Docker image has to be downloaded, and so on.

I just tried a small run with 7 batches and spun up a 7-worker cluster to run them. Intriguingly, even though all 7 machines were up and ready, only 3 of them were actually processing: 3 of the tasks completed in parallel, then the next 3. The logs of the other 4 workers showed nothing happening at all.

I figure the scheduler must somehow be assigning all of these tasks to whichever machine happens to be available at the time they are submitted - but this doesn’t strike me as the right behaviour at all. Surely it’s only when the task is about to start that a worker node should be selected. So if only 3 nodes are up, the task has to wait to begin, but if an idle node is available, execution should start immediately on that node.

Is there any way to get this behaviour?

Hi @NakulK48,

Your guess is right: the scheduler assigns tasks to workers at the time they are submitted, and this is especially noticeable with a small number of tasks.

But to achieve what you want, Dask uses work stealing: an idle worker should report itself to the Scheduler and steal a task from a busy worker. So what you need to understand is why this isn’t happening in your case. As a wild guess, I would say that Dask is tuned for short tasks by default, so the Scheduler may consider it a waste of time for a worker to steal only one task from another. Maybe this could be changed by tuning the work-stealing feature somehow.
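For example, you could check that work stealing is enabled and how often the Scheduler rebalances tasks. This is only a hedged sketch using keys from distributed’s default configuration; the values shown are illustrative, not a recommendation:

```python
# Hedged sketch: inspect/adjust the scheduler's work-stealing settings.
# These keys come from distributed's default configuration; the values
# below are only illustrative.
import dask

dask.config.set({
    "distributed.scheduler.work-stealing": True,              # stealing is on by default
    "distributed.scheduler.work-stealing-interval": "100ms",  # how often the scheduler rebalances
})
```

Note that, as far as I know, these need to be in place on the Scheduler’s side before it starts (e.g. via distributed.yaml or environment variables), otherwise they won’t take effect.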