In our workflow, an HTTP server receives a request containing a payload for processing, starts a Dask cluster, splits the payload into batches, and then submits each batch to the cluster via fire_and_forget.
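For context, the submission side looks roughly like this (a simplified sketch: the function names, the batch-splitting logic and `scheduler_address` are stand-ins, not our actual code):

```python
from dask.distributed import Client, fire_and_forget

def process_batch(batch):
    # Stand-in for the real per-batch processing logic.
    ...

def handle_request(payload_items, scheduler_address, batch_size=100):
    # Connect to the freshly started cluster, split the payload into
    # batches, and submit each batch without waiting on the results.
    client = Client(scheduler_address)
    batches = [payload_items[i:i + batch_size]
               for i in range(0, len(payload_items), batch_size)]
    for batch in batches:
        fire_and_forget(client.submit(process_batch, batch))
```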
The workers run on EC2 (via Kubernetes), so naturally they don’t start up instantly - an instance needs to be provisioned, it needs to download the Docker image, etc.
I just tried a small run with 7 batches and spun up a 7-worker cluster to run them. Intriguingly, even though all 7 machines were up and ready, only 3 of them were actually processing - 3 of the tasks completed in parallel and then the next 3. The logs of the other 4 workers showed that nothing was happening at all.
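Beyond reading the worker logs, the same imbalance shows up from the client side with something like this (a rough sketch; the scheduler address is a placeholder for however the client connects):

```python
from dask.distributed import Client

scheduler_address = "tcp://scheduler:8786"  # placeholder
client = Client(scheduler_address)

# Tasks currently queued or running on each worker; with the behaviour
# described above, only the 3 busy workers show any keys here.
print(client.processing())

# The workers the scheduler knows about, to confirm all 7 are connected.
print(list(client.scheduler_info()["workers"]))
```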
I figure the scheduler must be assigning all of these tasks to whichever workers happen to be available at the time they are submitted - but that doesn’t strike me as the right behaviour at all. Surely a worker node should only be selected when the task is about to start: if only 3 nodes are up, the task has to wait to begin, but if an idle node is available, execution should start on it immediately.
Is there any way to get this behaviour?