Distributed tasks not starting at the same time

Hi, I’m using Dask to run multiple data processing pipelines in parallel to overcome the GIL.

I use dask.delayed to wrap each pipeline and a for loop to build the list of tasks to run.

I have 16 data points that I want to process, and I have created 16 workers, so theoretically, all tasks should run in parallel.

As shown in the attached image, the Task Processing section of the dashboard shows all 16 workers processing tasks.

But when everything finished and appeared in the Task Stream section, I observed that not all tasks started at the same time.
So, although each task's processing time is roughly equal, the total runtime is almost (sometimes more than) double the maximum time of a single task.
Does anyone have an idea why this happens, and if possible, how to fix it?

Hi @amalibnu,

Dask Distributed starts a process for each worker, with several services per process (such as the worker dashboard). It's not unusual to see a few seconds' delay between the creation of the Dask cluster and the moment when all workers are up and ready to process a task. If your tasks only last a few seconds, this overhead can show up exactly as you describe. I'm not sure there is a good way to avoid it. If your workflow is really simple and you don't need to go distributed, maybe the multiprocessing package is enough?