Hi, I’m using Dask to run multiple data-processing pipelines in parallel, to work around the GIL.
I wrap each pipeline in dask.delayed and build the list of delayed tasks in a for loop before computing them.
I have 16 data points to process and I created 16 workers, so in theory all tasks should run in parallel.
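For context, here is a minimal sketch of the pattern I mean. The `process` function is a hypothetical stand-in for one pipeline; with a distributed `Client(n_workers=16)` the same code would feed the dashboard described below, but the sketch uses the local multiprocessing scheduler so it is self-contained:

```python
from dask import delayed, compute

def process(point):
    # Hypothetical stand-in for one data-processing pipeline.
    return point * 2

# Build one delayed task per data point in a for loop.
tasks = [delayed(process)(p) for p in range(16)]

# Processes (rather than threads) sidestep the GIL for CPU-bound work.
results = compute(*tasks, scheduler="processes")
```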
As shown in the attached image, the Task Processing section of the dashboard shows all 16 workers processing tasks.
But once the run finished, the Task Stream section showed that not all tasks started at the same time.
So although each task’s processing time is roughly equal, the total runtime is almost double (sometimes more than double) the duration of the longest single task.
Does anyone have an idea why this happens, and if possible, how to fix it?
