In multi-threading, how does Dask avoid the GIL to improve threading performance?
Hi @habout632,
Dask does not avoid the GIL when doing multi-threading. If the code you're launching in your tasks does not release it, then you won't benefit from parallelization, and you should use multiprocessing mode instead.
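To make that concrete, here is a minimal sketch that runs the same pure-Python (GIL-holding) function on Dask's local threaded and multiprocessing schedulers. The names `cpu_bound`, `run`, and `time_schedulers` and the task sizes are made up for illustration; the actual timings will depend on your machine.

```python
import time

import dask


def cpu_bound(n):
    # Pure-Python loop: it holds the GIL for its whole duration,
    # so threads cannot run several of these in parallel.
    total = 0
    for i in range(n):
        total += i * i
    return total


def run(scheduler, sizes=(200_000,) * 4):
    # Build one delayed task per size and compute them on the
    # requested local scheduler ("threads" or "processes").
    tasks = [dask.delayed(cpu_bound)(n) for n in sizes]
    return dask.compute(*tasks, scheduler=scheduler)


def time_schedulers():
    # Print the wall-clock time of each scheduler for comparison.
    for scheduler in ("threads", "processes"):
        start = time.perf_counter()
        run(scheduler)
        print(scheduler, round(time.perf_counter() - start, 3), "s")
```

If you try `time_schedulers()`, call it under an `if __name__ == "__main__":` guard, since the "processes" scheduler starts new Python processes. For GIL-bound work like this you would expect "threads" to take roughly as long as running the tasks one after another.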
Hi @habout632,
If your code holds the GIL most of the time, you have two options:
- Start only 1 thread per worker (potentially with multiple workers per host), e.g.
  dask worker myscheduler:8786 --nthreads 1 --nworkers 2
  Be aware that your task will still contend for the GIL with the worker's network stack, which may cause timeouts, particularly if you have long-running tasks.
- Start multiple threads per worker, and attach a ProcessPoolExecutor to your workers: Worker — Dask.distributed 2023.5.0 documentation (see the executor parameter).
The latter method is particularly beneficial if only a few tasks hold the GIL, as you can choose which executor each task runs on through annotations: API Reference — Dask documentation
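A rough sketch of the annotation side of the second option follows. It assumes each worker was created with an extra executor registered under the name "processes"; that name, like `gil_bound`, is hypothetical and chosen only for this example. Only the annotated task is routed to the process pool, while the other task stays on the worker's default thread pool.

```python
import dask


def gil_bound(n):
    # Pure-Python loop that holds the GIL while it runs.
    total = 0
    for i in range(n):
        total += i * i
    return total


# Assumption: the workers were started with something like
#   Worker(scheduler_address, nthreads=4,
#          executor={"processes": ProcessPoolExecutor(2)})
# "processes" is just the (hypothetical) key used to look the executor up.
with dask.annotate(executor="processes"):
    heavy = dask.delayed(gil_bound)(10_000)

# Not annotated: runs on the worker's default executor.
light = dask.delayed(sum)([1, 2, 3])
```

You would then submit `heavy` and `light` through a distributed `Client` connected to that cluster. As far as I know the local schedulers simply ignore the `executor` annotation, so the same graph still computes without a cluster, just without the process pool.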