I am new to Dask and I would like to understand something before committing to it.
Context: I am the developer of a Python-based software called GridCal that deals with power systems simulation and optimization. Some of those topics could benefit greatly from parallelism. However, in pure Python, whenever you create a new worker process you are essentially cloning the Python interpreter.
Example: Say that I have 10 GB of power system data in memory that I want to simulate in chunks, and I know that each simulation "chunk" takes 1 GB.
In C++, thanks to shared-memory parallelism, if I run 5 concurrent chunks the total memory usage is 10 GB + 5 x 1 GB = 15 GB. This is very nice for same-machine parallelism.
In Python, because the GIL pushes you towards process-based parallelism and each process gets its own copy of the data, the total memory usage for the same example would be 5 x (10 + 1) GB = 55 GB.
Is parallelism with Dask subject to the same limitation as plain Python?
That would be OK if I ran the tasks on separate machines.
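To make the question concrete, here is a minimal sketch showing that Python threads, unlike processes, do share memory within one interpreter: every thread below reads the same input object, and nothing is copied per worker. Whether the threads actually run concurrently is a separate question governed by the GIL; the list and chunk sizes are just stand-ins for the real data.

```python
import threading

# Stand-in for the large shared input (the "10 GB" of grid data).
shared_input = list(range(1_000_000))

results = {}

def simulate_chunk(chunk_id, lo, hi):
    # All threads read the SAME shared_input object: no per-thread copy.
    results[chunk_id] = sum(shared_input[lo:hi])

threads = [
    threading.Thread(target=simulate_chunk, args=(i, i * 200_000, (i + 1) * 200_000))
    for i in range(5)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

total = sum(results.values())  # equals sum(shared_input)
```

The memory cost here is the shared input once, plus whatever each chunk allocates, which matches the C++ arithmetic above rather than the multiprocessing one.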
Hi @guillaumeeb, I assume that the confusion comes from the following:
A lot of common uses of parallelization have to do with dense matrix operations, neural networks and the like. For those cases, the parallel function is usually very simple, think matrix multiplication. These operations align well with the CPU cache and benefit from on-chip operations. Usually you can get by without using auxiliary structures for the computation.
In my case, the parallel function is an optimization process, and as such it is not practical to allocate all the memory it is going to use in advance, so the allocation happens dynamically. This is how the OS runs MS Office and Chrome at the same time: two different complex processes running in parallel.
How I do things in C++ is to store the input data in memory, split it into batches, and run the batches in different threads, from which I collect the results aligned in the same order as the inputs. When I tried this in Python years ago using multiprocessing, each worker allocated not only the simulation memory but a copy of the whole input memory as well. That is unacceptable because most PCs cannot run the simulations under those memory requirements, and that is why I went the C++ route: in C++ you can share the input memory and in Python you cannot.
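The C++ pattern described above (shared input, split into batches, results collected in input order) can be sketched directly with Python threads via the standard library; `run_batch` here is a placeholder for the real per-batch optimization, and the batch size is arbitrary:

```python
from concurrent.futures import ThreadPoolExecutor

# Shared input: every worker thread sees this same object, no per-worker copy.
input_data = list(range(100))

def run_batch(batch):
    # Placeholder for the real optimization of one batch.
    return [x * x for x in batch]

# Split the input into batches of 20 elements.
batches = [input_data[i:i + 20] for i in range(0, len(input_data), 20)]

with ThreadPoolExecutor(max_workers=5) as ex:
    # executor.map yields results in the same order as the inputs,
    # regardless of which batch happens to finish first.
    results = list(ex.map(run_batch, batches))

flat = [y for batch in results for y in batch]  # aligned with input_data
```

Note that `ThreadPoolExecutor.map` takes care of the "aligning them in the same order as the inputs" step for free; the caveat, again, is that pure-Python work inside `run_batch` will serialize on the GIL unless it spends its time in GIL-releasing library code.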
I hope this clarifies my question. My hope is that Dask can share the input memory among the threads without copying it completely.
All scientific libraries in Python (numpy, scipy, pandas, etc.) work very hard to release the GIL as much as they can, so that you can have multiple threads per process. You will still have some GIL contention which makes it unwise to have processes with e.g. 32 threads. Dask offers tools to monitor how much GIL contention you have.
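To illustrate what releasing the GIL buys you, here is a toy sketch that uses `time.sleep` as a stand-in for a GIL-releasing C kernel (a numpy/scipy operation, say): because the GIL is released while each thread waits, eight 0.2-second "computations" overlap instead of running back to back.

```python
import threading
import time

def gil_releasing_work():
    # time.sleep releases the GIL while waiting, standing in for a
    # numpy/scipy kernel that releases the GIL during its C-level loop.
    time.sleep(0.2)

start = time.perf_counter()
threads = [threading.Thread(target=gil_releasing_work) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.perf_counter() - start
# 8 x 0.2 s of "work" overlaps: wall time stays near 0.2 s, not 1.6 s
```

If the body were a pure-Python loop instead, the threads would serialize on the GIL and the wall time would approach the sum of the individual times, which is exactly the contention that Dask's monitoring tools help you spot.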