I am using Dask on an LSF cluster to process some images in parallel. The processing function itself uses joblib to perform multiple computations on each image in parallel.
It seems that setting the n_workers and cores parameters to some numbers will generally produce n_workers * cores futures running at the same time. I would like to have n_workers futures being processed at a time, each of them having cores cores at its disposal for use with joblib.
How do I achieve this?
Hi @damiankucharski,
In order to have only n_workers processes running at the same time, I see two solutions:
However, I’m not sure using joblib from inside Dask workers will work well. I would recommend using only Dask, or using joblib on top of Dask, as is done with Scikit-learn. But I understand that you want each image to be processed on a single worker, which may not be that simple with what I suggest.
Hello @guillaumeeb, thank you for your answer. Could you please provide an example for the first point?
Also, I am not sure what you mean by “using joblib on top of Dask as this is done by Scikit-learn”. Doesn't sklearn simply use joblib in the most straightforward way to run computations on multiple cores?
OK, so I’m used to PBSCluster; for LSFCluster, you want to use ncpus instead of resources_spec. You should do something like:
cluster = LSFCluster(cores=1, memory='32GiB', ncpus=8)
This would give you a worker with one process and one thread, but in a job that has booked 8 CPUs.
See dask_jobqueue.LSFCluster — Dask-jobqueue 0.7.4+11.g96e39da.dirty documentation for more options.
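The cores/ncpus split described above could be wired up roughly as follows. This is only a sketch: `lsf_cluster_kwargs`, `process_image`, and `run` are hypothetical names made up for illustration, the work inside `process_image` is a placeholder, and whether joblib behaves well inside a Dask worker on a given cluster is something to verify rather than assume.

```python
def lsf_cluster_kwargs(cores_per_image, memory="32GiB"):
    """Build LSFCluster keyword arguments (hypothetical helper).

    Dask sees one core per worker, so each worker runs one task at a time,
    while LSF is asked to book `cores_per_image` CPUs for the job.
    """
    return {
        "cores": 1,                # threads Dask may use per job
        "processes": 1,            # one worker process per job
        "memory": memory,
        "ncpus": cores_per_image,  # CPUs actually booked from LSF
    }


def process_image(image_id, n_jobs):
    """Placeholder for the real image processing (hypothetical)."""
    from joblib import Parallel, delayed

    # Stand-in computation: a few independent calls run through joblib.
    return Parallel(n_jobs=n_jobs)(delayed(pow)(image_id, k) for k in range(4))


def run(image_ids, n_workers=4, cores_per_image=8):
    """Start the cluster and process the images (needs a real LSF setup)."""
    from dask.distributed import Client
    from dask_jobqueue import LSFCluster

    cluster = LSFCluster(**lsf_cluster_kwargs(cores_per_image))
    cluster.scale(n_workers)  # one single-threaded worker per job
    client = Client(cluster)
    # Extra keyword arguments to client.map are forwarded to process_image.
    futures = client.map(process_image, image_ids, n_jobs=cores_per_image)
    return client.gather(futures)
```

With this layout, at most n_workers images should be in flight at once, and joblib inside each task can use the CPUs that the job reserved but that Dask itself does not schedule on.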
For the second point, Dask can be used as a backend of joblib, so joblib sends tasks to a Dask cluster instead of just doing multiprocessing. See a simple example here: Using dask distributed for single-machine parallel computing — joblib 1.2.0.dev0 documentation.
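As a rough illustration of joblib sending its tasks to Dask, here is a minimal sketch. The `square` and `run_on_dask` functions are made up for the example, and the local `Client(processes=False)` is only an assumption for trying it on one machine; on a cluster the client would be connected to the job-queue cluster instead.

```python
def square(x):
    """Toy task standing in for one unit of real work."""
    return x * x


def run_on_dask(values):
    """Run joblib tasks on a Dask cluster via the joblib 'dask' backend."""
    import joblib
    from dask.distributed import Client

    # Local, thread-based cluster used here purely for illustration.
    client = Client(processes=False)
    try:
        # Inside this block, joblib submits its tasks to the Dask scheduler
        # instead of starting its own worker processes.
        with joblib.parallel_backend("dask"):
            return joblib.Parallel(n_jobs=-1)(
                joblib.delayed(square)(v) for v in values
            )
    finally:
        client.close()
```

Calling `run_on_dask(range(10))` should return the squares in input order, since `joblib.Parallel` preserves the order of its tasks.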
My sentence about Scikit-learn was a bit wrong: Scikit-learn uses joblib, and there are lots of examples of how to use the Dask joblib backend with Scikit-learn.
Oh, I did not realize that you can use both the cores and ncpus arguments. I thought that these were basically the same parameter. Thank you very much for all your help.