Run a single task per worker with dask-mpi

Hi all,

I would like to enforce that each worker processes only a single task at a time. In my case, each worker is a separate node of an HPC cluster managed by SLURM. The reason is that I want to perform a memory-intensive computation, and running two of those tasks in parallel on a single node might exceed its memory limit.

I’m currently using dask-mpi since it allows me to request multiple nodes at once, but I haven’t yet managed to ensure that only one task is executed per node. My SLURM batch script contains these entries (among others):

#SBATCH --nodes=5
#SBATCH --ntasks-per-core=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=36

The way that I initialize my workers with dask-mpi is as follows:

from dask.distributed import Client
from dask_mpi import initialize

initialize(nthreads=36)
client = Client()

This setup works in principle but still processes 2 tasks per worker when there are 10 concurrent tasks in total. I tried to implement the recommendations from here as follows, but without success:

import dask

with dask.config.set({"distributed.worker.resources.TEST": 1}):
    initialize(nthreads=36)
    client = Client()

...

dask.compute(results, optimize_graph=False, resources={"TEST": 1})

Does anyone know how to make this work with dask-mpi? Any help is greatly appreciated!

Hi @Alejandro and welcome here!

I think the simple answer to your question is:

from dask.distributed import Client
from dask_mpi import initialize

initialize(nthreads=1)  # should also be equivalent to initialize() with no args
client = Client()

Setting nthreads=36 means each of your workers will run tasks on all 36 of those threads, which is why you see several tasks executing concurrently on one node.

I don’t think you want to mess with resources yet, unless you know you’ll eventually need finer control over the number of tasks per node. I don’t recommend it right now.
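If you do reach that point, here is a minimal sketch of the resources mechanism. It uses a LocalCluster instead of dask-mpi so it can run anywhere, and the resource name "SLOT" is an arbitrary label I chose, not a Dask builtin:

```python
from dask.distributed import Client, LocalCluster

# Each worker advertises one unit of the (arbitrarily named) "SLOT" resource.
# processes=False keeps the workers in-process so the example is portable.
cluster = LocalCluster(n_workers=2, threads_per_worker=4,
                       processes=False, resources={"SLOT": 1})
client = Client(cluster)

# Every task claims the whole SLOT, so each worker runs at most one of these
# tasks at a time, even though it has 4 threads available.
futures = client.map(lambda x: x * 2, range(10), resources={"SLOT": 1})
results = client.gather(futures)
print(results)

client.close()
cluster.close()
```

The same resources= annotation works with dask.compute, as in your snippet; the key point is that the resource must be declared on the workers before they start.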

Also, not sure you noticed, but without more arguments/code than what you gave, MPI rank 0 is dedicated to the Scheduler, MPI rank 1 runs your Python client process, and you only get 3 workers, on ranks 2, 3, and 4!

Best,
Guillaume.
