Submit an MPI job to a single Dask worker spanning multiple nodes

Hi, I have a quick question: is it possible to span a single Dask worker across multiple nodes and embed MPI for communication between those nodes? I need to handle a problem with many independent tasks, and each task needs the memory of more than two machines, with data communication in between. I am hoping to still use MPI for the data communication, and to use Dask workers to handle the independent tasks (SGECluster with adaptive(); I really like the adaptive/fault-tolerance feature, compared to the dask-mpi approach where slots are fixed). But I don’t know how to submit an MPI job to a single Dask worker through submit(). Is it doable, and do you have an example? An alternative solution would also be appreciated. Thanks in advance.
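
To make it concrete, here is a rough sketch of what I have in mind. The mpirun invocation, the core count, and ./my_solver are just placeholders, and the body of run_mpi_task is exactly the part I don’t know how to make span multiple nodes:

```python
from dask_jobqueue import SGECluster
from dask.distributed import Client
import subprocess

# Adaptive SGE-backed cluster; this is the fault-tolerance behaviour
# I want to keep (queue/resource options omitted for brevity).
cluster = SGECluster(cores=1, memory="4GB")
cluster.adapt(minimum=0, maximum=10)
client = Client(cluster)

def run_mpi_task(task_id):
    # Placeholder: each independent task should launch an MPI job that
    # spans several nodes, but the worker only owns one slot on one node.
    result = subprocess.run(
        ["mpirun", "-np", "16", "./my_solver", str(task_id)],
        capture_output=True, check=True,
    )
    return result.stdout

futures = client.map(run_mpi_task, range(100))
results = client.gather(futures)
```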

Hi @llodds,

I’m not sure I understand your question. A Dask Worker is a single process running on a single node: it cannot use memory from several nodes. Unless you want to submit MPI applications from a Dask Worker?

Couldn’t you just use job arrays if your tasks are independent?

@guillaumeeb

Yes, I want to submit an MPI application from a single Dask worker. The MPI application needs to span multiple nodes due to its memory requirements. Results from those tasks are then integrated into a more complicated scientific workflow (iterations of gradient updates, where each gradient update combines the results of multiple independent tasks), which I hope to build in Python rather than in bash with job arrays.
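
Roughly this kind of loop, continuing the sketch from my first post; combine_gradients, apply_update, and the iteration counts are hypothetical placeholders:

```python
# Rough shape of the gradient workflow I want in Python instead of
# bash + job arrays. Reuses client and run_mpi_task from the sketch above.
def combine_gradients(partials):
    ...  # placeholder: reduce per-task results into a single gradient

def apply_update(model, gradient):
    ...  # placeholder: one gradient-descent step

model = None  # initial model state
for iteration in range(50):
    futures = client.map(run_mpi_task, range(100))  # independent MPI tasks
    gradient = combine_gradients(client.gather(futures))
    model = apply_update(model, gradient)
```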

From what I understand, this sounds a bit complicated. You want to launch a Dask cluster using dask-jobqueue, submitting jobs to start workers, and then submit tasks to this cluster which will in turn start MPI jobs through SGE?

This might be doable, but I’m not sure what the Dask cluster buys you here, and it will involve a lot of process spawning to start the jobs.

Couldn’t you just use a normal Python interpreter without Dask to submit the MPI jobs and retrieve the results?
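
Something like this minimal sketch, assuming an SGE parallel environment named mpi and a run_task.sh job script (both site-specific placeholders):

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor

def submit_mpi_job(task_id):
    # "qsub -sync y" blocks until the SGE job finishes, so each thread
    # simply waits for its multi-node MPI job to complete.
    subprocess.run(
        ["qsub", "-sync", "y", "-pe", "mpi", "16", "run_task.sh", str(task_id)],
        check=True,
    )
    return task_id  # placeholder: load whatever result file the job wrote

# Keep up to 10 MPI jobs in flight at once, no Dask involved.
with ThreadPoolExecutor(max_workers=10) as pool:
    results = list(pool.map(submit_mpi_job, range(100)))
```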

Maybe with a code sample of your actual workflow this would be clearer.