I am new to dask and I am trying to use dask-mpi from a script. I may have forgotten a step, but as a test I am running a simple script that opens some CSV files and then does a dummy operation on them. The problem is that when I call .compute(), the operation never completes: the script freezes and does not finish. This is the script I am trying to run:

    from mpi4py import MPI

    comm = MPI.COMM_WORLD

    from dask_mpi import initialize

    initialize(comm=comm)

    from distributed.scheduler import logger
    from distributed import Client
    from dask.dataframe import read_csv


    def main():
        with Client() as cl:
            logger.info(cl)
            df = read_csv(".user/data/CMaps/train_*.txt", sep=" ", header=None)
            df = df + 1
            logger.info(df.compute().head())
            # cl.close()


    if __name__ == "__main__":
        main()

I am running the script with:

    mpirun -np 2 python main.py

My environment is:
macOS Ventura 13.0
I also tried running it with MPICH's mpirun, with no better result.
I also get this output when running:

    2022-11-11 15:55:10,164 - distributed.scheduler - INFO - State start
    2022-11-11 15:55:10,166 - distributed.scheduler - INFO - Scheduler at: tcp://192.168.0.61:56661
    2022-11-11 15:55:10,166 - distributed.scheduler - INFO - dashboard at: :8787
    2022-11-11 15:55:10,506 - distributed.scheduler - INFO - Receive client connection: Client-d6e71920-61d0-11ed-aaf5-9e3aaa5630a4
    2022-11-11 15:55:10,511 - distributed.core - INFO - Starting established connection
    2022-11-11 15:55:10,513 - distributed.scheduler - INFO - <Client: 'tcp://192.168.0.61:56661' processes=0 threads=0, memory=0 B>

I hope I did not miss a setup step; I installed everything through pip. My guess is that the problem is related to the client reporting no workers (processes=0 threads=0 in the last log line). Maybe dask-mpi is not meant to be used from scripts? I could not find any documentation that helps with this setup, so any help would be appreciated.
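
One way to test the guess about missing workers is to ask the scheduler how many workers it knows about before calling .compute(), and fail fast instead of hanging. The following is only a minimal sketch: `count_workers` and `require_workers` are hypothetical helper names, not part of dask or dask-mpi, and the code assumes that `Client.scheduler_info()` returns a dict with a `"workers"` mapping whose entries carry an `"nthreads"` field.

```python
def count_workers(client):
    """Return (number of workers, total threads) as reported by the scheduler.

    Assumes client.scheduler_info() returns a dict with a "workers" mapping.
    Some newer versions of distributed may only list a limited number of
    workers here by default, which is still enough to detect "zero workers".
    """
    workers = client.scheduler_info().get("workers", {})
    n_threads = sum(w.get("nthreads", 0) for w in workers.values())
    return len(workers), n_threads


def require_workers(client, minimum=1):
    """Raise instead of hanging when fewer than `minimum` workers are connected."""
    n_workers, n_threads = count_workers(client)
    if n_workers < minimum:
        raise RuntimeError(
            f"only {n_workers} worker(s) connected, expected at least {minimum}; "
            "compute() would wait forever for a worker"
        )
    return n_workers, n_threads
```

In the script above, this could be called as `require_workers(cl)` inside the `with Client() as cl:` block, just before `df.compute()`. If it raises, the freeze comes from the cluster having no workers rather than from the CSV reading or the arithmetic.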