Duplicated computations using Dask-MPI on multiple nodes of a SLURM HPC

I’m pretty new to Dask and trying to get a tiled geophysical inversion code running across several nodes of our HPC, which uses a SLURM scheduler.

I have experimented with different setup configurations and can get the code to run, but the memory load and computation time for each worker are significantly larger than anticipated. To troubleshoot these issues, I created a toy problem to get a better idea of how SLURM and Dask-MPI were behaving and interacting.

Here is a copy of my simple Python script:

import os
import platform

import numpy as np
import time

import dask_mpi
import dask.distributed as dd

def run(tileID):
    tileName = 'tile' + str(tileID)
    dd.print('Node:', platform.node(), 'Begin sleeping: ' + tileName + '\n')
    time.sleep(20)
    dd.print('Finished sleeping: ' + tileName)

    # seed random number generator
    np.random.seed(1)
    # generate random numbers between 0-1
    a = np.random.random((10000000, 3))
    b = np.random.random((10000000, 3))
    c = np.multiply(a, b)
    dd.print(tileName + ' shape:', c.shape)


if __name__ == '__main__':

    # Initialize dask-mpi to set up remote workers
    dask_mpi.initialize(local_directory=os.getcwd(), nthreads=1, memory_limit="100 GB", interface="ib0")
    print("Initializing Dask-MPI...")

    # Setup Client
    client = dd.Client()
    print('Client information')
    print(client)

    print('Cluster information')
    print(client.cluster)

    nTiles = 8

    time0 = time.time()
    # Run example problem
    futures = []
    for tile in range(nTiles):
        print('Tile:', tile)
        future = client.submit(run, tile)
        futures.append(future)

    client.gather(futures)

    time1 = time.time()
    print('Total job time: ' + str(time1 - time0))

    client.shutdown()

I then submit the Python script by calling

sbatch Test_DaskMPI_3Node_tpn4_t0.batch

Here is a copy of the batch file

#!/bin/bash

#SBATCH --job-name=Test_DaskMPI_3Node_tpn4_t0
#SBATCH --hint=nomultithread
#SBATCH --mem=100GB
#SBATCH --ntasks-per-node=4
#SBATCH --nodes=3
#SBATCH --exclusive
#SBATCH --reservation=dev
#SBATCH --account=vhp
#SBATCH --time=00:02:00
#SBATCH --output=Test_DaskMPI_3Node_tpn4_t0.log

# module swap PrgEnv-cray PrgEnv-gnu
export OMPI_MCA_btl=^openib

source /home/mamitchell/bin/miniconda3/bin/activate simpeg_main

mpirun -np 12  python -u ./Test_DaskMPI_3Node_tpn4_t0.py | tee ./Test_DaskMPI_3Node_tpn4_t0_output.txt

The hope is that the run function will be evaluated once for each tile from 0-7, and that these computations will be split over the 3 nodes, with each node running up to 4 tasks.

Here is the resulting output from the log file:

2024-02-06 19:08:18,207 - distributed.scheduler - INFO - State start
2024-02-06 19:08:18,212 - distributed.scheduler - INFO -   Scheduler at:   tcp://172.25.1.51:40095
2024-02-06 19:08:18,212 - distributed.scheduler - INFO -   dashboard at:  http://172.25.1.51:8787/status
2024-02-06 19:08:18,213 - distributed.scheduler - INFO - Registering Worker plugin shuffle
Initializing Dask-MPI...
2024-02-06 19:08:18,295 - distributed.worker - INFO -       Start worker at:   tcp://172.25.1.126:44111
2024-02-06 19:08:18,295 - distributed.worker - INFO -          Listening to:   tcp://172.25.1.126:44111
2024-02-06 19:08:18,295 - distributed.worker - INFO -           Worker name:                          6
2024-02-06 19:08:18,295 - distributed.worker - INFO -          dashboard at:         172.25.1.126:43577
2024-02-06 19:08:18,296 - distributed.worker - INFO - Waiting to connect to:    tcp://172.25.1.51:40095
2024-02-06 19:08:18,296 - distributed.worker - INFO - -------------------------------------------------
2024-02-06 19:08:18,296 - distributed.worker - INFO -               Threads:                          1
2024-02-06 19:08:18,296 - distributed.worker - INFO -                Memory:                  93.13 GiB
2024-02-06 19:08:18,296 - distributed.worker - INFO -       Local Directory: /caldera/hovenweep/projects/usgs/hazards/vhp/mamitchell/CLVF/Magnetics/DaskTesting/Dask_MPI/dask-scratch-space/worker-gsg57edz
2024-02-06 19:08:18,296 - distributed.worker - INFO - -------------------------------------------------
2024-02-06 19:08:18,296 - distributed.worker - INFO -       Start worker at:    tcp://172.25.1.51:37273
2024-02-06 19:08:18,296 - distributed.worker - INFO -          Listening to:    tcp://172.25.1.51:37273
2024-02-06 19:08:18,296 - distributed.worker - INFO -           Worker name:                          2
2024-02-06 19:08:18,296 - distributed.worker - INFO -          dashboard at:          172.25.1.51:46475
2024-02-06 19:08:18,296 - distributed.worker - INFO - Waiting to connect to:    tcp://172.25.1.51:40095
2024-02-06 19:08:18,297 - distributed.worker - INFO - -------------------------------------------------
2024-02-06 19:08:18,296 - distributed.worker - INFO -       Start worker at:   tcp://172.25.1.126:44001
2024-02-06 19:08:18,296 - distributed.worker - INFO -          Listening to:   tcp://172.25.1.126:44001
2024-02-06 19:08:18,296 - distributed.worker - INFO -           Worker name:                          7
2024-02-06 19:08:18,297 - distributed.worker - INFO -          dashboard at:         172.25.1.126:41757
2024-02-06 19:08:18,297 - distributed.worker - INFO -               Threads:                          1
2024-02-06 19:08:18,297 - distributed.worker - INFO - Waiting to connect to:    tcp://172.25.1.51:40095
2024-02-06 19:08:18,297 - distributed.worker - INFO -                Memory:                  93.13 GiB
2024-02-06 19:08:18,297 - distributed.worker - INFO - -------------------------------------------------
2024-02-06 19:08:18,297 - distributed.worker - INFO -       Local Directory: /caldera/hovenweep/projects/usgs/hazards/vhp/mamitchell/CLVF/Magnetics/DaskTesting/Dask_MPI/dask-scratch-space/worker-9f_6whjt
2024-02-06 19:08:18,297 - distributed.worker - INFO -               Threads:                          1
2024-02-06 19:08:18,297 - distributed.worker - INFO - -------------------------------------------------
2024-02-06 19:08:18,297 - distributed.worker - INFO -                Memory:                  93.13 GiB
2024-02-06 19:08:18,297 - distributed.worker - INFO -       Local Directory: /caldera/hovenweep/projects/usgs/hazards/vhp/mamitchell/CLVF/Magnetics/DaskTesting/Dask_MPI/dask-scratch-space/worker-v8la7m69
2024-02-06 19:08:18,297 - distributed.worker - INFO - -------------------------------------------------
2024-02-06 19:08:18,298 - distributed.worker - INFO -       Start worker at:   tcp://172.25.1.142:37673
2024-02-06 19:08:18,298 - distributed.worker - INFO -          Listening to:   tcp://172.25.1.142:37673
2024-02-06 19:08:18,298 - distributed.worker - INFO -           Worker name:                          8
2024-02-06 19:08:18,298 - distributed.worker - INFO -          dashboard at:         172.25.1.142:41843
2024-02-06 19:08:18,298 - distributed.worker - INFO - Waiting to connect to:    tcp://172.25.1.51:40095
2024-02-06 19:08:18,299 - distributed.worker - INFO - -------------------------------------------------
2024-02-06 19:08:18,299 - distributed.worker - INFO -       Start worker at:    tcp://172.25.1.51:35759
2024-02-06 19:08:18,298 - distributed.worker - INFO -       Start worker at:   tcp://172.25.1.142:33697
2024-02-06 19:08:18,299 - distributed.worker - INFO -          Listening to:   tcp://172.25.1.142:33697
2024-02-06 19:08:18,299 - distributed.worker - INFO -          Listening to:    tcp://172.25.1.51:35759
2024-02-06 19:08:18,299 - distributed.worker - INFO -           Worker name:                          3
2024-02-06 19:08:18,299 - distributed.worker - INFO -               Threads:                          1
2024-02-06 19:08:18,299 - distributed.worker - INFO -           Worker name:                         11
2024-02-06 19:08:18,299 - distributed.worker - INFO -          dashboard at:          172.25.1.51:45673
2024-02-06 19:08:18,299 - distributed.worker - INFO -                Memory:                  93.13 GiB
2024-02-06 19:08:18,299 - distributed.worker - INFO -       Local Directory: /caldera/hovenweep/projects/usgs/hazards/vhp/mamitchell/CLVF/Magnetics/DaskTesting/Dask_MPI/dask-scratch-space/worker-wkgii4og
2024-02-06 19:08:18,299 - distributed.worker - INFO - Waiting to connect to:    tcp://172.25.1.51:40095
2024-02-06 19:08:18,299 - distributed.worker - INFO -          dashboard at:         172.25.1.142:42547
2024-02-06 19:08:18,299 - distributed.worker - INFO - -------------------------------------------------
2024-02-06 19:08:18,299 - distributed.worker - INFO - -------------------------------------------------
2024-02-06 19:08:18,299 - distributed.worker - INFO -               Threads:                          1
2024-02-06 19:08:18,299 - distributed.worker - INFO - Waiting to connect to:    tcp://172.25.1.51:40095
2024-02-06 19:08:18,299 - distributed.worker - INFO - -------------------------------------------------
2024-02-06 19:08:18,299 - distributed.worker - INFO -       Start worker at:   tcp://172.25.1.126:43337
2024-02-06 19:08:18,299 - distributed.worker - INFO -          Listening to:   tcp://172.25.1.126:43337
2024-02-06 19:08:18,299 - distributed.worker - INFO -               Threads:                          1
2024-02-06 19:08:18,299 - distributed.worker - INFO -           Worker name:                          5
2024-02-06 19:08:18,299 - distributed.worker - INFO -          dashboard at:         172.25.1.126:45521
2024-02-06 19:08:18,299 - distributed.worker - INFO -                Memory:                  93.13 GiB
2024-02-06 19:08:18,299 - distributed.worker - INFO -       Local Directory: /caldera/hovenweep/projects/usgs/hazards/vhp/mamitchell/CLVF/Magnetics/DaskTesting/Dask_MPI/dask-scratch-space/worker-vbf5s_j3
2024-02-06 19:08:18,299 - distributed.worker - INFO -                Memory:                  93.13 GiB
2024-02-06 19:08:18,299 - distributed.worker - INFO - Waiting to connect to:    tcp://172.25.1.51:40095
2024-02-06 19:08:18,299 - distributed.worker - INFO -       Start worker at:   tcp://172.25.1.142:38005
2024-02-06 19:08:18,299 - distributed.worker - INFO - -------------------------------------------------
2024-02-06 19:08:18,299 - distributed.worker - INFO - -------------------------------------------------
2024-02-06 19:08:18,299 - distributed.worker - INFO -       Local Directory: /caldera/hovenweep/projects/usgs/hazards/vhp/mamitchell/CLVF/Magnetics/DaskTesting/Dask_MPI/dask-scratch-space/worker-if64_68p
2024-02-06 19:08:18,299 - distributed.worker - INFO -               Threads:                          1
2024-02-06 19:08:18,299 - distributed.worker - INFO - -------------------------------------------------
2024-02-06 19:08:18,299 - distributed.worker - INFO -          Listening to:   tcp://172.25.1.142:38005
2024-02-06 19:08:18,299 - distributed.worker - INFO -                Memory:                  93.13 GiB
2024-02-06 19:08:18,299 - distributed.worker - INFO -           Worker name:                          9
2024-02-06 19:08:18,299 - distributed.worker - INFO -       Local Directory: /caldera/hovenweep/projects/usgs/hazards/vhp/mamitchell/CLVF/Magnetics/DaskTesting/Dask_MPI/dask-scratch-space/worker-e6y6fw8l
2024-02-06 19:08:18,299 - distributed.worker - INFO - -------------------------------------------------
2024-02-06 19:08:18,299 - distributed.worker - INFO -          dashboard at:         172.25.1.142:38933
2024-02-06 19:08:18,299 - distributed.worker - INFO - Waiting to connect to:    tcp://172.25.1.51:40095
2024-02-06 19:08:18,299 - distributed.worker - INFO - -------------------------------------------------
2024-02-06 19:08:18,299 - distributed.worker - INFO -               Threads:                          1
2024-02-06 19:08:18,299 - distributed.worker - INFO -                Memory:                  93.13 GiB
2024-02-06 19:08:18,299 - distributed.worker - INFO -       Start worker at:   tcp://172.25.1.142:44753
2024-02-06 19:08:18,299 - distributed.worker - INFO -       Local Directory: /caldera/hovenweep/projects/usgs/hazards/vhp/mamitchell/CLVF/Magnetics/DaskTesting/Dask_MPI/dask-scratch-space/worker-v_ea1ko_
2024-02-06 19:08:18,299 - distributed.worker - INFO -          Listening to:   tcp://172.25.1.142:44753
2024-02-06 19:08:18,299 - distributed.worker - INFO - -------------------------------------------------
2024-02-06 19:08:18,299 - distributed.worker - INFO -           Worker name:                         10
2024-02-06 19:08:18,299 - distributed.worker - INFO -          dashboard at:         172.25.1.142:36855
2024-02-06 19:08:18,300 - distributed.worker - INFO - Waiting to connect to:    tcp://172.25.1.51:40095
2024-02-06 19:08:18,300 - distributed.worker - INFO - -------------------------------------------------
2024-02-06 19:08:18,300 - distributed.worker - INFO -               Threads:                          1
2024-02-06 19:08:18,300 - distributed.worker - INFO -                Memory:                  93.13 GiB
2024-02-06 19:08:18,300 - distributed.worker - INFO -       Local Directory: /caldera/hovenweep/projects/usgs/hazards/vhp/mamitchell/CLVF/Magnetics/DaskTesting/Dask_MPI/dask-scratch-space/worker-xl1hkheh
2024-02-06 19:08:18,300 - distributed.worker - INFO - -------------------------------------------------
2024-02-06 19:08:18,300 - distributed.worker - INFO -       Start worker at:   tcp://172.25.1.126:44497
2024-02-06 19:08:18,300 - distributed.worker - INFO -          Listening to:   tcp://172.25.1.126:44497
2024-02-06 19:08:18,300 - distributed.worker - INFO -           Worker name:                          4
2024-02-06 19:08:18,300 - distributed.worker - INFO -          dashboard at:         172.25.1.126:38063
2024-02-06 19:08:18,300 - distributed.worker - INFO - Waiting to connect to:    tcp://172.25.1.51:40095
2024-02-06 19:08:18,300 - distributed.worker - INFO - -------------------------------------------------
2024-02-06 19:08:18,301 - distributed.worker - INFO -               Threads:                          1
2024-02-06 19:08:18,301 - distributed.worker - INFO -                Memory:                  93.13 GiB
2024-02-06 19:08:18,301 - distributed.worker - INFO -       Local Directory: /caldera/hovenweep/projects/usgs/hazards/vhp/mamitchell/CLVF/Magnetics/DaskTesting/Dask_MPI/dask-scratch-space/worker-35u6hsfz
2024-02-06 19:08:18,301 - distributed.worker - INFO - -------------------------------------------------
2024-02-06 19:08:18,877 - distributed.scheduler - INFO - Register worker <WorkerState 'tcp://172.25.1.126:44497', name: 4, status: init, memory: 0, processing: 0>
2024-02-06 19:08:19,461 - distributed.scheduler - INFO - Starting worker compute stream, tcp://172.25.1.126:44497
2024-02-06 19:08:19,461 - distributed.core - INFO - Starting established connection to tcp://172.25.1.126:48092
2024-02-06 19:08:19,462 - distributed.scheduler - INFO - Register worker <WorkerState 'tcp://172.25.1.126:44111', name: 6, status: init, memory: 0, processing: 0>
2024-02-06 19:08:19,462 - distributed.worker - INFO - Starting Worker plugin shuffle
2024-02-06 19:08:19,462 - distributed.scheduler - INFO - Starting worker compute stream, tcp://172.25.1.126:44111
2024-02-06 19:08:19,462 - distributed.core - INFO - Starting established connection to tcp://172.25.1.126:48066
2024-02-06 19:08:19,462 - distributed.worker - INFO -         Registered to:    tcp://172.25.1.51:40095
2024-02-06 19:08:19,463 - distributed.worker - INFO - -------------------------------------------------
2024-02-06 19:08:19,463 - distributed.core - INFO - Starting established connection to tcp://172.25.1.51:40095
2024-02-06 19:08:19,463 - distributed.scheduler - INFO - Register worker <WorkerState 'tcp://172.25.1.142:37673', name: 8, status: init, memory: 0, processing: 0>
2024-02-06 19:08:19,463 - distributed.scheduler - INFO - Starting worker compute stream, tcp://172.25.1.142:37673
2024-02-06 19:08:19,463 - distributed.core - INFO - Starting established connection to tcp://172.25.1.142:60056
2024-02-06 19:08:19,463 - distributed.worker - INFO - Starting Worker plugin shuffle
2024-02-06 19:08:19,464 - distributed.scheduler - INFO - Register worker <WorkerState 'tcp://172.25.1.126:43337', name: 5, status: init, memory: 0, processing: 0>
2024-02-06 19:08:19,464 - distributed.worker - INFO -         Registered to:    tcp://172.25.1.51:40095
2024-02-06 19:08:19,464 - distributed.worker - INFO - -------------------------------------------------
2024-02-06 19:08:19,464 - distributed.scheduler - INFO - Starting worker compute stream, tcp://172.25.1.126:43337
2024-02-06 19:08:19,464 - distributed.core - INFO - Starting established connection to tcp://172.25.1.126:48080
2024-02-06 19:08:19,464 - distributed.core - INFO - Starting established connection to tcp://172.25.1.51:40095
2024-02-06 19:08:19,464 - distributed.worker - INFO - Starting Worker plugin shuffle
2024-02-06 19:08:19,465 - distributed.scheduler - INFO - Register worker <WorkerState 'tcp://172.25.1.142:44753', name: 10, status: init, memory: 0, processing: 0>
2024-02-06 19:08:19,465 - distributed.worker - INFO -         Registered to:    tcp://172.25.1.51:40095
2024-02-06 19:08:19,465 - distributed.worker - INFO - -------------------------------------------------
2024-02-06 19:08:19,465 - distributed.worker - INFO - Starting Worker plugin shuffle
2024-02-06 19:08:19,465 - distributed.scheduler - INFO - Starting worker compute stream, tcp://172.25.1.142:44753
2024-02-06 19:08:19,465 - distributed.core - INFO - Starting established connection to tcp://172.25.1.142:60094
2024-02-06 19:08:19,465 - distributed.core - INFO - Starting established connection to tcp://172.25.1.51:40095
2024-02-06 19:08:19,466 - distributed.worker - INFO -         Registered to:    tcp://172.25.1.51:40095
2024-02-06 19:08:19,466 - distributed.worker - INFO - -------------------------------------------------
2024-02-06 19:08:19,466 - distributed.scheduler - INFO - Register worker <WorkerState 'tcp://172.25.1.126:44001', name: 7, status: init, memory: 0, processing: 0>
2024-02-06 19:08:19,466 - distributed.core - INFO - Starting established connection to tcp://172.25.1.51:40095
2024-02-06 19:08:19,466 - distributed.scheduler - INFO - Starting worker compute stream, tcp://172.25.1.126:44001
2024-02-06 19:08:19,466 - distributed.core - INFO - Starting established connection to tcp://172.25.1.126:48068
2024-02-06 19:08:19,466 - distributed.worker - INFO - Starting Worker plugin shuffle
2024-02-06 19:08:19,467 - distributed.scheduler - INFO - Register worker <WorkerState 'tcp://172.25.1.142:38005', name: 9, status: init, memory: 0, processing: 0>
2024-02-06 19:08:19,467 - distributed.scheduler - INFO - Starting worker compute stream, tcp://172.25.1.142:38005
2024-02-06 19:08:19,467 - distributed.core - INFO - Starting established connection to tcp://172.25.1.142:60080
2024-02-06 19:08:19,467 - distributed.worker - INFO -         Registered to:    tcp://172.25.1.51:40095
2024-02-06 19:08:19,467 - distributed.worker - INFO - -------------------------------------------------
2024-02-06 19:08:19,467 - distributed.worker - INFO - Starting Worker plugin shuffle
2024-02-06 19:08:19,467 - distributed.core - INFO - Starting established connection to tcp://172.25.1.51:40095
2024-02-06 19:08:19,467 - distributed.scheduler - INFO - Register worker <WorkerState 'tcp://172.25.1.142:33697', name: 11, status: init, memory: 0, processing: 0>
2024-02-06 19:08:19,468 - distributed.scheduler - INFO - Starting worker compute stream, tcp://172.25.1.142:33697
2024-02-06 19:08:19,468 - distributed.worker - INFO -         Registered to:    tcp://172.25.1.51:40095
2024-02-06 19:08:19,468 - distributed.core - INFO - Starting established connection to tcp://172.25.1.142:60070
2024-02-06 19:08:19,468 - distributed.worker - INFO - -------------------------------------------------
2024-02-06 19:08:19,468 - distributed.worker - INFO - Starting Worker plugin shuffle
2024-02-06 19:08:19,468 - distributed.core - INFO - Starting established connection to tcp://172.25.1.51:40095
2024-02-06 19:08:19,468 - distributed.scheduler - INFO - Register worker <WorkerState 'tcp://172.25.1.51:35759', name: 3, status: init, memory: 0, processing: 0>
2024-02-06 19:08:19,468 - distributed.worker - INFO -         Registered to:    tcp://172.25.1.51:40095
2024-02-06 19:08:19,469 - distributed.scheduler - INFO - Starting worker compute stream, tcp://172.25.1.51:35759
2024-02-06 19:08:19,469 - distributed.core - INFO - Starting established connection to tcp://172.25.1.51:60400
2024-02-06 19:08:19,468 - distributed.worker - INFO - -------------------------------------------------
2024-02-06 19:08:19,468 - distributed.worker - INFO - Starting Worker plugin shuffle
2024-02-06 19:08:19,469 - distributed.core - INFO - Starting established connection to tcp://172.25.1.51:40095
2024-02-06 19:08:19,469 - distributed.scheduler - INFO - Register worker <WorkerState 'tcp://172.25.1.51:37273', name: 2, status: init, memory: 0, processing: 0>
2024-02-06 19:08:19,469 - distributed.scheduler - INFO - Starting worker compute stream, tcp://172.25.1.51:37273
2024-02-06 19:08:19,469 - distributed.worker - INFO -         Registered to:    tcp://172.25.1.51:40095
2024-02-06 19:08:19,469 - distributed.core - INFO - Starting established connection to tcp://172.25.1.51:60392
2024-02-06 19:08:19,469 - distributed.worker - INFO - -------------------------------------------------
2024-02-06 19:08:19,469 - distributed.worker - INFO - Starting Worker plugin shuffle
2024-02-06 19:08:19,470 - distributed.scheduler - INFO - Receive client connection: Client-60d54dba-c555-11ee-88b9-000005c1fe80
2024-02-06 19:08:19,470 - distributed.core - INFO - Starting established connection to tcp://172.25.1.51:40095
2024-02-06 19:08:19,470 - distributed.core - INFO - Starting established connection to tcp://172.25.1.51:60368
2024-02-06 19:08:19,470 - distributed.worker - INFO - Starting Worker plugin shuffle
2024-02-06 19:08:19,470 - distributed.worker - INFO -         Registered to:    tcp://172.25.1.51:40095
2024-02-06 19:08:19,470 - distributed.worker - INFO - -------------------------------------------------
2024-02-06 19:08:19,470 - distributed.core - INFO - Starting established connection to tcp://172.25.1.51:40095
2024-02-06 19:08:19,470 - distributed.worker - INFO -         Registered to:    tcp://172.25.1.51:40095
2024-02-06 19:08:19,471 - distributed.worker - INFO - -------------------------------------------------
2024-02-06 19:08:19,471 - distributed.core - INFO - Starting established connection to tcp://172.25.1.51:40095
Client information
<Client: 'tcp://172.25.1.51:40095' processes=0 threads=0, memory=0 B>
Cluster information
None
Tile: 0
Tile: 1
Tile: 2
Tile: 3
Tile: 4
Tile: 5
Tile: 6
Tile: 7
Node: cn114 Begin sleeping: tile0

Node: cn114 Begin sleeping: tile1

Node:Node: cn114 Begin sleeping: tile0

 cn130 Begin sleeping: tile2

Node: cn130 Begin sleeping: tile3

Node: cn114 Begin sleeping: tile1

Node: cn130 Begin sleeping: tile2

Node: cn039 Begin sleeping: tile4

Node: cn114 Begin sleeping: tile5

Node: cn130 Begin sleeping: tile3

Node: cn039 Begin sleeping: tile4

Node: cn130 Begin sleeping: tile6

Node: cn130Node: cn114 Begin sleeping: tile5

Node: cn130 Begin sleeping: tile6

 Begin sleeping: tile7

Node: cn130 Begin sleeping: tile7

Finished sleeping: tile0
Finished sleeping: tile0
Finished sleeping: tile1
Finished sleeping: tile1
Finished sleeping: tile2
Finished sleeping: tile3
Finished sleeping: tile2
Finished sleeping: tile4
Finished sleeping: tile3
Finished sleeping: tile4
Finished sleeping: tile5
Finished sleeping: tile5
Finished sleeping: tile6
Finished sleeping: tile7
Finished sleeping: tile6
Finished sleeping: tile7
tile0 shape: (10000000, 3)
tile1 shape: (10000000, 3)
tile0 shape: (10000000, 3)
tile3 shape: (10000000, 3)
tile2 shape: (10000000, 3)
tile1 shape: (10000000, 3)
tile3 shape: (10000000, 3)
tile2 shape: (10000000, 3)
tile5 shape: (10000000, 3)
tile5 shape: (10000000, 3)
tile4 shape: (10000000, 3)
tile4 shape: (10000000, 3)
tile7 shape: (10000000, 3)
tile7 shape: (10000000, 3)
tile6 shape: (10000000, 3)
tile6 shape: (10000000, 3)
Total job time:20.840075492858887

....... Lots of shutting down messages........

I don’t understand why the run function appears to have been evaluated twice for each of the 8 “tiles”.
The computations were divided across the 3 allocated nodes as follows:

Node cn114 ran tiles: 0, 1, 0, 1, 5, 5
Node cn130 ran tiles: 2, 3, 2, 3, 6, 7, 6, 7
Node cn039 ran tiles: 4, 4

From a smaller example with only 2 nodes, I initially thought the entire script was simply being run independently on each node. However, the 3-node test shown above demonstrates that this is not the case, since then I would have expected each node to run all of tiles 0-7.

In addition to running the python script using

mpirun -np 12 python ./Test_DaskMPI_3Node_tpn4_t0.py

I have also tried

mpirun -np 12 -npernode 4 python ./Test_DaskMPI_3Node_tpn4_t0.py

and

srun --mpi=pmix python ./Test_DaskMPI_3Node_tpn4_t0.py

All of these variations produce the same results.

The only idea I have left is to use the worker_options argument of the dask_mpi.initialize call to specify the number of workers. However, I haven’t been able to find much information about the available options.
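For reference, here is the kind of call I was imagining, based on the initialize() signature (I’m not at all sure these options are valid or that they do what I want):

import os

import dask_mpi

# worker_options appears to forward keyword arguments to the worker class, so I
# think it configures each worker rather than setting how many workers there
# are (the worker count still seems fixed by the number of MPI ranks).
dask_mpi.initialize(
    local_directory=os.getcwd(),
    nthreads=1,
    interface="ib0",
    worker_class="distributed.Worker",          # the default worker class
    worker_options={"memory_limit": "25 GB"},   # e.g. a per-worker memory limit
)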

I expect that I am doing something silly in the configuration but I’m kind of out of ideas. Any thoughts or suggestions would be greatly appreciated! Thanks.

Hi @micmitch, welcome to the Dask community!

I suspect the issue is coming from using the dd.print function, as stated in the documentation:

If called by code running on a worker, then in addition to printing locally, any clients connected (possibly remotely) to the scheduler managing this worker will receive an event instructing them to print the same output

I’m wondering if you are seeing in the output logs both the print from the Worker and the one forwarded to the Client. Could you try using a simple print?
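For example, just swapping dd.print for the builtin print in your run function, with everything else unchanged:

import platform
import time

def run(tileID):
    tileName = 'tile' + str(tileID)
    # The builtin print only writes to the stdout of the worker executing the
    # task, so each message should appear exactly once in the job log.
    print('Node:', platform.node(), 'Begin sleeping:', tileName)
    time.sleep(20)
    print('Finished sleeping:', tileName)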

Also, just to make sure you’ve properly understood how many concurrent workers you’ll get with your script, as stated in the Dask-MPI with Batch Jobs — Dask-MPI 2022.4.0+35.g3a7f045.dirty documentation:

The initialize() function, when run from within an MPI environment (i.e., created by the use of mpirun or mpiexec), launches the Dask Scheduler on MPI rank 0 and the Dask Workers on MPI ranks 2 and above. On MPI rank 1, the initialize() function “passes through” to the Client script, running the Dask-based Client code the user wishes to execute.

So with 12 MPI ranks you’ll have only 10 workers, and thus at most 10 concurrent task executions across your nodes.
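If you want to verify this, here is a quick sanity check (just a sketch to adapt to your script) right after creating the Client:

client = dd.Client()
# Block until the expected 12 - 2 = 10 workers have registered, then count them.
client.wait_for_workers(10)
n_workers = len(client.scheduler_info()['workers'])
print('Connected workers:', n_workers)

Workers may still be connecting at the moment the Client object is created, which could also explain why your printout showed processes=0 threads=0.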


Hi Guillaume. Thanks for getting back to me so quickly! Using the standard print command did indeed fix the double printing confusion in my example script. Thanks for pointing me in the right direction.

It looks like I’ll have to go back to the drawing board to better understand the behavior of the full-scale inversion I’m trying to run, since my setup doesn’t appear to be duplicating computations after all.
