How to get worker (numeric) ID or processes (numeric) ID in a node?

llodds · October 27, 2022, 5:17am

I am submitting jobs to GPU nodes, and each node fits 4 jobs, so each job should see a different set of GPUs: (0,1), (2,3), (4,5), (6,7). I need a numeric process ID or worker ID to determine the visible GPUs for each submitted job. How do I get this information? get_worker().id only gives what looks like an encrypted worker ID in letters. Thanks.

jrbourbeau · November 10, 2022, 10:30pm

Thanks for the question @llodds. IIUC you should be able to get what you want by running os.getpid() on each worker, like this:

In [1]: from distributed import Client

In [2]: c = Client()

In [3]: def get_pid():
   ...:     import os
   ...:     return os.getpid()
   ...:

In [4]: c.run(get_pid)
Out[4]:
{'tcp://127.0.0.1:64324': 93100,
 'tcp://127.0.0.1:64325': 93099,
 'tcp://127.0.0.1:64326': 93101,
 'tcp://127.0.0.1:64333': 93102}

llodds · December 20, 2022, 7:22pm

@jrbourbeau Thanks for the reply, but PID is not what I am looking for. I am looking for a relative rank within a compute node. For example, if the scheduler assigns 4 workers to a node, then GPU#0,1 is visible to worker#0 only, GPU#2,3 is visible to worker#1 only, …, GPU#6,7 is visible to worker#3 only. PID doesn’t give this “relative rank” information. I could do a hack here: get all the PIDs and then derive the relative rank from the PID values. But in adaptive deployments, where the number of workers grows and shrinks, it’s hard to track this “relative rank” manually, because it may change all the time as the PID values change. So I am hoping dask can expose a built-in feature to extract this “relative rank” information. Thanks.
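For illustration, the "relative rank" hack described above could be sketched roughly as follows, using worker addresses (like the keys returned by c.run(get_pid) earlier in the thread) instead of PIDs. This is only a sketch under assumptions, not a built-in dask feature: local_ranks and gpus_for_rank are hypothetical helper names, workers on the same host are ordered by port, and each worker is assumed to own 2 consecutive GPUs. As noted above, ranks computed this way are not stable when workers come and go.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def local_ranks(addresses):
    """Map each worker address to its rank among workers on the same host.

    Hypothetical helper: workers sharing a host are ordered by port number.
    """
    by_host = defaultdict(list)
    for addr in addresses:
        parts = urlsplit(addr)  # e.g. 'tcp://127.0.0.1:64324'
        by_host[parts.hostname].append((parts.port, addr))

    ranks = {}
    for members in by_host.values():
        for rank, (_, addr) in enumerate(sorted(members)):
            ranks[addr] = rank
    return ranks


def gpus_for_rank(rank, gpus_per_worker=2):
    """Return the GPU indices for a local rank (assumes 2 GPUs per worker)."""
    start = rank * gpus_per_worker
    return list(range(start, start + gpus_per_worker))


if __name__ == "__main__":
    # Addresses copied from the c.run(get_pid) output earlier in the thread.
    addresses = [
        "tcp://127.0.0.1:64324",
        "tcp://127.0.0.1:64325",
        "tcp://127.0.0.1:64326",
        "tcp://127.0.0.1:64333",
    ]
    for addr, rank in local_ranks(addresses).items():
        print(addr, rank, gpus_for_rank(rank))
```

On a live cluster the address list could be taken from the keys of c.scheduler_info()["workers"]. The mapping would have to be recomputed whenever the set of workers changes, which is exactly the fragility described above.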

llodds · September 14, 2023, 11:41pm

I don’t think I will need a solution for this thread anymore, as our center is moving to SLURM, which lets us request part of a node’s resources, including a subset of its GPUs (great!)
