GPU memory within container

secrettoad · July 28, 2023, 9:14pm

Hi dask team,

I have a question that I imagine there is an answer to, although I can’t seem to find it anywhere. I’m running a Dask cluster in GKE on nodes that have NVIDIA GPUs attached. I am successfully using the GPUs in a distributed way by pushing PyTorch models and datasets to the CUDA devices, but I am struggling to monitor GPU memory usage.

I see charts in the diagnostic dashboard labelled GPU utilization and GPU memory, but they are completely blank for me, even while dask-worker containers are running on the GPU nodes.

My question is: where does Dask look for GPU utilization metrics? My guess is that Dask inside the container does not have sufficient access to the GPU to report on it, but without knowing where it is looking I’m not sure where to start debugging.

Thank you in advance for your help.

guillaumeeb · August 1, 2023, 8:31pm

Hi @secrettoad,

I did a quick search through the code, and Dask uses pynvml to get GPU utilization metrics. You can find some code here: https://github.com/dask/distributed/blob/405c011919bc7176bef8451be02578ca15931110/distributed/worker.py#L3327.

Then the Dashboard just queries these metrics: https://github.com/dask/distributed/blob/405c011919bc7176bef8451be02578ca15931110/distributed/dashboard/components/nvml.py#L131.
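
Since the metrics come from pynvml, one way to start debugging is to check whether NVML is reachable at all from inside the worker container. Below is a minimal sketch of such a check. The function name `probe_nvml` and the returned dictionary keys are made up for illustration, and the calls are plain pynvml calls, not necessarily the exact ones Dask makes internally.

```python
def probe_nvml():
    """Return a list of per-GPU readings, or a string describing why NVML is unreachable.

    Hypothetical helper for debugging; not part of Dask or pynvml.
    """
    try:
        import pynvml
    except ImportError as exc:
        return f"pynvml is not importable: {exc}"

    try:
        pynvml.nvmlInit()
    except pynvml.NVMLError as exc:
        # Typical inside a container that cannot see the NVIDIA driver library.
        return f"nvmlInit failed: {exc}"

    try:
        gpus = []
        for index in range(pynvml.nvmlDeviceGetCount()):
            handle = pynvml.nvmlDeviceGetHandleByIndex(index)
            memory = pynvml.nvmlDeviceGetMemoryInfo(handle)
            utilization = pynvml.nvmlDeviceGetUtilizationRates(handle)
            gpus.append(
                {
                    "index": index,
                    "memory_used": memory.used,      # bytes
                    "memory_total": memory.total,    # bytes
                    "utilization": utilization.gpu,  # percent
                }
            )
        return gpus
    except pynvml.NVMLError as exc:
        return f"NVML query failed: {exc}"
    finally:
        pynvml.nvmlShutdown()


print(probe_nvml())
```

Running this in a Python shell inside the worker container, or on every worker with `client.run(probe_nvml)`, should show whether the problem is a missing package, an unreachable driver, or something further up in the dashboard.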

Hope that helps.

jacobtomlinson · August 2, 2023, 8:35am

Make sure you are launching your workers with dask-cuda-worker and have the dask-cuda package installed in the worker containers.
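
A quick way to confirm the second point from the client side is to ask each worker whether the package can be imported. This is only a sketch: `has_dask_cuda` is a hypothetical helper, and the scheduler address in the comment is a placeholder, not something taken from this thread.

```python
import importlib.util


def has_dask_cuda():
    """Return True if the dask_cuda package is importable in this process."""
    return importlib.util.find_spec("dask_cuda") is not None


# Local check; on a machine without dask-cuda installed this prints False.
print(has_dask_cuda())

# To run the same check on every worker (assumes a reachable scheduler;
# the address below is a placeholder):
#
#   from dask.distributed import Client
#   client = Client("tcp://scheduler-address:8786")
#   print(client.run(has_dask_cuda))  # dict keyed by worker address
```

This only verifies that the package is installed; it does not tell you whether the workers were actually started with dask-cuda-worker.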

secrettoad · August 2, 2023, 5:28pm

@jacobtomlinson thank you! I did not realize that was necessary.
