Hello folks,
I’m trying to run a computation on a 30 GB dataset across 4 clustered GPUs.
Even when I split the data into small chunks of 100 MB, the memory usage grows so much that I get allocation errors.
The point is: how can I efficiently profile the GPU memory usage of my process? For context, I’m using CuPy and Dask Arrays.
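Roughly, the computation looks like the sketch below (the array shape, chunk size, and reduction are only illustrative, not my real workload):

```python
import cupy as cp
import dask.array as da

# Illustrative only: ~30 GB of float64 split into ~100 MB chunks,
# with each chunk converted to a CuPy array so it lives in GPU memory.
x = da.ones((300, 12_500_000), chunks=(1, 12_500_000))  # 300 chunks of ~100 MB each
gx = x.map_blocks(cp.asarray)

# Placeholder reduction; this is where the allocation errors show up.
result = (gx ** 2).sum().compute()
```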
If I were using only the CPU and host memory, I could easily use the dask-memusage plugin, but unfortunately it does not work with GPUs.
I’m not using the dashboard because I’m running on a cluster that does not let me open ports externally.
Any thoughts and suggestions are welcome.
Hi @jcfaracco,
Did you go through these two pages?
https://distributed.dask.org/en/stable/diagnosing-performance.html
https://docs.dask.org/en/stable/diagnostics-distributed.html
There might be some useful tools there, such as performance_report or MemorySampler, for example. I’m not sure how they behave with GPUs, though.
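For instance, something roughly like this (the scheduler address and the computation are placeholders; note that MemorySampler records the workers’ process memory, so I’m not sure it captures GPU allocations):

```python
from dask.distributed import Client, performance_report
from distributed.diagnostics import MemorySampler

client = Client("tcp://scheduler-address:8786")  # placeholder address

ms = MemorySampler()
with performance_report(filename="dask-report.html"):  # standalone HTML report
    with ms.sample("my-computation"):
        result = my_dask_array.sum().compute()  # placeholder computation

df = ms.to_pandas()  # one column per labelled sample, indexed by time
print(df.describe())
```

One nice thing is that performance_report writes a standalone HTML file, so you can copy it off the cluster and open it locally instead of opening dashboard ports.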
The only other solution I see is using external tooling like nvidia-smi
(there might be some packages that are able to record the output of this command).
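For example, a small polling script along these lines could record memory.used for each GPU while your job runs (assuming nvidia-smi is on the PATH of every node):

```python
import subprocess
import time


def sample_gpu_memory(interval_s=1.0, duration_s=60.0):
    """Yield (timestamp, [used MiB per GPU]) by polling nvidia-smi."""
    end = time.monotonic() + duration_s
    while time.monotonic() < end:
        out = subprocess.run(
            ["nvidia-smi", "--query-gpu=memory.used",
             "--format=csv,noheader,nounits"],
            capture_output=True, text=True, check=True,
        ).stdout
        used = [int(line) for line in out.splitlines() if line.strip()]
        yield time.time(), used
        time.sleep(interval_s)


# Example: log every 2 seconds for 10 seconds.
for ts, used_mib in sample_gpu_memory(interval_s=2.0, duration_s=10.0):
    print(ts, used_mib)
```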
You could also try without GPUs and see how it goes.
Also, did you try using SSH port forwarding?
I wrote a plugin similar to dask-memusage. If anyone is interested: GitHub - discovery-unicamp/dask-memusage-gpus: A thread-based and low-impact GPU memory profiler for Dask.
It is missing documentation, but I will add that in the coming weeks.
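Until then, the general idea (heavily simplified here, and not the plugin’s actual API) is a Dask worker plugin whose background thread samples GPU memory; this sketch uses pynvml purely as an illustration:

```python
import threading
import time

import pynvml
from dask.distributed import Client
from distributed.diagnostics.plugin import WorkerPlugin


class GPUMemorySampler(WorkerPlugin):
    """Illustrative sketch: a background thread polls NVML on each worker."""

    def __init__(self, interval_s=1.0):
        self.interval_s = interval_s
        self.samples = []  # (timestamp, bytes used) tuples
        self._stop = threading.Event()

    def setup(self, worker):
        pynvml.nvmlInit()
        # Assumes one GPU per worker, as in a typical dask-cuda setup.
        self._handle = pynvml.nvmlDeviceGetHandleByIndex(0)
        self._thread = threading.Thread(target=self._poll, daemon=True)
        self._thread.start()

    def teardown(self, worker):
        self._stop.set()
        self._thread.join()
        pynvml.nvmlShutdown()

    def _poll(self):
        while not self._stop.is_set():
            info = pynvml.nvmlDeviceGetMemoryInfo(self._handle)
            self.samples.append((time.time(), info.used))
            time.sleep(self.interval_s)


client = Client("tcp://scheduler-address:8786")     # placeholder address
client.register_worker_plugin(GPUMemorySampler())   # install on every worker
```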