Optimal way to monitor GPU memory usage during distributed training (XGBoost)

Hi @ap213, welcome to Dask Discourse forum!

First, I would like to be sure you are really launching computations on GPUs, as I don't see any hints of that in your code. Are you configuring something, somewhere, to make sure the code runs on GPUs? From the code I see, you are creating standard Dask arrays, so they would be held in the server's main memory and processed on CPUs: creating a LocalCUDACluster is not enough on its own. But maybe you just didn't include that part of the code.

To be more specific, you should use cupy directly or set it as the array backend, as in the XGBoost example:

```python
with Client(cluster) as client, dask.config.set({"array.backend": "cupy"}):
```
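
For illustration, here is a minimal sketch of that setup end to end, assuming dask-cuda and cupy are installed; the array shape and chunk size are just placeholders:

```python
import dask
import dask.array as da
from dask.distributed import Client
from dask_cuda import LocalCUDACluster

# One worker per visible GPU
cluster = LocalCUDACluster()

with Client(cluster) as client, dask.config.set({"array.backend": "cupy"}):
    # With the cupy backend, array creation routines allocate chunks on the GPU
    x = da.random.random((100_000, 100), chunks=(10_000, 100))
    print(type(x._meta))  # should report cupy.ndarray, not numpy.ndarray
```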

Next, or in the meantime, I would also check that the GPUs are actually being used with a system tool like nvidia-smi. If you see some usage there, you should be able to get the same information from Python.
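
If you want those numbers from Python rather than from the command line, one option is to query NVML on every worker; a minimal sketch, assuming the nvidia-ml-py (pynvml) package is installed on the workers:

```python
import pynvml

def gpu_memory_used():
    # Report used/total memory (in MiB) for this worker's GPU.
    # With dask-cuda, each worker is pinned to one GPU via CUDA_VISIBLE_DEVICES,
    # so index 0 refers to that worker's own device.
    pynvml.nvmlInit()
    handle = pynvml.nvmlDeviceGetHandleByIndex(0)
    info = pynvml.nvmlDeviceGetMemoryInfo(handle)
    return {"used_MiB": info.used // 2**20, "total_MiB": info.total // 2**20}

# Run the function on every worker and collect the results per worker address
print(client.run(gpu_memory_used))
```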

You can also use the Dask dashboard, which has GPU support if dask-cuda is installed.
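
When dask-cuda is installed, the dashboard exposes GPU memory and utilization panels; you can get its URL from the client, for example:

```python
# Open this URL in a browser; the GPU panels appear when dask-cuda is installed
print(client.dashboard_link)
```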