Hi dask team,
I have a question that I imagine has an answer, although I can't seem to find it anywhere. I'm running a dask cluster in GKE with nodes that have NVIDIA GPUs attached to them. I am successfully using the GPUs in a distributed way by pushing pytorch models and datasets to the cuda devices, but I am struggling to monitor GPU memory usage.
I see charts in the diagnostic dashboard labelled "gpu utilization" and "gpu memory", but they are completely blank for me, even while dask-worker containers are running on the GPU nodes.
My question is: where does dask look for GPU utilization metrics? My guess is that dask inside the container doesn't have sufficient access to the GPU to report on it, but without knowing where it looks, I'm not sure where to start debugging.
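For reference, here is the kind of check I was planning to run inside a dask-worker container. I'm assuming the dashboard reads GPU stats via NVML (the pynvml package), but that's a guess on my part, so please correct me if it uses something else:

```python
# Sketch of a check to run inside a dask-worker container: can NVML
# (which I *assume* dask's GPU dashboard panels read from) see the GPU?

def check_nvml():
    try:
        import pynvml  # provided by the pynvml / nvidia-ml-py package
    except ImportError:
        return "pynvml is not installed in this environment"
    try:
        pynvml.nvmlInit()
    except pynvml.NVMLError as err:
        # e.g. driver not mounted into the container
        return f"NVML failed to initialize: {err}"
    lines = []
    for i in range(pynvml.nvmlDeviceGetCount()):
        handle = pynvml.nvmlDeviceGetHandleByIndex(i)
        mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
        util = pynvml.nvmlDeviceGetUtilizationRates(handle)
        lines.append(
            f"GPU {i}: {mem.used}/{mem.total} bytes used, "
            f"utilization {util.gpu}%"
        )
    pynvml.nvmlShutdown()
    return "\n".join(lines) or "NVML initialized but no GPUs visible"

print(check_nvml())
```

If this is roughly where dask looks, a blank chart would presumably mean one of the failure branches above is being hit inside the container.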
Thank you in advance for your help.