Very low CPU utilization on SLURMCluster

ikabadzhov · November 7, 2021, 3:15pm

Hello, I was trying to set up a Dask + SLURM cluster and run some benchmarks. I am attaching the performance report of my run: Dask Performance Report. Here args.Nodes=8 and args.cores=32, i.e. 32*8 = 256 cores in total across the cluster. To be explicit: I would like to have 1 process per core, with each process running 1 thread, and also to assign 1 task per node.
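A minimal sketch of the setup described above, assuming dask-jobqueue's `SLURMCluster` and one SLURM job per node. In `SLURMCluster`, `cores` is the total number of threads per job and `processes` is how many worker processes that job is split into, so `processes=cores` gives one single-threaded worker per core. The helper names, the `memory` value and the report filename are illustrative assumptions, not taken from the original code.

```python
from typing import Callable, Dict


def slurm_cluster_kwargs(cores_per_node: int, threads_per_worker: int = 1) -> Dict[str, int]:
    """Hypothetical helper: SLURMCluster arguments for one job per node.

    dask-jobqueue splits the `cores` of each job evenly across `processes`
    worker processes, so each worker gets cores / processes threads.
    """
    if cores_per_node % threads_per_worker != 0:
        raise ValueError("cores_per_node must be a multiple of threads_per_worker")
    return {
        "cores": cores_per_node,                            # total threads per job
        "processes": cores_per_node // threads_per_worker,  # worker processes per job
    }


def run_benchmark(nodes: int, cores_per_node: int, memory: str,
                  workload: Callable, report: str = "dask-report.html"):
    """Hypothetical driver: start the cluster, run `workload(client)` under a
    performance report, then shut everything down.

    Imports are local so the helper above can be used without dask installed.
    """
    from dask.distributed import Client, performance_report
    from dask_jobqueue import SLURMCluster

    cluster = SLURMCluster(memory=memory, **slurm_cluster_kwargs(cores_per_node))
    client = Client(cluster)
    try:
        cluster.scale(jobs=nodes)  # assumption: each SLURM job lands on its own node
        with performance_report(filename=report):
            return workload(client)
    finally:
        client.close()
        cluster.close()
```

For the case in this post, `slurm_cluster_kwargs(32)` yields `{"cores": 32, "processes": 32}`, and a call such as `run_benchmark(nodes=8, cores_per_node=32, memory="64GB", workload=my_workload)` (with a hypothetical `my_workload`) would request 8 jobs of 32 single-threaded workers each.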

When going through the System tab of the report, I see very low CPU utilization (less than 10% on average). However, I logged into several nodes where computations were running, and htop showed all CPUs fully busy. Is the CPU figure in the report reliable? And if it is, what can I do to increase CPU utilization?

guillaumeeb · November 8, 2021, 9:46pm

Hi @ikabadzhov,

Some comments I can make from your report:

On the core count: your report shows only 160 cores. So I guess only 5 of the 8 jobs generated by your scale call made it to the running state before your computation started.
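The arithmetic behind this guess: each job contributes `processes` workers, so 8 jobs × 32 processes would give 256 workers, while 160 workers corresponds to exactly 5 jobs. A minimal sketch, assuming a connected `dask.distributed` `Client`: block with `Client.wait_for_workers` until the whole allocation is up before starting the benchmark, so jobs still queued in SLURM don't silently shrink the cluster. The helper names are hypothetical.

```python
def expected_workers(jobs: int, processes_per_job: int) -> int:
    """Number of worker processes once every SLURM job is running."""
    return jobs * processes_per_job


def running_jobs(n_workers: int, processes_per_job: int) -> int:
    """How many complete jobs a given worker count corresponds to."""
    return n_workers // processes_per_job


def wait_for_full_cluster(client, jobs: int, processes_per_job: int,
                          timeout: float = 600) -> int:
    """Block until all expected workers have connected.

    `client` is assumed to be a dask.distributed.Client; wait_for_workers
    raises if the timeout (in seconds) expires first.
    """
    n = expected_workers(jobs, processes_per_job)
    client.wait_for_workers(n_workers=n, timeout=timeout)
    return n
```

Calling `wait_for_full_cluster(client, jobs=8, processes_per_job=32)` right after `cluster.scale(jobs=8)` would wait for all 256 workers instead of benchmarking on whatever subset happened to be running.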

On one single-threaded process per core: you got that part perfectly right. This can be seen in the code snippet, where the processes kwarg is equal to cores, and also in the summary of the report, where 160 workers (i.e. distinct processes) are identified.
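The same layout can be checked from code rather than from the report. A minimal sketch, assuming a connected `Client`: `Client.nthreads()` returns a `{worker_address: nthreads}` mapping, which the hypothetical helper below condenses into worker, thread and host counts. With the intended setup you would expect `threads_per_worker == {1: 160}` for the run in this report, or `{1: 256}` once all 8 jobs are running.

```python
from collections import Counter
from typing import Dict


def summarize_layout(nthreads: Dict[str, int]) -> Dict[str, object]:
    """Summarize a {worker_address: thread_count} mapping, e.g. client.nthreads().

    Hosts are counted by the IP/hostname part of addresses such as
    "tcp://10.0.0.1:34567" (assumes one network address per node).
    """
    hosts = {addr.split("://")[-1].rsplit(":", 1)[0] for addr in nthreads}
    return {
        "workers": len(nthreads),
        "total_threads": sum(nthreads.values()),
        "threads_per_worker": dict(Counter(nthreads.values())),
        "hosts": len(hosts),
    }
```

Usage: `summarize_layout(client.nthreads())`; a `hosts` value of 5 rather than 8 would point to the same "only 5 jobs running" situation.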

On the reliability of the System tab: this part I don’t know, as I’m not sure what the System tab reflects. But I can tell from your task stream that you definitely have 160 computations active at the same time.
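Reading "160 concurrent computations" off the task stream can also be done numerically. A minimal sketch, assuming task-stream records in the format returned by recent versions of `dask.distributed` (each record has a `"startstops"` list of dicts with `"action"`, `"start"` and `"stop"` keys; older versions may differ): extract the compute phases and run a sweep line to find the peak number of overlapping computations. The function names are hypothetical.

```python
from typing import Iterable, List, Tuple


def compute_intervals(records: Iterable[dict]) -> List[Tuple[float, float]]:
    """Extract (start, stop) of 'compute' phases from task-stream records."""
    return [
        (ss["start"], ss["stop"])
        for rec in records
        for ss in rec.get("startstops", [])
        if ss.get("action") == "compute"
    ]


def peak_concurrency(intervals: Iterable[Tuple[float, float]]) -> int:
    """Maximum number of intervals overlapping at any instant (sweep line)."""
    events = []
    for start, stop in intervals:
        events.append((start, 1))   # a computation begins
        events.append((stop, -1))   # a computation ends
    # Sorting puts -1 before +1 at equal timestamps, so back-to-back tasks
    # on the same thread are not double counted.
    events.sort()
    active = peak = 0
    for _, delta in events:
        active += delta
        peak = max(peak, active)
    return peak
```

For instance, wrapping the computation in `with get_task_stream(client) as ts:` (from `dask.distributed`) and then calling `peak_concurrency(compute_intervals(ts.data))` should give a value close to the number of single-threaded workers if they were all kept busy.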

ikabadzhov · November 8, 2021, 10:04pm

Thanks @guillaumeeb for the response. I was trying to investigate the issue further.

Following the Dask on HPC Introduction tutorial on YouTube (at 18:40), I ssh-ed into my cluster and watched “live” how my workers were processing the tasks. Almost every worker showed very high CPU usage most of the time.

However, the performance report I generated at the end was similar to the one linked here (I fixed the number of workers), and again the report showed very low CPU utilization.
When I refer to “tabs”: if you open the link, at the top there should be <Summary, Task Stream, System, …>. The System tab shows very low CPU utilization, and this is the CPU utilization figure that I suspect is not reliable.

Right now, I have “confirmation” both from htop, which shows a node fully busy during the computation, and from the live report I watched over ssh. I hope my suspicion is clearer now.
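The htop check can also be scripted from the client side. A minimal sketch, assuming a connected `Client` named `client`: `Client.run` executes a function on every worker and returns a `{worker_address: result}` dict. Here the function samples the worker process's own CPU usage with psutil (a dependency of `dask.distributed`); it is written as a coroutine so the sampling window does not block the worker's event loop. Function names are hypothetical.

```python
from statistics import mean
from typing import Dict


async def process_cpu_percent(interval: float = 1.0) -> float:
    """CPU usage of the current (worker) process over `interval` seconds.

    100.0 means one fully busy core, which is what a busy single-threaded
    worker should report.
    """
    import asyncio

    import psutil

    proc = psutil.Process()
    proc.cpu_percent(interval=None)  # first call only primes the counter
    await asyncio.sleep(interval)    # yield to the event loop while sampling
    return proc.cpu_percent(interval=None)


def summarize_cpu(per_worker: Dict[str, float]) -> Dict[str, float]:
    """Aggregate a {worker_address: cpu_percent} mapping."""
    values = list(per_worker.values())
    return {
        "workers": len(values),
        "mean_percent": mean(values) if values else 0.0,
        "min_percent": min(values, default=0.0),
        "max_percent": max(values, default=0.0),
    }
```

Calling `summarize_cpu(client.run(process_cpu_percent))` while the benchmark is running gives a per-worker number that can be compared directly with htop and with the System tab.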

guillaumeeb · November 8, 2021, 10:08pm

I’m not sure whether the System tab represents overall CPU utilization across the Dask cluster. If it does, then it obviously looks wrong compared to what you see. But I think it might represent CPU usage on the scheduler side (I should try it myself, but that is not possible for me at the moment).
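One way to test the scheduler-side hypothesis is to measure both sides independently. A minimal sketch, assuming a connected `Client` and compute intervals given as `(start, stop)` pairs in seconds (for example, taken from the `"startstops"` entries of `Client.get_task_stream()` records): `busy_fraction` estimates cluster-wide utilization from the task stream, while `scheduler_cpu_percent` samples the scheduler process through `Client.run_on_scheduler` and psutil. If the scheduler figure resembles the System tab while `busy_fraction` is high, that would support the idea that the tab reports the scheduler rather than the workers. Both function names are hypothetical.

```python
from typing import Iterable, Tuple


def busy_fraction(intervals: Iterable[Tuple[float, float]], n_threads: int,
                  t_start: float, t_stop: float) -> float:
    """Fraction of available worker thread-time spent computing.

    Each interval is clipped to [t_start, t_stop]; 1.0 means every worker
    thread was computing for the whole window.
    """
    window = t_stop - t_start
    if window <= 0 or n_threads <= 0:
        raise ValueError("need a positive time window and thread count")
    busy = sum(max(0.0, min(stop, t_stop) - max(start, t_start))
               for start, stop in intervals)
    return busy / (n_threads * window)


def scheduler_cpu_percent(client, interval: float = 0.5) -> float:
    """CPU usage of the scheduler process itself (100.0 = one busy core).

    Note: psutil's blocking `interval` briefly stalls the scheduler's event loop.
    """
    def _measure():
        import psutil

        return psutil.Process().cpu_percent(interval=interval)

    return client.run_on_scheduler(_measure)
```

With 160 single-threaded workers, `busy_fraction(intervals, n_threads=160, t_start=t0, t_stop=t1)` close to 1.0 would mean the workers were kept busy, regardless of what the System tab displays.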
