Dashboard latency

secrettoad · June 16, 2023, 4:06pm

Hello all,

Quick question that I hope someone here won’t mind answering… I am consistently running into latency issues with the scheduler dashboard that I am almost certain stem from a CPU bottleneck on the scheduler. However, the scheduler has access to 4 cores and only one seems to be utilized at ~100%. By latency I mean that the dashboard only updates every 5-10 seconds and navigating to another page takes 10-20 seconds to load.

My question is really two:

Is there any way to enable the scheduler to run certain things asynchronously, thereby taking advantage of the extra CPU cores available to it and potentially reducing the latency of the updates?

Is this expected behavior and, if so, are there any best practices for keeping the scheduler in a state where it can update the dashboard relatively constantly?

Thank you so much

secrettoad · June 16, 2023, 4:38pm

FYI - Solved this by reducing the number of tasks on the scheduler at any given time from 2 million to 20k.

I would still very much like to understand whether it is possible to get the scheduler to use multiple CPU cores, or whether this is on the roadmap or something I could maybe try to help with.

guillaumeeb · June 16, 2023, 6:51pm

Hi @secrettoad, welcome to the Dask community!

As far as I know, the Dask Scheduler is single threaded, so it cannot take advantage of extra cores. I tried to find this information in the documentation but was unable to.

The scheduler bottleneck really is the number of tasks. In my experience, it is crucial to keep this number below one million, or even below 100,000, for the time being.

I think using multiple cores in the scheduler has been discussed, but this work has never been done.

crusaderky · June 19, 2023, 8:43am

Hello,
The Dask scheduler is strictly single threaded. This is by design, to avoid the race conditions that are intrinsic to multithreading. Historically, we have had to deal with enough race conditions already that we really don’t want to add multithreading into the mix.

We are aware of performance degradation with 1 million+ tasks on the cluster. It’s on the TODO list.

In the meantime, I strongly recommend you don’t generate as many tasks. You can achieve this by (1) increasing chunk/partition size and (2) setting optimization.fuse settings to be more aggressive; e.g.

optimization:
  fuse:
    active: true
    ave-width: 16
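
To make point (1) concrete, here is a rough sketch of how chunk size drives the task count. It is plain arithmetic, not Dask code: the helper `count_chunks` and the array shape are hypothetical, and it assumes an operation that creates roughly one task per chunk, which will not hold for every workload.

```python
import math


def count_chunks(shape, chunks):
    """Return how many blocks an array of `shape` splits into
    when each block has at most `chunks` elements per axis."""
    return math.prod(math.ceil(s / c) for s, c in zip(shape, chunks))


# Hypothetical 100,000 x 100,000 array, chunked two different ways.
shape = (100_000, 100_000)
small = count_chunks(shape, (100, 100))      # many tiny chunks
large = count_chunks(shape, (1_000, 1_000))  # fewer, larger chunks

print(small, large)  # prints: 1000000 10000
```

Making each chunk 10x larger along both axes cuts the number of chunks, and so roughly the number of tasks per operation, by a factor of 100. The fuse settings above can also be applied from Python with `dask.config.set({"optimization.fuse.active": True, "optimization.fuse.ave-width": 16})` instead of a YAML file.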
