I have a simple filter job (keeping rows where a value is greater than 0.5) running on about 11 GiB of data, with no repartitioning/shuffling. I'm seeing stored bytes climb to 100 GB at times during the run, and the job gets stuck with the following error message after it has already finished the tasks that write to disk:
Event loop was unresponsive in Nanny for 3.09s. This is often caused by long-running GIL-holding functions or moving large chunks of data. This can cause timeouts and instability
The machine has 1 TB of memory.
Also, the Dask dashboard often hangs and the UI stops loading. Wondering if folks might have pointers!
Any ideas? Thanks!