Memory Leak on Dask Worker

cuauty · June 5, 2022, 2:13am

Hi, team,

Currently I use Dask as the distributed infrastructure to train an AI model. The working pattern is that the client repeatedly sends requests asking the workers to compute some metrics on numpy.ndarray data, and the code for each round is the same. All my programs are written in Python.

However, after some rounds of computation, the following warning appears:
“distributed.worker - WARNING - Unmanaged memory use is high. This may indicate a memory leak or the memory may not be released to the OS”

I call gc.collect on all workers with “client.run(gc.collect)” at the end of each round, but it doesn’t resolve the issue.
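To confirm that memory really grows from round to round, it can help to record each worker's resident memory right after the collection. A minimal sketch, assuming Linux workers (it reads /proc/self/statm); `current_rss_bytes` and `collect_and_measure` are hypothetical helper names, and psutil's `Process().memory_info().rss` is a more portable alternative:

```python
import gc
import os


def current_rss_bytes():
    """Return this process's resident set size in bytes, or None if unavailable.

    Assumption: Linux, where /proc/self/statm exists; its second field is the
    number of resident pages.
    """
    try:
        with open("/proc/self/statm") as f:
            resident_pages = int(f.read().split()[1])
    except (OSError, IndexError, ValueError):
        return None  # not Linux, or /proc is unavailable
    return resident_pages * os.sysconf("SC_PAGE_SIZE")


def collect_and_measure():
    """Run a full garbage collection, then report RSS (hypothetical helper)."""
    gc.collect()
    return current_rss_bytes()
```

Calling `client.run(collect_and_measure)` at the end of each round returns a dict mapping each worker address to its RSS in bytes. A number that keeps rising even after gc.collect() points at memory that is either still referenced or not being returned to the OS.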

My question: since memory keeps increasing as the rounds go on, does this warning mean that some code running on the Dask worker keeps objects referenced after each round of computation? My understanding is that Python does garbage collection, so I don’t need to call “del” myself, and I don’t use dask.array or dask.dataframe.

Any comments are appreciated.
Chris Ding

pavithraes · June 10, 2022, 2:43pm

@cuauty Thanks for the question! Would you be able to share a minimal version of your workflow? It’ll allow us to reproduce this issue to see what’s going on.

In general though, maybe something here can help: Worker Memory Management — Dask.distributed 2022.8.1 documentation
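That page also covers the second half of the warning, memory that Python has freed but the allocator has not returned to the OS, and suggests trimming it manually with glibc's malloc_trim. A sketch of that approach, assuming Linux workers with glibc (`trim_memory` is just a local helper name, and it returns None where malloc_trim is unavailable):

```python
import ctypes


def trim_memory():
    """Ask glibc to return freed heap memory to the OS.

    Assumption: Linux with glibc, where libc.so.6 exports malloc_trim.
    Returns malloc_trim's result (1 if memory was released, 0 otherwise),
    or None on platforms where it is unavailable.
    """
    try:
        libc = ctypes.CDLL("libc.so.6")
        return libc.malloc_trim(0)
    except (OSError, AttributeError):
        return None
```

It would be run on every worker the same way as gc.collect, e.g. `client.run(trim_memory)` at the end of each round. If worker memory drops after trimming, the growth was allocator retention rather than a reference leak.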

crusaderky · June 12, 2022, 5:40pm

Hi @cuauty, the version of dask you’re using is quite old. There have been many improvements in memory management recently; could you try the latest release?

The documentation on the topic (linked above) should help you debug the issue.
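One more debugging option, not mentioned in the thread, is the standard library's tracemalloc: compare snapshots taken before and after a few rounds to see which source lines keep accumulating memory. Recent NumPy versions report their array buffers to tracemalloc, so ndarray allocations should show up too, but memory allocated by C code through raw malloc, or retained by the allocator, will not. A sketch, where `top_growth` is a hypothetical helper meant to run in a single process that reproduces one round of the workload:

```python
import tracemalloc


def top_growth(run_round, rounds=3, limit=5):
    """Run `run_round` repeatedly and report which source lines grew the most.

    One warm-up round runs before the baseline snapshot so that one-off
    allocations (e.g. caches being filled on first use) are not counted.
    """
    tracemalloc.start()
    try:
        run_round()  # warm-up
        before = tracemalloc.take_snapshot()
        for _ in range(rounds):
            run_round()
        after = tracemalloc.take_snapshot()
    finally:
        tracemalloc.stop()
    # compare_to sorts by the absolute size difference, largest first
    return after.compare_to(before, "lineno")[:limit]


# Usage with a deliberately leaky round:
leaked = []


def leaky_round():
    leaked.append(bytearray(1_000_000))


for stat in top_growth(leaky_round):
    print(stat)
```

A line whose `size_diff` grows roughly in proportion to the number of rounds is a good candidate for the reference that is keeping memory alive.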

cuauty · June 13, 2022, 10:31am

@pavithraes @crusaderky

Thank you for your replies. I will try the latest version and report back on what happens.

Chris

tomercagan · July 20, 2022, 12:56pm

@cuauty - I am facing a somewhat similar issue (not with np.ndarray though). Have you managed to resolve this?
