Memory Leak on Dask Worker

Hi team,

I'm currently using Dask as the distributed infrastructure for training an AI model. The working pattern is that the client repeatedly asks the workers to compute some metrics on numpy.ndarray data, and the code for each round is the same. Everything is written in Python.
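
To give a rough idea, each round looks more or less like this (a simplified sketch, not my real code; `compute_metrics` and the scheduler address are placeholders):

```python
import numpy as np
from dask.distributed import Client

client = Client("tcp://scheduler:8786")  # placeholder address

def compute_metrics(arr: np.ndarray) -> float:
    # Stand-in for the real metric computation
    return float(arr.mean())

for round_idx in range(100):
    # Fresh arrays every round; nothing from previous rounds is kept
    data = [np.random.rand(1_000_000) for _ in range(8)]
    futures = client.map(compute_metrics, data)
    results = client.gather(futures)
    # results are consumed here and then go out of scope
```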

However, after a number of rounds of computation, this memory warning appears:
“distributed.worker - WARNING - Unmanaged memory use is high. This may indicate a memory leak or the memory may not be released to the OS”

I have called gc.collect() on all workers via “client.run(gc.collect)” at the end of each round, but it doesn't resolve the issue.
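
Concretely, what I run at the end of every round is just:

```python
import gc

# Ask every worker process to run a full garbage collection
client.run(gc.collect)
```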

My question: given that memory keeps increasing as the rounds of computation go on, does this warning mean that some code running on the Dask workers is keeping objects referenced after each round of compute? My understanding is that Python does garbage collection automatically, so I shouldn't need to call “del” myself, and I don't use dask.array or dask.dataframe anywhere.
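
To make the question concrete, the kind of pattern I'm worried about would be something like this on the worker side (purely hypothetical illustration, the cache name is made up; I'm asking whether something equivalent could be happening somewhere):

```python
# Hypothetical: a module-level cache on the worker would keep every
# round's array referenced, so that memory could never be freed.
_results_cache = {}

def compute_metrics(round_idx: int, arr):
    metric = float(arr.mean())
    _results_cache[round_idx] = arr  # reference kept across rounds
    return metric
```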

Any comments are appreciated.
Chris Ding

@cuauty Thanks for the question! Would you be able to share a minimal version of your workflow? That would let us reproduce the issue and see what's going on. :slight_smile:

In general though, maybe something here can help: Worker Memory Management — Dask.distributed 2022.8.1 documentation
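
For example, one thing from that page that often helps when unmanaged memory stays high is manually trimming memory on the workers (Linux/glibc only):

```python
import ctypes

def trim_memory() -> int:
    # Ask glibc to return freed memory to the OS
    libc = ctypes.CDLL("libc.so.6")
    return libc.malloc_trim(0)

client.run(trim_memory)
```

The same page also mentions the MALLOC_TRIM_THRESHOLD_ environment variable, which makes this happen automatically.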

Hi @cuauty, the version of dask you’re using is quite old. There have been many improvements in memory management recently; could you try the latest release?

The documentation on the topic (linked above) should help you debug the issue.

@pavithraes @crusaderky

Thank you for your replies. I will try the latest version and report back what happens.

Chris

@cuauty - I am facing a somewhat similar issue (not with np.ndarray though). Have you managed to resolve this?