Scheduler memory management

Hello everyone,
I have a question about how the dask scheduler manages memory internally, in a previous thread I got the ECSCluster up and running with the adaptive functionality turned on ECSCluster adaptivity - Distributed - Dask Forum. I now see that the memory that the scheduler is managing does not decrease after a while, it instead grows with time until the instance where my scheduler is running does not have any more memory and then the scheduler “freezes” and then fails.

I do know I have an issue with the size of the task graphs that are sent to the scheduler (some graphs are up to 800 MB serialized, and even bigger once deserialized). Is there a way to limit the scheduler to a certain amount of memory? Furthermore, I already remove any local references to Dask futures and also call .cancel() on them to help with this.
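For context, the cleanup I do looks roughly like this (a minimal sketch; I'm using an in-process client and a trivial function here just for illustration, my real setup uses an ECSCluster):

```python
from dask.distributed import Client

# In-process cluster for the example; in my setup this is an ECSCluster.
client = Client(processes=False)

def inc(x):
    return x + 1

# Submit some work and collect the results locally.
futures = client.map(inc, range(3))
results = client.gather(futures)  # [1, 2, 3]

# Cancel the futures and drop the local references so the
# scheduler can forget the associated task state.
client.cancel(futures)
del futures

client.close()
```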

Thanks for any advice and help!

Hi @cbritogonzalez,

AFAIK, no, there isn’t. There is a discussion about it here: Placing limits on scheduler memory with some tips.

And do you know why you have such a big graph?

This should allow the scheduler's Python process to free some memory. Are you sure you don't keep any references to some data or Futures? Without a code snippet or MVCE, it is hard to help more here.

The amount of memory the scheduler uses is directly linked to the task graphs and Futures kept in cluster memory.

Hey! Sorry for getting back to this so late. I actually fixed most of the issue by tweaking some xarray options for loading my data, and then I reworked the intermediate graphs I was building so that the cluster holds most of the data. Concretely, I now use .persist() instead of .compute().result(). This way the scheduler does not need to load the returned data into its own memory; it stays in cluster memory instead.
Thanks for the response and the help @guillaumeeb!
