Predicting memory use with the Dask distributed scheduler

I have a Dask custom graph like {'A': dataframe, 'B': (func, 'A'), 'C': (func, 'B'), …}.
The graph has a lot of tasks, so to run it faster I use Dask on Ray.
Ray is only used for memory and object storage; Dask does the scheduling.
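
Roughly, the setup looks like this; the function and data below are placeholders (the real objects are ~1000 MB each), and in my case the graph is executed with Ray's Dask scheduler (ray.util.dask.ray_dask_get) instead of dask.get:

```python
import pandas as pd
import dask

def func(df):
    # placeholder for the real transformation
    return df * 2

dataframe = pd.DataFrame({"x": range(10)})  # stands in for a ~1000 MB object

# Custom graph of the shape described above: each key depends on the previous one.
dsk = {
    "A": dataframe,
    "B": (func, "A"),
    "C": (func, "B"),
}

# Locally this can be run with the synchronous scheduler;
# with Dask on Ray the same graph is handed to ray.util.dask.ray_dask_get.
result = dask.get(dsk, "C")
```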
Here is my question. I have read Scheduling in Depth in the Dask docs, so I understand how scheduling works in the static case.
But when the graph runs on the distributed scheduler (or on Ray) with many CPUs, how can I predict the memory use?
(Assume we know the size of every object: A, B, … are each 1000 MB.)
Why do I need to predict memory? Because I have to start Ray with enough room to store the objects; if I start Ray with too little memory, it fails.
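
For context, this is roughly what I mean by starting Ray with a given amount of memory; the 8 GB figure is just a placeholder, not the value I actually use:

```python
import ray

# Size the object store up front; if it is too small for the intermediates,
# tasks fail with object-store out-of-memory errors. 8 GB is an arbitrary placeholder.
ray.init(object_store_memory=8 * 1024**3)
```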

How can we predict the total memory needed if we have a Dask custom graph and know every task's memory footprint?
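
The best I have come up with so far is a crude worst-case estimate that just sums every key's size, sketched below; it clearly overestimates, because the scheduler releases intermediates and the real peak depends on execution order and parallelism:

```python
# Known (or estimated) output size of every key, in MB.
sizes_mb = {"A": 1000, "B": 1000, "C": 1000}

# Worst case: every intermediate held in memory at the same time.
upper_bound_mb = sum(sizes_mb.values())
print(f"worst-case peak ~= {upper_bound_mb} MB")
```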

@Asuka Thanks for the question! I don't think there's a way to predict memory use from the task graph alone. Without knowing much about your workflow or system, I'd suggest increasing the number of partitions in your DataFrame and seeing whether that fits your memory capacity better.
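
For example, something along these lines (the sizes and partition counts are hypothetical):

```python
import pandas as pd
import dask.dataframe as dd

pdf = pd.DataFrame({"x": range(1_000_000)})
ddf = dd.from_pandas(pdf, npartitions=4)

# More partitions means smaller chunks, so each task holds less data in memory at once.
ddf = ddf.repartition(npartitions=16)
```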

Again, I think this question may be better suited for the Ray community.

As an aside, we usually recommend the Worker Memory plots that come with Dask Distributed's Dashboard to diagnose how memory is being managed by the scheduler, so if you switch to the distributed scheduler, you can use those.
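
For instance, with a local cluster the dashboard address is available from the client (a minimal sketch):

```python
from dask.distributed import Client

# Start a local distributed cluster; the dashboard (including the Worker Memory
# plots) is served at client.dashboard_link, typically http://localhost:8787/status.
client = Client()
print(client.dashboard_link)
```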
