I have a custom Dask graph like {'A': dataframe, 'B': (func, 'A'), 'C': (func, 'B'), ...}.
The graph has many tasks, so I run it with Dask-on-Ray for speed. Ray is used only for its object store (holding the objects); Dask runs the scheduler.
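Concretely, the setup looks roughly like this (`func` is a stand-in for my real tasks, and the real graph is much bigger; `ray_dask_get` is Ray's Dask-on-Ray scheduler, which follows Dask's `get()` protocol so it can execute a raw graph):

```python
import pandas as pd
import ray
from ray.util.dask import ray_dask_get

ray.init()  # Ray supplies the workers and the shared object store

def func(df):
    # stand-in for my real per-task computation
    return df * 2

# custom graph: values are either concrete data or (callable, *args)
# task tuples, where string args name other keys in the graph
dsk = {
    "A": pd.DataFrame({"x": range(1_000_000)}),
    "B": (func, "A"),
    "C": (func, "B"),
}

# execute the raw graph through Ray and fetch the final key
(result,) = ray_dask_get(dsk, ["C"])
```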
Here is my question. I have read "Scheduling in Depth" in the Dask docs, so I understand how scheduling works in the static case.
But when the graph runs distributed (or on Ray) across many CPUs, how can I predict the memory usage?
(Assume we know each object A, B, ... is 1000 MB.)
Why do I need to predict memory? Because I have to start Ray with an object store large enough to hold the objects; if I start Ray with too little memory, the run fails.
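For context, the knob I am trying to size is `object_store_memory`, which `ray.init()` accepts in bytes; the 20 GB below is just a placeholder guess, and that guess is exactly the number I want to compute instead:

```python
import ray

# object_store_memory is given in bytes; 20 GB is only an illustrative
# figure, not a derived value
ray.init(object_store_memory=20 * 1024**3)
```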
How can we predict the total (peak) memory if we have a custom Dask graph and know every task's output size?
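The best I have come up with is the back-of-the-envelope sketch below: with p workers, at most p tasks run at once, each holding its direct inputs plus its own output, while the worst case is every intermediate staying alive simultaneously. The function names and sizes are my own invention, and it only inspects top-level task arguments, not nested tuples. Is there a more principled way?

```python
# Rough estimate, not an exact answer: assumes a running task holds
# only its direct inputs plus its own output, and that at most
# num_workers tasks execute concurrently.
def task_footprint(dsk, sizes, key):
    """Bytes held while `key` executes: direct inputs plus output."""
    task = dsk[key]
    deps = [a for a in task[1:] if isinstance(a, str) and a in dsk]
    return sizes[key] + sum(sizes[d] for d in deps)

def peak_memory_estimates(dsk, sizes, num_workers):
    """Return (optimistic, worst_case) peak-memory estimates in bytes."""
    footprints = sorted(
        (task_footprint(dsk, sizes, k)
         for k in dsk if isinstance(dsk[k], tuple)),
        reverse=True,
    )
    # worst case: the scheduler keeps every object alive at once
    worst_case = sum(sizes.values())
    # optimistic: only the num_workers largest tasks are in flight and
    # every other intermediate has already been released
    optimistic = sum(footprints[:num_workers])
    return optimistic, worst_case

def func(x):  # placeholder; only the graph's shape matters here
    return x

MB = 2**20
dsk = {"A": "data", "B": (func, "A"), "C": (func, "B")}
sizes = {"A": 1000 * MB, "B": 1000 * MB, "C": 1000 * MB}
low, high = peak_memory_estimates(dsk, sizes, num_workers=1)
print(low / MB, high / MB)  # 2000.0 3000.0
```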