Why does my memory blow up even before any task starts to run?

Sorry for framing this problem rather vaguely; I have not found a way to reproduce it in a concise snippet. My problem is: I am using Dask Array together with numba-optimized code. The current implementation works well on relatively small-scale data, but when I increase the problem scale, my code cannot even run.

Looking at the Dask dashboard, I can see that memory blows up even before any task starts to run. I work intensively with NumPy on a single machine with 500 GB of RAM, so I set `Client(processes=False)` when configuring my cluster. I assumed there would be no overhead from copying data around between processes, since I am only launching a single process here.

What are the possible reasons for this behavior? I don't think it is my chunk size, which I have already tried setting large enough. Thanks in advance :slight_smile: Any suggestion is appreciated!

It's really hard to tell without at least a little code snippet: just how you read the data, which API you are using, the first few lines of your Dask code…

It might be that you are reading the data client side and sending it to the workers through the graph, or maybe the workers are reading too much data at once, but you should be able to see that through the dashboard…
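
To make the distinction concrete, here is a hypothetical sketch of the two patterns (the file names, shapes, and the `np.load` reader are just placeholders, since we haven't seen your code):

```python
import dask
import dask.array as da
import numpy as np

# Hypothetical anti-pattern: the data is loaded on the client first, so the
# resulting large NumPy array is referenced by the task graph and has to be
# held in client memory before any compute task runs.
big = np.load("part_000.npy")                # placeholder file name
x_eager = da.from_array(big, chunks="auto")

# Lazier alternative: wrap the reading itself in the graph, so each piece is
# only loaded when the task that needs it actually runs.
lazy_parts = [
    da.from_delayed(
        dask.delayed(np.load)(f"part_{i:03d}.npy"),  # placeholder file names
        shape=(10_000, 50_000),                      # assumed per-file shape
        dtype="float64",
    )
    for i in range(10)
]
x_lazy = da.concatenate(lazy_parts, axis=0)
```

If your real code looks more like the first pattern, the memory you see before any task runs is probably the data being materialized on the client as part of the graph; switching to the second pattern, or to a native lazy reader such as `da.from_zarr` or `da.from_npy_stack`, usually helps.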