Sorry for framing this problem somewhat vaguely; I have not found a way to reproduce it in concise code. My problem: I am using dask array with numba-optimized code. The current implementation works well on relatively small data, but when I increase the problem scale, my code cannot even run.
When looking at my dask dashboard, I found that memory blows up even before any task starts to run. I work intensively with numpy on a single machine with 500 GB of RAM, so I set Client(processes=False) when configuring my cluster. My thinking was that since I am only launching a single process, there should be no overhead from copying data between processes.
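For context, this is roughly how I configure the cluster (simplified, not my exact setup):

```python
from dask.distributed import Client

# Single process on one machine: workers are threads sharing memory,
# so there should be no inter-process copies of the arrays.
client = Client(processes=False)
```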
What are the possible reasons for this behavior? I don't think it is my chunk size, which I have already tried making fairly large; a stripped-down sketch of the pattern I'm running is below. Thanks in advance! Any suggestions appreciated!
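The sketch (the kernel, array shapes, and chunk sizes here are placeholders, since I could not reproduce the issue in concise code):

```python
import numba
import dask.array as da

@numba.njit
def kernel(block):
    # stand-in for my actual numba-optimized computation
    return block * 2.0

# Placeholder sizes: ~80 GB of float64 split into ~800 MB chunks
x = da.random.random((100_000, 100_000), chunks=(10_000, 10_000))
y = x.map_blocks(kernel, dtype=x.dtype)
result = y.compute()
```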