Estimating disk space needed for temporary files

I tried playing with your code to understand what was happening, but it's a bit too complicated for me. Can you reproduce this behavior with a smaller part of the code?

First, there is the same misunderstanding as in TypeError on da.argmax when executing compute.

Dask tries to stream the computation as much as it can, which means avoiding storing intermediate results when it doesn't need to. In this case, it seems that somewhere within the computation it needs to store some intermediate chunk results in order to perform an aggregation at some point, so it spills data to disk until that aggregation can run. I didn't find where after looking at your code for a few minutes; the result itself is not big, so I'm not sure.
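If the spilled files are the problem, you can at least point them at a disk with enough free space. A minimal sketch, assuming the standard `temporary_directory` Dask config key (the path `/big_disk/dask-tmp` is a placeholder, not something from your setup):

```python
import dask
import dask.array as da

# Direct Dask's temporary/spill files to a disk with enough free space.
# "/big_disk/dask-tmp" is a placeholder path for illustration.
dask.config.set({"temporary_directory": "/big_disk/dask-tmp"})

# A reduction like this one may materialize intermediate chunk
# results before the final aggregation can run.
x = da.random.random((2_000, 2_000), chunks=(500, 500))
result = x.sum().compute()
```

The same key can also be set in `~/.config/dask/dask.yaml` or via the `DASK_TEMPORARY_DIRECTORY` environment variable.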

Well, that depends on the kind of computation you're doing (threading vs. multiprocessing), the I/O (multiprocessing will be better at some point), and whether you want to scale to several servers (then you'll need to use distributed Dask and thus a Client).
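To make that concrete, here is a sketch of how you pick between those options with the `scheduler=` keyword (the distributed address shown is a placeholder):

```python
import dask.array as da

x = da.random.random((2_000, 2_000), chunks=(500, 500))

# Threaded scheduler (the default for dask.array): good for
# NumPy-heavy work that releases the GIL.
result = x.mean().compute(scheduler="threads")

# Process-based scheduler: can help when the work holds the GIL,
# at the cost of inter-process serialization.
# result = x.mean().compute(scheduler="processes")

# To scale beyond one machine, use the distributed scheduler:
# from dask.distributed import Client
# client = Client("scheduler-address:8786")  # placeholder address
```

Once a distributed `Client` is created, it becomes the default scheduler for subsequent `compute()` calls.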

Unfortunately not.