I am having trouble with Dask memory management: I am trying to interpolate a large raster using dask, rioxarray and rasterio.
I open my GTiff file with rioxarray and chunk it to get a dask array:
```python
import rioxarray
import rasterio

ds = rioxarray.open_rasterio(
    filename="D:\\Documents\\MISSIONS\\DASK\\interpolation\\echant_much_more_bigger_30cm_pix_l93.tif",
    chunks=(1, 5000, 5000),
).astype(rasterio.int8)
ds
```
Then I collect all my chunks as a flat list of delayed objects:
```python
chunks = ds.data.to_delayed().ravel()
```
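To show what this returns, here is a minimal, self-contained illustration on a small in-memory dask array (a toy stand-in for my raster, not my actual data):

```python
import numpy as np
import dask.array as da

# Small stand-in for the chunked raster: a 4x4 array split into 2x2 chunks
arr = da.from_array(np.arange(16).reshape(4, 4), chunks=(2, 2))

# to_delayed() gives one Delayed object per chunk, laid out in a 2x2 grid;
# ravel() flattens that grid into a 1-D array of delayed chunks
delayed_chunks = arr.to_delayed().ravel()

print(len(delayed_chunks))          # 4 delayed chunks
print(delayed_chunks[0].compute())  # the top-left 2x2 block
```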
To interpolate, I use the `rasterio.fill.fillnodata` function, so I have to create a mask array:
```python
import numpy as np

def compute_chunk(chunk):
    return chunk.compute()

def create_mask(chunk):
    mask = np.where(chunk < 9, 0, chunk)
    return mask
```
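As a sanity check, here is what `create_mask` does on a toy array (values below 9 are zeroed, everything else is kept as-is):

```python
import numpy as np

def create_mask(chunk):
    # zero out values below 9, keep the rest unchanged
    return np.where(chunk < 9, 0, chunk)

sample = np.array([[3, 9], [12, 7]], dtype=np.int8)
print(create_mask(sample))  # [[ 0  9]
                            #  [12  0]]
```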
```python
masks = []
for chunk in chunks:
    computed = client.submit(compute_chunk, chunk)
    mask = client.submit(create_mask, computed)
    del computed
    masks.append(mask)
    del mask
```
Once the `masks` list is built, my process memory is around 8 GiB, with about 2 GiB of unmanaged memory (1 old and 1 recent). So when I try to gather the results from `masks` (which is a list of futures) like this:

```python
res = [future.result() for future in masks]
```

I get a `MemoryError`.
I did some research and tried to change my memory management approach, but I think I'm doing something wrong… I just don't know what. Does anyone have an idea or some advice?

Thank you in advance!