Constructing a sparse dask array from NumPy arrays

Hi everyone,

I am trying to do the following:

import numpy as np
from dask import array as da
array = da.zeros((2000, 2000, 2000))
image = np.random.random((500, 500))
array[10, :500, :500] = image
array[20, 500:1000, :500] = image
array[30, 500:1000, 500:1000] = image
array[40, :500, 500:1000] = image
result = da.median(array, axis=0)

However, this appears to create quite a complex graph (which I can’t manage to visualize), and when I try to compute it, it uses a lot of memory (it seems to load the whole array into memory).

In practice my use case is that I am creating a large mosaic from many images. Each image only covers a small part of the final image, and I want to combine the images that do overlap using a median function.

Is there a better way of achieving what I need?

Thanks!
Tom

It does not load the whole array into memory at the same time, but every chunk of the initial array will be created in memory at some point.
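
To put rough numbers on that, here is a back-of-the-envelope estimate. It assumes float64, which is the default dtype of da.zeros. The block shape is a purely hypothetical example: as far as I understand, da.median needs the reduced axis in a single chunk, so each block would span all 2000 layers, but the 250 x 250 footprint is just something I picked for illustration.

```python
import math

shape = (2000, 2000, 2000)
itemsize = 8  # bytes per float64 element (default dtype of da.zeros)

# Size of the full array if it were ever materialized at once
total_bytes = math.prod(shape) * itemsize
print(f"full array: {total_bytes / 1e9:.0f} GB")

# Hypothetical block covering the whole reduced axis (axis 0)
# and a 250 x 250 pixel footprint in the other two axes
block_shape = (2000, 250, 250)
block_bytes = math.prod(block_shape) * itemsize
print(f"one block: {block_bytes / 1e9:.0f} GB")
```

So even without holding everything at once, each worker may need on the order of a gigabyte per block under these assumptions, and several blocks are usually in flight at the same time.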

There was the discussion below in the Pangeo community, in the context of earth science. Does it sound like your problem?

But maybe your problem is simpler. Did you try the code mentioned here:
https://docs.dask.org/en/latest/array-sparse.html
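
If that does not fit, another option might be to skip the 3D cube entirely and build the mosaic one output tile at a time, stacking only the images that overlap each tile. Below is a minimal plain NumPy sketch of that idea, not a dask or sparse solution. The function name mosaic_median, its parameters, and the tile size are all hypothetical. It also assumes uncovered pixels should be NaN and ignored in the median, whereas the original code fills them with zeros, which would pull the median towards zero.

```python
import warnings

import numpy as np


def mosaic_median(images, offsets, out_shape, tile=500):
    """Median-combine 2D images into a mosaic, one output tile at a time.

    images    : list of 2D arrays
    offsets   : list of (row, col) top-left corners of each image in the mosaic
    out_shape : (rows, cols) of the final mosaic
    Pixels not covered by any image are left as NaN.
    """
    out = np.full(out_shape, np.nan)
    for r0 in range(0, out_shape[0], tile):
        for c0 in range(0, out_shape[1], tile):
            r1 = min(r0 + tile, out_shape[0])
            c1 = min(c0 + tile, out_shape[1])
            layers = []
            for img, (ir, ic) in zip(images, offsets):
                # Intersection of this image with the current tile
                rs, re = max(r0, ir), min(r1, ir + img.shape[0])
                cs, ce = max(c0, ic), min(c1, ic + img.shape[1])
                if rs >= re or cs >= ce:
                    continue  # no overlap, so no layer is allocated
                layer = np.full((r1 - r0, c1 - c0), np.nan)
                layer[rs - r0:re - r0, cs - c0:ce - c0] = img[
                    rs - ir:re - ir, cs - ic:ce - ic
                ]
                layers.append(layer)
            if layers:
                with warnings.catch_warnings():
                    # All-NaN pixels within a tile only trigger a warning
                    warnings.simplefilter("ignore", RuntimeWarning)
                    # nanmedian ignores pixels that an image does not cover
                    out[r0:r1, c0:c1] = np.nanmedian(np.stack(layers), axis=0)
    return out


# Same layout as in the question: four copies of one image in a 2D mosaic
rng = np.random.default_rng(0)
image = rng.random((500, 500))
offsets = [(0, 0), (500, 0), (500, 500), (0, 500)]
result = mosaic_median([image] * 4, offsets, (2000, 2000))
```

With this layout, peak memory is roughly one tile times the number of images overlapping it, rather than one layer per image across the full mosaic. The loop over tiles is independent per tile, so it could probably be handed to dask (for example via delayed functions) if it needs to run in parallel.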