Constructing a sparse dask array from Numpy arrays

astrofrog · September 7, 2023, 11:22am

Hi everyone,

I am trying to do the following:

import numpy as np
from dask import array as da
array = da.zeros((2000, 2000, 2000))
image = np.random.random((500, 500))
array[10, :500, :500] = image
array[20, 500:1000, :500] = image
array[30, 500:1000, 500:1000] = image
array[40, :500, 500:1000] = image
result = da.median(array, axis=0)

However, this appears to create quite a complex graph (which I can’t manage to visualize), and when I try to compute it, it uses a lot of memory (it seems to perhaps load the whole array into memory).

In practice my use case is that I am creating a large mosaic from many images. Each image only covers a small part of the final image, and I want to combine the images that do overlap using a median function.

Is there a better way of achieving what I need?

Thanks!
Tom
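
The mosaic described above (many small images, median where they overlap) can be sketched without ever allocating the full 3D stack. This is a minimal NumPy-only sketch, not a dask API: `mosaic_median`, the `placements` list of `(row, col, image)` tuples, and the `tile` size are all hypothetical names chosen for illustration. It also assumes that pixels covered by no image should be ignored (NaN) rather than counted as zeros, and that the images themselves contain no NaN values.

```python
import warnings
import numpy as np


def mosaic_median(shape, placements, tile=256):
    """Median-combine images into a 2D mosaic, one output tile at a time.

    placements: list of (row, col, image), the top-left corner of each 2D image.
    Pixels covered by no image are left as NaN.
    """
    out = np.full(shape, np.nan)
    for r0 in range(0, shape[0], tile):
        for c0 in range(0, shape[1], tile):
            r1, c1 = min(r0 + tile, shape[0]), min(c0 + tile, shape[1])
            layers = []
            for row, col, img in placements:
                # Intersection of this image with the current tile.
                a0, a1 = max(r0, row), min(r1, row + img.shape[0])
                b0, b1 = max(c0, col), min(c1, col + img.shape[1])
                if a0 >= a1 or b0 >= b1:
                    continue  # this image does not touch the tile
                layer = np.full((r1 - r0, c1 - c0), np.nan)
                layer[a0 - r0:a1 - r0, b0 - c0:b1 - c0] = \
                    img[a0 - row:a1 - row, b0 - col:b1 - col]
                layers.append(layer)
            if layers:
                with warnings.catch_warnings():
                    # nanmedian warns on all-NaN pixels; NaN is the intended result there.
                    warnings.simplefilter("ignore", RuntimeWarning)
                    out[r0:r1, c0:c1] = np.nanmedian(np.stack(layers), axis=0)
    return out


# Small usage example: three overlapping placements on a 200x200 mosaic.
rng = np.random.default_rng(0)
image = rng.random((50, 50))
placements = [(0, 0, image), (25, 25, 3 * image), (0, 0, 5 * image)]
mosaic = mosaic_median((200, 200), placements, tile=64)
```

Memory use here scales with the tile size times the number of images overlapping that tile, not with the total number of images. Since the tiles are independent, the per-tile work could in principle be handed to dask (for example via `dask.delayed`), though that is not shown here.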

guillaumeeb · September 7, 2023, 8:05pm

It does not load the whole array at the same time, but every chunk of the initial array will be created in memory at some point.
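
To make the point about chunks concrete, here is a small sketch that only inspects the lazy array and never computes it. It assumes the default float64 dtype; the 100x100 tile size in the rechunk is an arbitrary choice for illustration, not something dask picks for you. As far as I understand the documentation, `da.median` needs the reduced axis to sit in a single chunk, which is the layout mimicked here.

```python
import numpy as np
from dask import array as da

# Lazy: nothing is allocated yet, only a task graph is built.
array = da.zeros((2000, 2000, 2000))

# Size the array would have if it were fully materialized.
total_gb = array.nbytes / 1e9
print(f"full array: {total_gb:.0f} GB in {array.npartitions} chunks of {array.chunksize}")

# Assumed layout for a median along axis 0: all of axis 0 in one chunk,
# small tiles along the other two axes (tile size chosen arbitrarily).
columns = array.rechunk({0: -1, 1: 100, 2: 100})
chunk_mb = np.prod(columns.chunksize) * columns.dtype.itemsize / 1e6
print(f"per-chunk: {chunk_mb:.0f} MB across {columns.npartitions} chunks")
```

Each worker thread holds a few such chunks at a time, so peak memory depends on the chunk size and the number of threads rather than on the full 64 GB.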

There was the discussion below in the Pangeo community, in the context of earth science. Does it sound like your problem?

But maybe your problem is simpler. Did you try the code mentioned here:
https://docs.dask.org/en/latest/array-sparse.html
