Hello everyone!
I have a problem that I don't know how to solve, so I'm coming here looking for help.
Basically, I have data in a zarr file and want to read it with Dask and process it. For that, I have written the following code:
    import dask.array as da
    from os.path import join

    def preprocess_count_matrix(x, normalization):
        if normalization == DISS_NORMALIZATION:
            return x.map_blocks(
                process_data_main,
                max_len=2000,
                dtype='f4',
            )
        elif normalization == 'raw':
            return x
        else:
            raise ValueError(f'Normalization {normalization!r} not found')

    XX = preprocess_count_matrix(da.from_zarr(join(DATA_PATH, split, f'zarr_{i}'), 'X'), NORMALIZATION)
The function process_data_main basically converts the data, whatever its input shape, into an array whose second dimension is 2000. However, if my initial zarr file has shape, say, (n x 500), Dask still reports XX as (n x 500) instead of (n x 2000), even though I can call .compute() and verify that the second dimension really is 2000.
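For context, here is a minimal stand-in that reproduces what I am seeing (the real process_data_main is more involved; pad_to_len and the random data below are made up just for illustration):

    import numpy as np
    import dask.array as da

    def pad_to_len(block, max_len=2000):
        # Hypothetical stand-in for process_data_main: pad each block's
        # second dimension with zeros up to max_len.
        padded = np.zeros((block.shape[0], max_len), dtype='f4')
        padded[:, :block.shape[1]] = block
        return padded

    x = da.random.random((100, 500), chunks=(50, 500))
    XX = x.map_blocks(pad_to_len, max_len=2000, dtype='f4')

    print(XX.shape)            # (100, 500)  <- the lazy metadata Dask reports
    print(XX.compute().shape)  # (100, 2000) <- the actual computed shape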
This is a problem because later I want to concatenate XX with other arrays that must also have shape (n x 2000), but I can't, since Dask still thinks XX has its previous shape. How can I fix this? The meta argument, as far as I know, cannot be used to set the shape of the output. I could always call .compute() or .persist() (or similar) to get out of lazy mode, but I want to avoid that as much as possible.
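Continuing the toy example above, the concatenation then fails on the stale metadata, even though the computed blocks would line up (other is just a placeholder for the arrays I actually want to attach):

    # Another lazy array that really is (n, 2000).
    other = da.random.random((100, 2000), chunks=(50, 2000))

    # Dask validates the lazy shapes here, so this raises a ValueError
    # about misaligned shapes ((100, 500) vs (100, 2000)), even though
    # XX's blocks are genuinely (50, 2000) once computed.
    combined = da.concatenate([XX, other], axis=0)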
I'm quite lost; I've been trying for a while with no success.
Thanks!!