I have a 3-dimensional array that I pass to map_blocks, where I do some manipulations on the 3rd dimension and change its size. Accordingly, I specify chunks in map_blocks.
The output shape along the 1st dimension comes out as a multiple of the chunk size, whereas it should stay the same as the input's. I have also tried adding drop_axis=2 and new_axis=2, but that doesn't get the job done.
```python
import dask.array as da
import numpy as np

n_cols = 12
n_params = 11
n_rows = 104599

test_arr = da.random.uniform(
    low=0, high=1,
    size=(n_rows, n_cols, n_params),
    chunks=(100, n_cols, n_params),
).astype('float32')

def custom_func_dummy(inp):
    i_arrays = []
    for i in range(inp.shape[0]):
        j_arrays = []
        for j in range(inp.shape[1]):
            res = inp[i, j, :] * 5 - 4
            res = res.sum()
            repeat_res = np.repeat(res, 8)
            j_arrays.append(repeat_res)
        j_stack = np.stack(j_arrays, axis=0)
        i_arrays.append(j_stack)
    # stack rows along axis 0 so each block keeps shape (block_rows, 12, 8)
    res = np.stack(i_arrays, axis=0)
    return res

test_arr.shape
# (104599, 12, 11)

testing = da.map_blocks(custom_func_dummy, test_arr, chunks=(100, 12, 8), dtype='float32')
testing.shape
# (104600, 12, 8)
```
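To show where the extra row comes from, here is the chunk layout along axis 0 worked out with plain arithmetic (this mirrors what `test_arr.chunks[0]` reports; the variable names here are just for illustration). Since 104599 is not a multiple of 100, the final block is ragged, but a scalar `chunks=(100, 12, 8)` tells dask every block has 100 rows:

```python
# Chunk layout dask derives along axis 0 when chunks=(100, ...) is requested
n_rows, block = 104599, 100
full, rem = divmod(n_rows, block)                 # 1045 full blocks, remainder 99
chunks0 = (block,) * full + ((rem,) if rem else ())

print(len(chunks0))           # 1046 blocks along the first axis
print(chunks0[-1])            # 99 -- the ragged final block
print(len(chunks0) * block)   # 104600 -- what dask reports if told every block is 100
```

So the reported shape of 104600 is exactly 1046 blocks times the assumed 100 rows per block.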
custom_func_dummy adequately approximates what I'm doing in my actual code for the purposes of this question.
How do I get the output shape to be
(104599, 12, 8)?
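One thing I've been experimenting with (I'm not sure it's the canonical fix) is passing the full per-axis chunk tuples to map_blocks instead of a single block shape, reusing the input's ragged first-axis chunks and only overriding the last dimension. A sketch, with a vectorized equivalent of the dummy function for brevity:

```python
import dask.array as da
import numpy as np

n_rows, n_cols, n_params = 104599, 12, 11
test_arr = da.random.uniform(
    low=0, high=1, size=(n_rows, n_cols, n_params),
    chunks=(100, n_cols, n_params),
).astype('float32')

def custom_func_dummy(inp):
    # same math as the loop version: per-(i, j) sum of inp*5-4, repeated 8 times
    out = (inp * 5 - 4).sum(axis=2)               # (block_rows, 12)
    return np.repeat(out[:, :, None], 8, axis=2)  # (block_rows, 12, 8)

# Pass explicit chunk tuples so the 99-row final block along axis 0 survives,
# instead of chunks=(100, 12, 8), which assumes every block has 100 rows.
testing = da.map_blocks(
    custom_func_dummy,
    test_arr,
    chunks=(test_arr.chunks[0], test_arr.chunks[1], (8,)),
    dtype='float32',
)
print(testing.shape)  # (104599, 12, 8)
```

This keeps everything lazy; the shape is correct before any compute is called.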
By the way, I have a hunch that if I ran compute on this (if it weren't for the memory issues I would hit in my actual code, where the 3rd dimension is >80K), I would get the right shape for that dimension. However, this is only the first step of the code: later on, I stack outputs from different manipulations based on these results before calling compute. The shapes of those objects are off by one row, so I need to resolve the issue here.