Hello all,
I am having trouble using da.map_blocks to process chunks of a NumPy array in parallel. Consider this setup, taken from the relevant passages of my code:
import dask.array as da

# Loading image, getting all relevant preprocessing steps done, and converting to a numpy array...
.
.
.
print("Image:")
print(f" type: {type(image)}")
print(f" length: {len(image)}")
image_size = image.shape
print(f" shape: {image_size}")
image_chunks = (image_size[0] / chunk_along_axis(image_size[0]),
                image_size[1] / chunk_along_axis(image_size[1]),
                image_size[2] / chunk_along_axis(image_size[2]),
                image_size[3] / chunk_along_axis(image_size[3]))
print(f" chunks: {image_chunks}")
image_dask = da.from_array(image, chunks=image_chunks)
print(f"Chunk sizes per axis: {image_dask.chunks}")
for s, chunk_series in enumerate(image_dask.chunks[0]):
    for i, chunk_depth in enumerate(image_dask.chunks[1]):
        for j, chunk_height in enumerate(image_dask.chunks[2]):
            for k, chunk_width in enumerate(image_dask.chunks[3]):
                print(f"S{s}:{chunk_series} | D{i}:{chunk_depth} | H{j}:{chunk_height} | W{k}: {chunk_width}")
processed_image = da.map_blocks(compute_with_DASK, image_dask, arg1, arg2)
chunk_along_axis returns an integer value that is used to divide an axis into even parts.
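The helper itself is omitted above. Purely as a hypothetical sketch of its contract (target=280 is picked only so the numbers match the output further down), it behaves roughly like this:

def chunk_along_axis(axis_len, target=280):
    # Hypothetical sketch: return the smallest number of even parts such
    # that each part along this axis is at most `target` elements long.
    for parts in range(1, axis_len + 1):
        if axis_len % parts == 0 and axis_len // parts <= target:
            return parts
    return axis_len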
compute_with_DASK is defined as:
def compute_with_DASK(image_chunk, arg1, arg2):
    chunk_size = image_chunk.shape
    print(f"Processing with dask of chunk with size: {chunk_size}")
    processed_chunk = computing_function(image_chunk, arg1, arg2)
    return processed_chunk
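computing_function is my own processing routine and is omitted here for brevity. For anyone who wants to run the snippet, a shape-preserving stand-in like the following should behave the same way (hypothetical; it only mirrors the fact, noted further down, that the real function does not change the chunk's size):

import numpy as np

def computing_function(image_chunk, arg1, arg2):
    # Hypothetical stand-in: any operation that preserves the
    # chunk's shape will do, like the real routine does.
    return image_chunk.astype(np.float32)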
When I run this code, the image-related print() statements give me:
Image:
type: <class 'numpy.ndarray'>
length: 1
shape: (1, 1092, 520, 520)
This should be divided into chunks of size:
chunks: (1.0, 273.0, 260.0, 260.0)
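As an aside, the trailing .0s come from building the chunk tuple with true division. Dask evidently normalizes them to integers (the per-axis sizes printed below are whole numbers), but integer division would keep the tuple clean in the first place:

image_chunks = tuple(s // chunk_along_axis(s) for s in image_size)  # ints, no .0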
To my understanding, this should result in 16 chunks, which is confirmed by printing them individually with the four nested for loops:
S0:1 | D0:273 | H0:260 | W0: 260 # 1
S0:1 | D0:273 | H0:260 | W1: 260 # 2
S0:1 | D0:273 | H1:260 | W0: 260 # 3
S0:1 | D0:273 | H1:260 | W1: 260 # 4
S0:1 | D1:273 | H0:260 | W0: 260 # 5
S0:1 | D1:273 | H0:260 | W1: 260 # 6
S0:1 | D1:273 | H1:260 | W0: 260 # 7
S0:1 | D1:273 | H1:260 | W1: 260 # 8
S0:1 | D2:273 | H0:260 | W0: 260 # 9
S0:1 | D2:273 | H0:260 | W1: 260 # 10
S0:1 | D2:273 | H1:260 | W0: 260 # 11
S0:1 | D2:273 | H1:260 | W1: 260 # 12
S0:1 | D3:273 | H0:260 | W0: 260 # 13
S0:1 | D3:273 | H0:260 | W1: 260 # 14
S0:1 | D3:273 | H1:260 | W0: 260 # 15
S0:1 | D3:273 | H1:260 | W1: 260 # 16
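The same count can also be read off directly from the Dask array, as a sanity check:

import numpy as np

print(image_dask.numblocks)           # (1, 4, 2, 2)
print(np.prod(image_dask.numblocks))  # 16 blocks in total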
Now, when I reach the da.map_blocks call, the print() statement in the compute_with_DASK function shows:
Processing with dask of chunk with size: (1, 1, 1, 1)
or
Processing with dask of chunk with size: (0, 0, 0, 0)
When I then try to use the data in processed_image, I get the following error:
# Traceback
.
.
.
result.append(tuple(shape(deepfirst(a))[dim] for a in arrays))
IndexError: tuple index out of range
Where do these extra chunks of unexpected size come from? And are they what causes the tuple index to run out of range?
compute_with_DASK does not alter the size of the image, so I assume the problem does not come from there.
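For reference, here is a minimal, self-contained distillation of the whole setup, with placeholders for everything not shown above (an all-zeros image of the same shape, scalar arg1/arg2, and a shape-preserving compute step):

import dask.array as da
import numpy as np

image = np.zeros((1, 1092, 520, 520), dtype=np.float32)  # placeholder image
image_dask = da.from_array(image, chunks=(1, 273, 260, 260))

def compute_with_DASK(image_chunk, arg1, arg2):
    print(f"Processing with dask of chunk with size: {image_chunk.shape}")
    return image_chunk  # shape-preserving, like the real computing_function

processed_image = da.map_blocks(compute_with_DASK, image_dask, 0, 0)
result = processed_image.compute()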
I would be very thankful if any of you have an idea of what I am doing wrong. I really appreciate any help you can provide.