I have a question about the reasoning behind the implementation of chunks.
When I call map_blocks on two arrays, naturally the chunks need to match.
I noticed by trail and error that the chunks need to align in reverse, if the dimensions don’t align.
To illustrate this, see the following example and note that arr2 and arr3 have their chunks swapped:
import numpy as np
import dask.array as da
def add(x,y):
return x + y
arr1 = da.arange(10, chunks=((1,3,3,3),))
arr2_np = np.arange(100).reshape((10, 10))
arr2 = da.from_array(arr2_np, chunks = ((1,3,3,3), (5,5)) )
arr3 = da.from_array(arr2_np, chunks = ((5,5), (1,3,3,3)) )
result = da.map_blocks(add, arr1, arr2).compute()
result_inverse = da.map_blocks(add, arr1, arr3).compute()
The first map_blocks call (with arr2) fails with the following error:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/timo/Documents/open_source/tmillenaar/dask/dask/base.py", line 310, in compute
(result,) = compute(self, traverse=False, **kwargs)
File "/home/timo/Documents/open_source/tmillenaar/dask/dask/base.py", line 589, in compute
dsk = collections_to_dsk(collections, optimize_graph, **kwargs)
File "/home/timo/Documents/open_source/tmillenaar/dask/dask/base.py", line 362, in collections_to_dsk
dsk = opt(dsk, keys, **kwargs)
File "/home/timo/Documents/open_source/tmillenaar/dask/dask/array/optimization.py", line 50, in optimize
dsk = fuse_roots(dsk, keys=keys)
File "/home/timo/Documents/open_source/tmillenaar/dask/dask/blockwise.py", line 1535, in fuse_roots
new = toolz.merge(layer, *[layers[dep] for dep in deps])
File "/home/timo/Documents/open_source/tmillenaar/dask/venv/lib/python3.10/site-packages/toolz/dicttoolz.py", line 39, in merge
rv.update(d)
File "/usr/lib/python3.10/_collections_abc.py", line 886, in __iter__
yield from self._mapping
File "/home/timo/Documents/open_source/tmillenaar/dask/dask/blockwise.py", line 494, in __iter__
return iter(self._dict)
File "/home/timo/Documents/open_source/tmillenaar/dask/dask/blockwise.py", line 470, in _dict
dims=self.dims,
File "/home/timo/Documents/open_source/tmillenaar/dask/dask/blockwise.py", line 446, in dims
self._dims = _make_dims(self.indices, self.numblocks, self.new_axes)
File "/home/timo/Documents/open_source/tmillenaar/dask/dask/blockwise.py", line 1484, in _make_dims
dims = broadcast_dimensions(indices, numblocks)
File "/home/timo/Documents/open_source/tmillenaar/dask/dask/blockwise.py", line 1475, in broadcast_dimensions
raise ValueError("Shapes do not align %s" % g)
ValueError: Shapes do not align {'.0': {2, 4}, '.1': {4}}
The second attempt (with arr3) works.
Intuitively, I expected the first (and only) axis of arr1 to align with the first axis arr2, but it seems instead it aligns with the last axis of arr2.
This is because the order is deliberately flipped here:
Now I wonder, why is this being flipped?
Cheers,
Timo
P.S.
I was attempting to debug an issue I filed a while back.
It was not given any attention so I decided to give it a try.
I realized though that I might just not understand the reason for the design decisions made in the past.
Okay, I’m not an array expert, but I believe this has to do with Numpy broadcasting rules. I just tried the following, building fake chunks with similar shape as those from arr2 and arr3 in your example:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Cell In[29], line 1
----> 1 arr1.blocks[1].compute() + arr2_chunk
ValueError: operands could not be broadcast together with shapes (3,) (3,5)
I just had a look at numpy’s broadcasting rules because of your suggestion. Intuitively I was trying to match up the first axis of arr1 with the first axis of arr2. I think I made the following mistake:
It nevertheless has some funny consequences when using map_blocks but at least I understand now where these originate.