Boolean array indexing rules do not follow those of NumPy

Assume you have, for example, a general NumPy array arr of shape (a,b,c,d,e,f) and a corresponding boolean array idx of shape (a,b,c,d).

In NumPy, arr[idx].shape == (N, e, f), where N is the number of True elements in idx. However, when arr is a Dask array, the same indexing fails:

*** IndexError: Boolean array with size a*b*c*d is not long enough for axis 0 with size a
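
For reference, the NumPy behaviour described above can be reproduced with a small sketch. The concrete sizes and the random mask are arbitrary choices for illustration, not taken from the original data:

```python
import numpy as np

# Arbitrary small sizes standing in for (a, b, c, d, e, f).
a, b, c, d, e, f = 2, 3, 4, 5, 6, 7
rng = np.random.default_rng(0)

arr = rng.random((a, b, c, d, e, f))
idx = rng.random((a, b, c, d)) > 0.5  # partial mask over the first four axes

n_true = int(idx.sum())  # N, the number of True elements in idx
result = arr[idx]        # masked axes collapse into one axis of length N

print(result.shape == (n_true, e, f))
```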

Indexing only the first axis with a boolean array of shape (a,) works fine. Indexing all axes with a boolean array whose shape equals arr.shape also works (although you get nan shape & chunksize, see also here). A partial mask covering only some of the leading axes, however, does not work.

Is this expected behaviour?

Hi @mueslo, welcome to Dask Discourse!

I think so, yes. Dask tries to mimic NumPy, but it does not implement all of its features. I can confirm that I observe the same results.

You might be able to find a workaround, either by artificially building a complete boolean index or by slicing chunk by chunk, although I don't have much more to suggest yet.
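
As a rough sketch of the first idea, here are two ways to express the partial mask using only the indexing forms reported to work above (a 1-D mask on the first axis, or a mask with the full shape of arr). The example uses plain NumPy with arbitrary sizes so that the equivalence can be checked. Whether the same steps carry over to a Dask array is an assumption that has not been verified here, in particular the final reshape in option 2, since the masked result has unknown shape in Dask.

```python
import numpy as np

# Arbitrary small sizes standing in for (a, b, c, d, e, f).
a, b, c, d, e, f = 2, 3, 4, 5, 6, 7
rng = np.random.default_rng(0)

arr = rng.random((a, b, c, d, e, f))
idx = rng.random((a, b, c, d)) > 0.5  # partial mask over the first four axes

expected = arr[idx]  # reference result, shape (N, e, f)

# Option 1: merge the masked axes into one, then use a 1-D mask on axis 0.
flat = arr.reshape(-1, e, f)[idx.reshape(-1)]

# Option 2: build a complete mask with the same shape as arr, index with it,
# and restore the trailing (e, f) axes afterwards.
full_mask = np.broadcast_to(idx[..., None, None], arr.shape)
complete = arr[full_mask].reshape(-1, e, f)

print(np.array_equal(flat, expected), np.array_equal(complete, expected))
```

Option 1 avoids materialising a mask as large as arr, so it is probably the cheaper of the two to try first.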