Back-shifting non-uniformly sized edge chunks to get constant-sized input to map_blocks

Heya!

Some of our ML pipelines require a specific, constant-sized input. Using `Array.map_blocks` works when the chunksize is the expected input size _and_ the array is evenly divisible by the chunksize. Unfortunately, when it isn’t, we end up with non-chunksize chunks at the back of the volume.
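For concreteness, here is a minimal sketch of the situation (the array shape and chunk size are made up for illustration):

```python
import dask.array as da

# An array whose length (100) is not evenly divisible by the chunk
# size (32): the trailing chunk comes out smaller than the rest.
x = da.ones((100,), chunks=32)
print(x.chunks)  # ((32, 32, 32, 4),) -- the final chunk of 4 is the problem
```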

Our previous frameworks mitigated this by back-shifting the undersized end subvolumes: each edge subvolume is shifted back toward the start of the array until it spans the desired input size, so it ends up exactly chunksize but not necessarily chunk-aligned.

Then only the parts of the processed output corresponding to the original edge subvolumes would be inserted into the final array.
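Here is a 1-D sketch of that technique (the `process` function, shape, and chunk size are all made-up stand-ins; a real pipeline would apply this to every edge block of an N-D volume):

```python
import numpy as np
import dask.array as da

def process(block):
    # Stand-in for the ML model, which only accepts inputs of length 32.
    return block * 2

x = da.ones((100,), chunks=32)
out = np.empty(x.shape, dtype=x.dtype)

n, size = x.shape[0], x.chunks[0][0]  # array length 100, chunk size 32
full = (n // size) * size             # 96: extent covered by the full chunks

# Full chunks can go straight through map_blocks.
out[:full] = x[:full].map_blocks(process, dtype=x.dtype).compute()

# Back-shift the edge chunk: take a full-sized window that ends at the
# array boundary (slice(68, 100) here) so the model sees a valid input...
window = process(np.asarray(x[n - size:n]))

# ...then write back only the part that covers the original edge chunk.
out[full:] = window[-(n - full):]
```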

Is there a way to do something similar natively in Dask? Right now I’m kludging together something that processes a slice covering the range of full chunks, then “manually” applies the process described above to each edge chunk. Unfortunately, computing the back-shifted subvolumes in Python becomes an enormous bottleneck on very large arrays or arrays of high rank. That said, my implementation is almost certainly sub-optimal, as I’m somewhat new to Dask :slight_smile:

I’m pretty sure there is not a simpler way to do this natively in Dask, sorry to say.

Maybe you could look at the `dask.array.core.slices_from_chunks` function? I once built a hacky workaround to do some special handling at the boundaries of a Dask array. (I’m not saying it was necessarily good, or that it will necessarily match your needs, but it’s the workaround I came up with, so it may be interesting to read about.)
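For reference, `slices_from_chunks` turns a `.chunks` tuple into one tuple of slices per block, which makes it easy to find and special-case the undersized edge blocks. A quick illustration (note it’s an internal helper, so its location could change between Dask versions):

```python
from dask.array.core import slices_from_chunks

# One tuple of slices per block, in block order; the final slice along
# an axis exposes the undersized edge chunk.
for idx in slices_from_chunks(((32, 32, 32, 4),)):
    print(idx)
# (slice(0, 32, None),)
# (slice(32, 64, None),)
# (slice(64, 96, None),)
# (slice(96, 100, None),)
```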
