Heya!
Some of our ML pipelines require a specifically-/constant-sized input. Using `Array.map_blocks` works when the `chunksize` is the expected input size _and_ the array is evenly divisible by the `chunksize`. Unfortunately this means that when it's not, we end up with non-`chunksize` chunks at the back of the volume:
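For concreteness, a minimal illustration (the shape and `chunksize` here are made up):

```python
import dask.array as da

# 100 isn't a multiple of 64, so the trailing chunk on each axis is only 36 wide
x = da.zeros((100, 100, 100), chunks=(64, 64, 64))
print(x.chunks)  # ((64, 36), (64, 36), (64, 36))
```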
Our previous frameworks mitigated this by back-shifting the small end subvolumes by the desired input size. For the above example, the edge subvolumes (red) would be back-shifted by the `chunksize`, resulting in subvolumes that were `chunksize`-sized (blue) but not necessarily chunk-aligned:
Then only the red parts of the resulting processed output would be inserted into the final array.
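In index terms, the bookkeeping for those edge windows looks roughly like this (just a sketch to describe the scheme; `edge_windows` is an illustrative helper of mine, not code from our old frameworks):

```python
import itertools

def edge_windows(shape, chunksize):
    """Yield (read, keep, dst) index tuples for every back-shifted edge window.

    read -- chunksize-sized window of the input, shifted back to stay in bounds (blue)
    keep -- the sub-slice of the processed window that's actually needed (red)
    dst  -- where that sub-slice lands in the output array

    Assumes every axis is at least one chunksize long.
    """
    per_axis = []
    for dim, cs in zip(shape, chunksize):
        # aligned window starts along this axis...
        starts = [(s, slice(0, cs)) for s in range(0, dim - dim % cs, cs)]
        rem = dim % cs
        if rem:
            # ...plus one back-shifted window that reads the last cs samples
            # but only contributes the trailing rem of its output
            starts.append((dim - cs, slice(cs - rem, cs)))
        per_axis.append(starts)
    for combo in itertools.product(*per_axis):
        # windows built entirely from aligned starts are ordinary chunks;
        # map_blocks already covers those
        if all(k == slice(0, cs) for (_, k), cs in zip(combo, chunksize)):
            continue
        read = tuple(slice(s, s + cs) for (s, _), cs in zip(combo, chunksize))
        keep = tuple(k for _, k in combo)
        dst = tuple(slice(s + k.start, s + k.stop) for s, k in combo)
        yield read, keep, dst
```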
Is there a way to do something similar natively in Dask? Right now I'm kludging together something that runs `map_blocks` over a slice covering the range of full chunks, then "manually" does the process described above for each edge chunk. Unfortunately, computing the back-shifted subvolumes in Python becomes an enormous bottleneck on very large arrays or arrays of high rank. That said, my implementation is almost certainly sub-optimal, as I'm somewhat new to Dask.
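Stripped down, my current version looks something like this (`predict` stands in for the model call, `out` for the output store, and `edge_windows` is the helper sketched above):

```python
import dask.array as da

# interior: the largest evenly divisible region, handled lazily by map_blocks
interior = tuple(slice(0, d - d % c) for d, c in zip(x.shape, x.chunksize))
da.store(x[interior].map_blocks(predict), out, regions=interior)

# edges: each back-shifted window is pulled into memory and run one at a
# time in plain Python -- this loop is what dominates on big arrays
for read, keep, dst in edge_windows(x.shape, x.chunksize):
    out[dst] = predict(x[read].compute())[keep]
```

Each iteration blocks on its own `compute()`, so none of the edge work overlaps, and the number of edge windows grows quickly with rank, which I suspect is why it hurts so much on high-dimensional arrays.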