Back-shifting non-uniform-sized edge chunks to get constant-sized input to map_blocks

timblakely · July 28, 2022, 7:50pm

Heya!

Some of our ML pipelines require input of a specific, constant size. Using Array.map_blocks works when chunksize is the expected input size and the Array is evenly divisible by chunksize. Unfortunately, when it is not evenly divisible, we end up with chunks smaller than chunksize at the back of the volume.

Our previous frameworks mitigated this by back-shifting the small end subvolumes so that each one spans the desired input size. For the above example, the start of each edge subvolume (red) would be shifted back until the subvolume was a full chunksize (blue), which means the resulting subvolumes are chunksize-sized but not necessarily chunk-aligned.

Then only the red parts of the resulting processed output would be inserted into the final array.
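To make the back-shifting described above concrete, here is a minimal pure-Python sketch, not the implementation from our previous frameworks. The function name `backshifted_slices` and the `(read, keep, write)` naming are hypothetical: `read` is the full-sized (blue) region to feed to the model, `keep` is the part of the processed block that is new (red), and `write` is where that part goes in the final array. It assumes every axis is at least as long as the block size.

```python
from itertools import product


def backshifted_slices(shape, block):
    """Yield (read, keep, write) slice tuples for constant-sized blocks.

    Hypothetical helper for illustration only.
    read  : full-sized region of the input to process
    keep  : region of the processed block that is not already covered
    write : region of the output array that receives block[keep]
    """
    per_axis = []
    for size, b in zip(shape, block):
        if size < b:
            raise ValueError("axis is shorter than the block size")
        axis = []
        for start in range(0, size, b):
            # Shift the start back so the read region is always b long.
            read_start = min(start, size - b)
            # Leading part of the block that an earlier block already covers.
            offset = start - read_start
            axis.append((
                slice(read_start, read_start + b),
                slice(offset, b),
                slice(start, min(start + b, size)),
            ))
        per_axis.append(axis)

    # Combine the per-axis triples into N-dimensional slice tuples.
    for combo in product(*per_axis):
        read = tuple(c[0] for c in combo)
        keep = tuple(c[1] for c in combo)
        write = tuple(c[2] for c in combo)
        yield read, keep, write


regions = list(backshifted_slices((10,), (4,)))
```

For a length-10 axis with block size 4, the last entry reads indices 6 to 10, keeps the last two elements of the processed block, and writes them to indices 8 to 10 of the output.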

Is there a way to do something similar natively in Dask? Right now I’m kludging together something that processes slices covering the range of full chunks, then “manually” does the process described above for each edge chunk. Unfortunately, computing the back-shifted subvolumes in Python becomes an enormous bottleneck on very large arrays or arrays with high rank. That said, my implementation is almost certainly sub-optimal, as I’m somewhat new to Dask.

Genevieve · August 4, 2022, 7:40am

I’m pretty sure there is not a simpler way to do this natively in Dask, sorry to say.

Maybe you could look at the dask.array.core.slices_from_chunks function? I once built a hacky workaround to do some special handling at the boundaries of a Dask array. (I’m not saying it was necessarily good, or that it will match your needs. But this is the workaround I came up with, so maybe it’s interesting to read about.)
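As a rough illustration of what slices_from_chunks gives you (this is not the workaround linked below, just a small sketch), the snippet here lists the slice tuple for every block of an array and picks out the undersized edge blocks. The array shape, the chunk size, and the names `block_shape` and `edge` are made up for the example.

```python
import dask.array as da
from dask.array.core import slices_from_chunks

# Example array whose shape is not divisible by the chunk size.
arr = da.zeros((10, 7), chunks=(4, 4))
block_shape = (4, 4)  # assumed constant input size for the model

# One tuple of slices per block, in the same order as the blocks.
slices = slices_from_chunks(arr.chunks)

# Blocks whose extent differs from block_shape are the edge blocks
# that would need special handling such as back-shifting.
edge = [
    s for s in slices
    if tuple(int(sl.stop - sl.start) for sl in s) != block_shape
]
```

Here `arr.chunks` is `((4, 4, 2), (4, 3))`, so there are six blocks in total and four of them are smaller than `block_shape` along at least one axis.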

slices_from_chunks_overlap · GitHub
More context: Skeleton analysis
