Thanks for your help. I suspect that my use case is a little different from the one that `map_overlap`

was designed for. As I understand, `map_overlap`

is mainly useful if you are mapping a function which takes a local neighbourhood, and returns a single value (for example, taking a derivative). What I want to do is map a function which takes a local neighbourhood, *and returns a local neighbourhood*. Those returned local neighbourhoods would then overlap in the final array.

What exactly ‘overlapping’ means is tricky - consider the various blending modes in image software. My need is for overlapping values to simply be summed (essentially an additive/linear-dodge blending mode).

I asked this question because I felt wanting to return local neighbourhoods, and summing overlapping neighbourhoods, was somehow a natural idea, and was curious if there was a way of doing this with `map_overlap`

that I was missing.

I’ve put together the following MWE for a 2D array, which hopefully clarifies what I am trying to achieve:

```
import numpy as np
import dask.array as da
arr = da.random.random((100, 100), chunks=(10, 10))
depth = 2
result = da.map_overlap(lambda x: x**2, arr, depth=depth, boundary=0, trim=False)
result_untrimmed = result.compute()
# Re-shape/re-slice the result array into the original shape, using an additive blending mode.
# Original chunks
chunk_sizes = result.chunks
chunk_indices = [np.cumsum(np.concatenate([[0], chunk_sizes[:-1]])) for chunk_sizes in chunk_sizes]
# Chunks with overlap (call them "blocks")
block_sizes = tuple(np.array(chunk_sizes) + 2 * depth for chunk_sizes in chunk_sizes)
block_indices = [np.cumsum(np.concatenate([[0], block_sizes[:-1]])) for block_sizes in block_sizes]
# Prepare final array
result_final = np.zeros(np.array(arr.shape) + 2 * depth)
# Iterate over the chunks/blocks, and add them to the final result array
for i in range(len(chunk_sizes[0])):
for j in range(len(chunk_sizes[1])):
# Get the size of the current block
block_size = block_sizes[0][i], block_sizes[1][j]
# Get the block
block = result_untrimmed[block_indices[0][i]:block_indices[0][i] + block_size[0],
block_indices[1][j]:block_indices[1][j] + block_size[1]]
# Get the indices of the current chunk in the final array
i_diff, j_diff = chunk_indices[0][i], chunk_indices[1][j]
# Add the current block to the final array
result_final[i_diff:i_diff + block_size[0], j_diff:j_diff + block_size[1]] += block
# If desired, trim the outer excess now.
result_final = result_final[depth:-depth, depth:-depth]
```

Of course, if for some reason the chunk sizes are relatively small and depth is large, such an approach can take up a lot of memory!