I’m using Dask Array to work with an image and some other data arrays with 2 or 3 dimensions, and with blocks not aligned to those of the image.

For every pixel/cell of the image array, I have to compute a function with every value on the data arrays.

In terms of Dask arrays, I need to find a way to divide the load per chunks of the data arrays but also per chunk of image, **ideally without falling back to the use of a for-loop** to iterate in every chunk of the image.

More particularly, I have to use the image indices

```
import dask.array as da
import numpy as np
n, m = (4096, 2048)
x = np.linspace(0, 1, n)
y = np.linspace(0, 1, m)
xv, yv = da.meshgrid(x, y)
```

The data arrays look like this:

```
n0, n1, n2 = (56000, 8, 4)
chunks = {0:10000}
# 2d arrays
a = da.random.random((n0, n1), chunks=chunks).astype(np.float32)
b = da.random.random((n0, n1), chunks=chunks).astype(np.float32)
# 3d arrays
c = da.random.random((n0, n1, n2), chunks=chunks).astype(np.float32)
# complex array
dr = da.random.random((n0, n1, n2), chunks=chunks).astype(np.float32)
di = 1.0j * da.random.random((n0, n1, n2), chunks=chunks).astype(np.float32)
d = dr + di
```

Without using map_blocks/blockwise, the fully broadcasted version would look like the following example.

```
axv = a[:, :, None, None] * xv[None, None, :, :]
byv = b[:, :, None, None] * yv[None, None, :, :]
u = axv + byv
v = c * d
vr = v.real[:, :, :, None, None] * da.cos(2 * np.pi * u)[:, :, None, :, :]
vi = v.imag[:, :, :, None, None] * da.sin(2 * np.pi * u)[:, :, None, :, :]
w = vr - vi
# Reduce data to image dim
res = da.sum(w, axis=[0, 1, 2])
```

I’m not trying to do that, the process of broadcasting results in too large arrays.

## What I have tried until now

I’ve been trying the program with images of size 256.

```
n, m = (256, 256)
```

With this size, the following strategy has been working without problems.

1.- Broadcast and multiply the data array with the index arrays, so the firsts dimensions come from the data array.

2.- Use blockwise to map the calculation on every block.

```
from numba import float32, guvectorize
def block_fun(u, c, d):
vr = c[0] * d[0].real
vi = c[0] * d[0].imag
res = numba_fun(u, vr, vi)
return res
@guvectorize(
[(float32[:, :], float32[:], float32[:], float32[:, :])],
'(n,m),(i),(i)->(n,m)',
nopython=True,
)
def numba_fun(u, vr, vi, res):
u_pi = 2 * np.pi * u
cos_u = np.cos(u_pi)
sin_u = np.sin(u_pi)
for i in range(vr.shape[0]):
res += vr[i] * cos_u - vi[i] * sin_u
n, m = (256, 256)
x = np.linspace(0, 1, n)
y = np.linspace(0, 1, m)
xv, yv = da.meshgrid(x, y)
axv = a[:, :, None, None] * xv[None, None, :, :]
byv = b[:, :, None, None] * yv[None, None, :, :]
u = axv + byv
data = da.blockwise(
block_fun,
'ijnm',
u,
'ijnm',
c,
'ijk',
d,
'ijk',
dtype=np.float32
)
res = da.sum(data, axis=[0, 1])
```

This works because the size of each block of the broadcasted array is small enough to fit in memory. Changing the size of the image with this version does not work due to memory limits.

## What do I need

The expected result is get the modified image, without memory problems.

One option is to modify what has been working with smaller images, but as I mentioned before, the idea is to avoid a for-loop over the image/image-chunks, and I don’t know how to do it without the loop.

Any suggestion or modification on the existing code is greatly appreciated.