Implementing chunk-wise gufuncs

jasonkena · June 2, 2022, 2:28pm

How do I go about creating a ufunc that takes chunks as an input and returns chunk-wise statistics as an output?

import numpy as np
import dask.array as da
import dask

a = da.random.normal(size=(10,20,30), chunks=(5, 10, 30))

@da.as_gufunc("(i,j,k)->(i,j,k),()", output_dtypes=(a.dtype, float), allow_rechunk=False)
def stats(x):
    print(x.shape)
    return x+1, np.sum(x)

b = stats(a)
c = b[0].compute()
d = b[1].compute()
print(c)
print(d)

For instance, here I want c == a + 1, but I want d to have shape (2, 2, 1), i.e. one value per chunk.

My particular use case also prevents me from rechunking “a” into a single chunk.

pavithraes · June 6, 2022, 9:15pm

@jasonkena Welcome!

Does it need to be a ufunc? If yes, I’ll keep looking into this!

You can also use Dask Array’s map_blocks to keep track of chunk information:

import dask.array as da

my_arr = da.random.normal(size=(10,20,30), chunks=(5, 10, 30))

def func(block, block_info=None):
    print(f"chunk location = {block_info[0]['chunk-location']}")
    print(f"chunk shape = {block_info[None]['chunk-shape']}\n")
    return block

x = my_arr.map_blocks(func, dtype='float64').compute()
# chunk location = (0, 0, 0)
# chunk shape = (5, 10, 30)

# chunk location = (0, 1, 0)
# chunk shape = (5, 10, 30)

# chunk location = (1, 0, 0)
# chunk shape = (5, 10, 30)

# chunk location = (1, 1, 0)
# chunk shape = (5, 10, 30)
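
For the chunk-wise sum from the original question, one possible map_blocks sketch is to reduce each block with keepdims=True and tell Dask that every output block has shape (1, 1, 1). The helper name chunk_sum is made up for illustration, and this uses map_blocks rather than as_gufunc, so treat it as one option under those assumptions, not the definitive answer:

```python
import numpy as np
import dask
import dask.array as da

a = da.random.normal(size=(10, 20, 30), chunks=(5, 10, 30))

def chunk_sum(block):
    # Hypothetical helper: collapse one block to a single value,
    # keeping all three axes so the result is still a 3-D block
    return np.sum(block, keepdims=True)

# chunks=(1, 1, 1) declares the shape of every output block;
# the number of blocks (2 x 2 x 1) is inherited from `a`
d = a.map_blocks(chunk_sum, chunks=(1, 1, 1), dtype=a.dtype)
c = a + 1

print(d.shape)  # (2, 2, 1)

# Compute everything in one call so the shared graph is evaluated once
a_np, c_np, d_np = dask.compute(a, c, d)
```

Since a has 2 x 2 x 1 blocks, d comes out with one entry per chunk, and c is still just a + 1 without any rechunking.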

Would this help?

jasonkena · June 6, 2022, 10:39pm

Thank you @pavithraes! I ended up using numpy object arrays to handle ragged outputs like so:

def ragged_func(x, block_info=None):
    print(block_info)
    a = np.empty(1,dtype=object)
    a[0] = np.arange(np.random.randint(1, 7))
    return a.reshape(1,1)
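
For anyone landing here later, a rough sketch of how an object-array function like this might be wired into map_blocks. The function ragged_stats and the 2.0 threshold are hypothetical stand-ins for a real per-chunk statistic, and unlike the snippet above this version keeps all three axes so no drop_axis is needed:

```python
import numpy as np
import dask.array as da

a = da.random.normal(size=(10, 20, 30), chunks=(5, 10, 30))

def ragged_stats(block):
    # Hypothetical statistic whose length depends on the data:
    # all values in this block that exceed 2.0
    out = np.empty((1, 1, 1), dtype=object)
    out[0, 0, 0] = block[block > 2.0]
    return out

# Each output block is a (1, 1, 1) object array wrapping one ragged result
r = a.map_blocks(ragged_stats, chunks=(1, 1, 1), dtype=object)
print(r.shape)  # (2, 2, 1)

result = r.compute()
for location in np.ndindex(result.shape):
    print(location, result[location].shape)
```

Each element of result is an ordinary NumPy array of whatever length that chunk produced, so downstream code has to handle them one at a time.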
