How to get the original i,j,k location in blockwise operation

If I have an operation like dask_array_object.blocks and iterate over the blocks resulting from that, how can I get the original i,j,k location of each part inside of the function that is executing in a blockwise fashion?

I also noticed that dask.array.blockwise() can be used for blockwise operations, and I have the same question there: how can you know the original i,j,k location of the block within the dask array from inside the function that is executing blockwise?

@gavargas22 Welcome to Discourse! I think you can check out Dask Array’s map_blocks. It accepts block_id and block_info keyword arguments, which carry the chunk location and more detailed information, respectively:

import dask.array as da

x = da.random.randint(100, size=(10,10), chunks=(5,5))

def func(x, block_id=None, block_info=None):
    print ("block_id = ", block_id)
    print ("block_info = ", block_info)
    return x+1

da.map_blocks(func, x).compute()
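To make the link to the i,j,k location explicit, here is a minimal sketch that reads the per-axis (start, stop) offsets from block_info and encodes the block's origin into its values. The function name tag_with_origin and the encoding are illustrative assumptions, not part of Dask:

```python
import dask.array as da

x = da.zeros((10, 10), chunks=(5, 5), dtype="int64")

def tag_with_origin(block, block_info=None):
    # Dask may call the function without block_info while inferring output
    # metadata, so fall back to returning the block unchanged in that case.
    if block_info is None:
        return block
    # block_info[0] describes the first input array; "array-location" holds
    # one (start, stop) pair per axis, in whole-array coordinates.
    (row_start, _), (col_start, _) = block_info[0]["array-location"]
    # Encode the origin into the values: 100 * row_start + col_start.
    return block + 100 * row_start + col_start

tagged = da.map_blocks(tag_with_origin, x, dtype=x.dtype).compute()
print(tagged[0, 0], tagged[0, 5], tagged[5, 0], tagged[5, 5])
```

block_id gives the same position in units of blocks, while "array-location" gives it in units of array elements, which is usually what you need for writing to a file offset.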

Thank you for your response!

That’s a good solution and I tried it. However, the function I am executing with map_blocks writes to a file on disk in parallel, and in your example I return x (which is the block), so when the whole operation completes I get a full-sized numpy array.

Is there a way to return something that is empty and does not occupy that space? If so, map_blocks would be perfect for me.

My data is a 300 GB dask array, and I just want to take the values of each block and write them into a file.

I am attempting something like this:

import dask.array as da

x = da.random.randint(100, size=(2000,2000,2000))

def func(x, block_id=None, block_info=None):
    # Grab the values of the 3D cube from Zarr disk store
    block_data = x.compute()
    # Function that writes the actual values to disk
    write_value_to_binary(block_data, "./file/datafile.bin")
    # Attempt to release the memory?

    return x

da.map_blocks(func, x).compute()

Hi @gavargas22! In your snippet you mention you’re reading the array from Zarr, in which case you can use dask.array.from_zarr and dask.array.to_zarr, without necessarily needing to call compute. There are more details here, but you can do something like:

import dask.array as da

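# save zarr file
x = da.random.randint(100, size=(10,10), chunks=(5,5))
x.to_zarr('test.zarr')
# read it in
y = da.from_zarr('test.zarr')
# do some stuff (the column slice is kept within the 10x10 shape)
z = y[::2, 5:].mean(axis=1)
# save result ('result.zarr' is a placeholder path)
z.to_zarr('result.zarr')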

@gavargas22 it looks like you also posted this question to Stack Overflow, where there is another answer. Do any of these address your question?

Thank you for your responses. They partially answer my question.

It seems that I can return the smallest possible array from each call made by map_blocks and solve my problem.
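A minimal sketch of that idea, assuming the per-block work is a side effect and only a one-element array is returned. The function name process_block and the returned value are placeholders:

```python
import numpy as np
import dask.array as da

x = da.ones((10, 10), chunks=(5, 5), dtype="int64")

def process_block(block):
    # Side-effecting work (for example, writing `block` to disk) would go here.
    # Return a one-element array with the same number of dimensions, so the
    # assembled result holds one value per block instead of the full data.
    return np.full((1,) * block.ndim, block.size, dtype="int64")

status = da.map_blocks(
    process_block,
    x,
    chunks=(1,) * x.ndim,  # every output block has shape (1, 1)
    dtype="int64",
    # Passing meta should spare Dask a trial call on an empty block.
    meta=np.empty((0,) * x.ndim, dtype="int64"),
)
result = status.compute()
print(result.shape)
```

The computed result has one element per block, so its shape equals x.numblocks rather than x.shape.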

I also have a follow-up question: when you use .blocks.ravel(), how do you reconstruct the original shape and block arrangement in another numpy-like object?

I want to take each block from .blocks.ravel(), do some computation, and then write to a binary file at the specific i,j,k locations where the block is supposed to be.

Essentially, I am writing a binary file in a piecewise fashion.
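One way this piecewise write could look is sketched below. It assumes the target is a raw, C-ordered binary file that is pre-allocated and opened with np.memmap, and that every block writes to a disjoint region. The path, the write_block name, and the small example array are placeholders:

```python
import os
import tempfile

import numpy as np
import dask.array as da

source = np.arange(4 * 3 * 5).reshape((4, 3, 5))
x = da.from_array(source, chunks=2)
full_shape = x.shape

# Hypothetical output path; the file is pre-allocated at its full size.
path = os.path.join(tempfile.mkdtemp(), "datafile.bin")
np.memmap(path, dtype=x.dtype, mode="w+", shape=full_shape).flush()

def write_block(block, block_info=None):
    done = np.zeros((1,) * block.ndim, dtype="uint8")
    if block_info is None:  # metadata trial call: nothing to write
        return done
    # Turn the per-axis (start, stop) pairs into slices of the full array.
    location = block_info[0]["array-location"]
    index = tuple(slice(start, stop) for start, stop in location)
    out = np.memmap(path, dtype=block.dtype, mode="r+", shape=full_shape)
    out[index] = block  # write this block at its i,j,k position
    out.flush()
    return done

da.map_blocks(write_block, x, chunks=(1,) * x.ndim, dtype="uint8").compute()

# Read the raw file back to check that the blocks landed in the right place.
written = np.fromfile(path, dtype=x.dtype).reshape(full_shape)
```

With a distributed cluster, the file would have to live on storage that every worker can reach.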

Hi @gavargas22

It turns out numpy is quite strict about what can and cannot be converted into an array. Essentially, an object must be “array-like” to be converted. The list returned by .blocks.ravel() is not, since its items generally have different shapes.

You could, however, trick numpy into treating .blocks.ravel() as array-like by wrapping each of its items in an arbitrary non-array-like object:

import numpy as np
import dask.array as da

class wrapped():
    def __init__(self, block):
        self.view = block

# an example dask array `x` of arbitrary shape and chunks:
x = da.from_array(np.arange(4*3*5).reshape((4, 3, 5)), chunks=2)
x_block_list = x.blocks.ravel()
blocks = np.array([wrapped(block) for block in x_block_list])

which can then be reshaped into the desired structure:

block_array = blocks.reshape(x.numblocks)

whose elements can be referenced with block indices. For example,

block_array[1, 0, 2].view.compute()

But I find the above approach somewhat unwieldy, since the easiest way to directly reference a block of a dask array x (if that’s what you really want) is to do:

x.dask[(x.name, 1, 0, 2)]
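Another option, sketched here as an alternative to the wrapper class, is to index x.blocks directly with block indices and derive each block's element offsets from x.chunks. The names starts and blocks_by_index are illustrative:

```python
import numpy as np
import dask.array as da

source = np.arange(4 * 3 * 5).reshape((4, 3, 5))
x = da.from_array(source, chunks=2)

# Start offset of every chunk along each axis, derived from x.chunks.
starts = [np.cumsum((0,) + c[:-1]) for c in x.chunks]

blocks_by_index = {}
for idx in np.ndindex(*x.numblocks):
    # Element-space origin (i, j, k) of the block at block index idx.
    origin = tuple(int(s[i]) for s, i in zip(starts, idx))
    # x.blocks[idx] is a lazy dask array containing just this block.
    blocks_by_index[idx] = (origin, x.blocks[idx])

origin, block = blocks_by_index[(1, 0, 2)]
block_values = block.compute()
```

Unlike the raw graph lookup, x.blocks[...] returns a dask array, so it can be computed or processed further like any other array.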