Is there a way to affect the order of the `map_blocks` function?

I’m using `map_blocks` to process data and save a zarr file lazily. Is there a way to process the values in order? Currently, as it writes chunks, some chunks come early and some come late. It would be nice if they were processed in order.

```python
out = dask.array.map_blocks(torchit, dtype="uint16", chunks=chunks, meta=numpy.array((), dtype="int16"))
```

It’s not clear from the documentation, but it seems like specifying both dtype and meta is redundant in this example? Should I prefer one over the other?

The values that get saved first are often from late in the series. For example, I have 200 frames to process, and it will process frames 0, 10, 40, and 150 before 1 or 2 have been written.

Thanks

Hi @odinsbane, welcome to Dask community!

Depending on your workflow, it might not be simple. Do you have more details about the operations you perform?

You can see in the documentation that you can visualize task priorities from your task graph. This page also mentions the kwarg inline_array=True, which might be useful.
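For example, something like this (a rough sketch, where `out` stands for the result of your `map_blocks` call and the filename is arbitrary):

```python
import dask

# color the graph nodes by the order in which Dask plans to run them
dask.visualize(out, color="order", filename="order.svg")
```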

In simple cases dtype is enough, but meta might be needed for others.
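For instance, with a toy array (just a sketch to illustrate the two keywords):

```python
import numpy as np
import dask.array as da

arr = da.ones((4, 4), chunks=(2, 2))

# simple case: dtype alone is enough, Dask assumes a plain numpy output
out1 = arr.map_blocks(lambda b: b.astype("uint16"), dtype="uint16")

# meta describes the output container as well as the dtype; it matters
# when the blocks are not plain numpy arrays (sparse, masked, ...)
out2 = arr.map_blocks(lambda b: b.astype("uint16"),
                      meta=np.array((), dtype="uint16"))
```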

And if the above does not give you a good solution, you might want to play with priority, as described in the example on this page:

```python
with dask.annotate(priority=lambda k: k[1]*nblocks[1] + k[2]):
    A = da.ones((1000, 1000), chunks=(100, 100))
```

Thanks for the response!

As for my workflow, I am using ngff_zarr to load a zarr image and then processing it by converting chunks to numpy arrays and running them through cellpose.

Essentially my workflow looks like this:

```python
dask_array = ngff_zarr.load_my_zarr_file()

def process(block_id, data=dask_array):
    # run cellpose on the frame selected by this block's index
    y = cellpose_model.eval(numpy.array(data[block_id[0]]))
    return y

out = dask.array.map_blocks(process, dtype="uint16", chunks=chunks)
ngff_zarr.save_out_as_zarr()
```

Can I use inline? It seems like I cannot, because it would need to happen at the point of dask array creation. It seems to be a similar situation with annotate, although maybe I could use annotate and then load the zarr file.

Another alternative would be to load the zarr with dask.array.from_zarr: I could use ngff_zarr to handle the metadata, then use from_zarr to load the same backing array but with inline_array=True.
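Roughly something like this (just a sketch, untested; I’m assuming ngff_zarr exposes from_ngff_zarr for the metadata and that the first scale level lives at component "0"):

```python
import dask.array as da
import ngff_zarr

# keep ngff_zarr for the OME-NGFF metadata (assumed call name)
multiscales = ngff_zarr.from_ngff_zarr("image.ome.zarr")

# load the same backing array directly, inlining the zarr chunks into
# each task instead of going through a shared array-creation task
arr = da.from_zarr("image.ome.zarr", component="0", inline_array=True)
```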

> In simple cases dtype is enough, but meta might be needed for others.

Good to know.

Inline would probably not work here.

Annotate should work though, with the proper `with` syntax.
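Something along these lines, reusing dask_array, process and chunks from your snippet (only a sketch: it assumes at least two chunked dimensions, priority annotations are picked up by the distributed scheduler, and you may need to flip the sign or adjust the index arithmetic for your chunk layout):

```python
import dask
import dask.array

nblocks = dask_array.numblocks  # chunk-grid shape of the input array

# build the map_blocks tasks inside the annotate context so each task
# key k = (name, i, j, ...) gets a priority derived from its block index
with dask.annotate(priority=lambda k: -(k[1] * nblocks[1] + k[2])):
    out = dask.array.map_blocks(process, dtype="uint16", chunks=chunks)
```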

I’m not familiar with ngff_zarr, but this proposal, or digging a little into the options you can pass to ngff_zarr, could be another lead.

> I’m not familiar with ngff_zarr, but this proposal, or digging a little into the options you can pass to ngff_zarr, could be another lead.

I looked but didn’t see anything obvious. I tried loading the dask array using the inline_array argument, but it didn’t change the order in which things get written.

Looking at map_blocks, I don’t know how it would know to change the order based on the dask array provided. I guess it sets up a whole computation graph and goes to work, so if an object in the graph demands a particular order, then that is the order it will work in.

Did you try using the visualize code from the documentation link to see how Dask intends to process your graph?

Would you be able to create a reproducer?

If I include visualize, I get a wide graph.

This is the code I am using; it makes a [cellpose prediction](zarr-recipes/src/scripts/predict_cellpose-2.py at master · Living-Technologies/zarr-recipes · GitHub) from a zarr file.

Could you try using visualize with the `color="order"` kwarg to see how Dask intends to process the graph?