I’m trying to process a large array with dask.array.map_blocks, but I don’t know how to access each block’s result as it finishes while the blocks are being processed in parallel.
import time
import dask.array

out = dask.array.map_blocks(torchit, arr, dtype="float32", chunks=chunks)  # arr: the source dask array
print("processing")
for err in out:
    start = time.time()
    val = err.compute()
    print(time.time() - start, "batch complete")
If I do it this way, each chunk gets computed one at a time, but I think they could be performed in parallel. I am not dead set on using map_blocks; maybe I am using the wrong method to begin with.
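To illustrate what I mean, here is a minimal, self-contained sketch (with a stand-in torchit that just doubles a block, and a small example array, since my real function and data are not shown): calling compute() once on the whole mapped array schedules every chunk together, and to_delayed() keeps per-block results while still computing them in one parallel pass.

```python
import time

import dask
import dask.array as da


def torchit(block):
    # Stand-in for the real per-block function (assumption: element-wise).
    return block * 2.0


# Small example array; the chunks along axis 0 are the "batches".
arr = da.ones((8, 4), chunks=(2, 4), dtype="float32")
out = arr.map_blocks(torchit, dtype="float32")

# Option 1: a single compute() runs all chunks in parallel.
start = time.time()
result = out.compute()
print(time.time() - start, "all batches complete")

# Option 2: keep per-block results but schedule every block in one call,
# instead of one compute() per block in a loop.
blocks = out.to_delayed().ravel()
vals = dask.compute(*blocks)  # tuple of numpy arrays, one per block
```

Is something like this the right direction, or is there a better API for consuming results as they complete?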