Hello everyone, I have been working on processing large 3D images with dask and dask_image. The default dask_image ndmeasure functions worked for me, though loading data into memory turned out to be a major bottleneck.
I ended up switching to a strategy based on dask.array.reduction. Effectively, I use it to measure several quantities for all objects present in a chunk and then combine these partial results. Computing measures jointly can even speed things up, and most measures reduce well (min/max/histogram/bounding box etc.). The split_every argument even enables spatially logical reduction (e.g. combining chunks in a 2x2x2 manner).
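For concreteness, here is a minimal sketch of what I mean (simplified toy version, not my actual code): each chunk produces a dict of per-label statistics that combine associatively (min, max, sum, count), wrapped in an object-dtype array so dask.array.reduction can tree-merge them with split_every.

```python
import numpy as np
import dask.array as da

def chunk_stats(img, lbl):
    # Per-chunk stats for every label present in this chunk,
    # wrapped in a one-element object array so dask can pass it around.
    stats = {}
    for label in np.unique(lbl):
        if label == 0:  # background
            continue
        vals = img[lbl == label]
        stats[int(label)] = (vals.min(), vals.max(), vals.sum(), vals.size)
    out = np.empty((1,) * img.ndim, dtype=object)
    out[(0,) * img.ndim] = stats
    return out

def merge(a, b):
    # Associative merge of two per-label stats dicts.
    out = dict(a)
    for label, (mn, mx, s, n) in b.items():
        if label in out:
            omn, omx, osum, on = out[label]
            out[label] = (min(omn, mn), max(omx, mx), osum + s, on + n)
        else:
            out[label] = (mn, mx, s, n)
    return out

def _identity(x, axis=None, keepdims=None, **kwargs):
    return x

def _combine(parts, axis=None, keepdims=None, **kwargs):
    merged = {}
    for d in parts.ravel():
        merged = merge(merged, d)
    out = np.empty((1,) * parts.ndim, dtype=object)
    out[(0,) * parts.ndim] = merged
    return out

def _aggregate(parts, axis=None, keepdims=None, **kwargs):
    out = np.empty((), dtype=object)
    out[()] = _combine(parts).ravel()[0]
    return out

# toy data: two half-volume objects
img_np = np.arange(64 ** 3, dtype=float).reshape(64, 64, 64)
lbl_np = np.zeros((64, 64, 64), dtype=np.int32)
lbl_np[:32], lbl_np[32:] = 1, 2
img = da.from_array(img_np, chunks=16)
lbl = da.from_array(lbl_np, chunks=16)

per_chunk = da.map_blocks(chunk_stats, img, lbl, dtype=object)
result = da.reduction(per_chunk, chunk=_identity, combine=_combine,
                      aggregate=_aggregate, dtype=object, split_every=8)
stats = np.asarray(result.compute()).item()  # {label: (min, max, sum, count)}
```

With split_every=8 on a 3D array, dask splits roughly 2x per axis at each reduction level, which is the spatially logical 2x2x2 merge pattern mentioned above.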
Some measures do not play nice with this strategy (like the median). In theory, keeping a list of all intensity values per object would work if memory allows it. In cases where it does not, writing to disk [and (re-)chunking] would have to be the fall-back option.
Now my questions:
- Is there a reason that ndmeasure loads/unloads chunks separately for each object/index?
- For operations that could exceed memory, what would be a good strategy?*
- Memory for tasks that store all intensity values / positions can be structured easily. E.g. if I go over all chunks once, I can extract that chunk 1 contains the first X positions of object A, chunk 2 contains positions X to X+Y, and so on. In a second pass I could then write the values at those offsets. Is there a dask-friendly way to achieve this?
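To illustrate the two-pass idea, here is a minimal NumPy sketch of the offset bookkeeping on a 1D toy "image" (chunks stand in for 3D blocks). In a real pipeline, pass 1 would run via da.map_blocks and pass 2 would write into a preallocated on-disk store such as a zarr array; the names here are my own and purely illustrative.

```python
import numpy as np

def chunk_counts(lbl, labels):
    # Pass 1 helper: how many voxels of each object live in this chunk.
    return np.array([(lbl == L).sum() for L in labels])

# toy data: a 1D "image" split into three chunks
img = np.arange(12.0)
lbl = np.array([1, 1, 2, 2, 1, 2, 2, 2, 1, 1, 1, 2])
chunks = [slice(0, 4), slice(4, 8), slice(8, 12)]
labels = [1, 2]

# pass 1: per-chunk counts -> shape (n_chunks, n_labels)
counts = np.stack([chunk_counts(lbl[s], labels) for s in chunks])

# global start of each object's contiguous region in the flat output
totals = counts.sum(axis=0)
obj_start = np.concatenate([[0], np.cumsum(totals)[:-1]])
# where chunk c writes for object j: obj_start[j] + sum of earlier chunks' counts
write_at = obj_start + np.vstack([np.zeros_like(totals),
                                  np.cumsum(counts, axis=0)[:-1]])

# pass 2: write values into preallocated storage (a zarr array on disk in practice)
out = np.empty(totals.sum(), dtype=img.dtype)
for c, s in enumerate(chunks):
    for j, L in enumerate(labels):
        vals = img[s][lbl[s] == L]
        out[write_at[c, j]:write_at[c, j] + len(vals)] = vals

# out now holds all values of object 1, then all of object 2;
# e.g. a per-object median is np.median of the corresponding slice
medians = [np.median(out[obj_start[j]:obj_start[j] + totals[j]])
           for j in range(len(labels))]
```

Since each chunk writes to disjoint regions, pass 2 can run chunk-parallel without coordination, which seems like the dask-friendly property to aim for.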
*The best I could find was this old blog post, but the authors of the chest package state that it is not multi-process safe.