Zarr chunks locality

orliac · November 27, 2025, 7:08am

Hello. I’m seeking advice on how to control data locality when reading chunks from a zarr file. That is, I want to express the fact that all chunks to be processed by (distributed) worker i are loaded by worker i so that no transfers should occur between workers when concatenating after the getitem-* stage.
I put in place a scheduler plugin that seems to do the right thing, but looking at the task dashboard I still don’t see each worker loading each 8 chuncks.

2025-11-26 13:54:48,664 - distributed.scheduler - INFO - [GetitemPlacementPlugin] transition: ('getitem-9f5e00374ed64376acacd763a6019e3f', 1, 1, 1, 0) -> ucx://10.91.54.39:60156
2025-11-26 13:54:48,665 - distributed.scheduler - INFO - [GetitemPlacementPlugin] transition: ('getitem-9f5e00374ed64376acacd763a6019e3f', 1, 1, 0, 0) -> ucx://10.91.54.39:60156
2025-11-26 13:54:48,665 - distributed.scheduler - INFO - [GetitemPlacementPlugin] transition: ('getitem-9f5e00374ed64376acacd763a6019e3f', 1, 3, 1, 0) -> ucx://10.91.54.39:60156
2025-11-26 13:54:48,665 - distributed.scheduler - INFO - [GetitemPlacementPlugin] transition: ('getitem-9f5e00374ed64376acacd763a6019e3f', 1, 3, 0, 0) -> ucx://10.91.54.39:60156
2025-11-26 13:54:48,682 - distributed.scheduler - INFO - [GetitemPlacementPlugin] transition: ('getitem-9f5e00374ed64376acacd763a6019e3f', 1, 0, 0, 0) -> ucx://10.91.54.39:60156
2025-11-26 13:54:48,682 - distributed.scheduler - INFO - [GetitemPlacementPlugin] transition: ('getitem-9f5e00374ed64376acacd763a6019e3f', 1, 2, 1, 0) -> ucx://10.91.54.39:60156
2025-11-26 13:54:48,682 - distributed.scheduler - INFO - [GetitemPlacementPlugin] transition: ('getitem-9f5e00374ed64376acacd763a6019e3f', 1, 2, 0, 0) -> ucx://10.91.54.39:60156
2025-11-26 13:54:48,682 - distributed.scheduler - INFO - [GetitemPlacementPlugin] transition: ('getitem-9f5e00374ed64376acacd763a6019e3f', 1, 0, 1, 0) -> ucx://10.91.54.39:60156
2025-11-26 13:54:48,682 - distributed.scheduler - INFO - [GetitemPlacementPlugin] transition: ('getitem-9f5e00374ed64376acacd763a6019e3f', 0, 0, 0, 0) -> ucx://10.91.54.33:39796
2025-11-26 13:54:48,682 - distributed.scheduler - INFO - [GetitemPlacementPlugin] transition: ('getitem-9f5e00374ed64376acacd763a6019e3f', 0, 3, 0, 0) -> ucx://10.91.54.33:39796
2025-11-26 13:54:48,682 - distributed.scheduler - INFO - [GetitemPlacementPlugin] transition: ('getitem-9f5e00374ed64376acacd763a6019e3f', 0, 2, 0, 0) -> ucx://10.91.54.33:39796
2025-11-26 13:54:48,682 - distributed.scheduler - INFO - [GetitemPlacementPlugin] transition: ('getitem-9f5e00374ed64376acacd763a6019e3f', 0, 3, 1, 0) -> ucx://10.91.54.33:39796
2025-11-26 13:54:48,682 - distributed.scheduler - INFO - [GetitemPlacementPlugin] transition: ('getitem-9f5e00374ed64376acacd763a6019e3f', 0, 2, 1, 0) -> ucx://10.91.54.33:39796
2025-11-26 13:54:48,682 - distributed.scheduler - INFO - [GetitemPlacementPlugin] transition: ('getitem-9f5e00374ed64376acacd763a6019e3f', 0, 1, 0, 0) -> ucx://10.91.54.33:39796
2025-11-26 13:54:48,682 - distributed.scheduler - INFO - [GetitemPlacementPlugin] transition: ('getitem-9f5e00374ed64376acacd763a6019e3f', 0, 1, 1, 0) -> ucx://10.91.54.33:39796
2025-11-26 13:54:48,683 - distributed.scheduler - INFO - [GetitemPlacementPlugin] transition: ('getitem-9f5e00374ed64376acacd763a6019e3f', 0, 0, 1, 0) -> ucx://10.91.54.33:39796

Here I’d like to constrain the first 8 getitem to worker 0, and the 8 following to worker 1 (mapping based on the value of the first dimension of the data array to be processed in my case)

Is there a way to impose such data locality hard constraints?
Thank you.

Topic		Replies	Views
Understanding Work Stealing Distributed	14	4608	March 10, 2022
Scheduling issues on SLURMCluster for transforming large arrays Distributed dask-jobqueue , scheduler	6	794	October 19, 2022
Best practice to distribute Distributed distributed	5	598	January 11, 2022
Dask data sharding future	9	611	January 25, 2022
Help with dask/zarr usage -- performance issues with dask Dask Array zarr	10	2015	January 17, 2023

Zarr chunks locality

Related topics