Client.scatter() producing uneven results

Hi, I’m running Dask Distributed on an existing ECS cluster, and I’m running into issues with client.scatter() not distributing the scattered data evenly across workers. Versions are 2022.7.1 on the workers, 2022.8.0 on the scheduler, and 2022.6.0 on the client.

I have 3 workers running, and the client sees all of them (len(client.nthreads()) returns 3). Yet when I run

data = ['a', 'b', 'c']
future = client.scatter(data)
client.who_has()

I get an uneven result: the keys are not spread one per worker across the three workers.

client.rebalance() does nothing, and the only way I’ve found to get past this is to broadcast to all workers. But that causes memory issues, since the real objects are quite large. Is there any way I can force scatter to send one item to each worker?

Disclaimer: I’m not a maintainer.

Do the worker nodes have multiple cores? From the documentation on data locality:

When a user scatters data from their local process to the distributed network this data is distributed in a round-robin fashion grouping by number of cores.

You can always manually specify which workers to scatter to with the workers= parameter of scatter (see the sketch below). Refer to the data locality documentation and the API docs.
If your workers only have a single core each, this would seem like a bug to me.
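
For example, here is a minimal sketch of pinning one item to each worker, assuming three workers and using client.scheduler_info() to look up their addresses (the scheduler address is a placeholder):

from dask.distributed import Client

client = Client("tcp://scheduler-address:8786")  # placeholder scheduler address

data = ['a', 'b', 'c']
workers = list(client.scheduler_info()["workers"])  # the three worker addresses

# Pin one item to each worker explicitly instead of relying on round-robin placement.
futures = [
    client.scatter(item, workers=[w])
    for item, w in zip(data, workers)
]

print(client.who_has(futures))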


I thought I’d ask a related question here.

I want all workers on a Dask Gateway cluster to start out with the same NumPy array in memory before I run dask.compute(list_of_delayed_computations).

Using client.scatter() places the array on only one or a few workers, which then have to transfer it to the others during the computation. Do you know if there is a way to give every worker its own copy of the array before the computation starts?

That would avoid each task loading the data itself, which I expect would be slower.

I tried using broadcast=True, but still only one worker receives the data. I wonder if this has to do with the ReduceReplicas policy mentioned in the API docs.
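
For reference, this is roughly the pattern I’m trying; the cluster setup, array, and delayed function below are simplified placeholders for my real code:

import numpy as np
import dask
from dask import delayed
from dask_gateway import Gateway

gateway = Gateway()                    # connection details omitted
cluster = gateway.new_cluster()
cluster.scale(4)
client = cluster.get_client()

arr = np.random.random((10_000, 100))  # stand-in for the real (large) array

# Intent: place a copy of arr on every worker before computing.
arr_future = client.scatter(arr, broadcast=True)

@delayed
def process(i, shared):
    # Each task should read the worker-local copy instead of loading the data itself.
    return shared[i].sum()

list_of_delayed_computations = [process(i, arr_future) for i in range(100)]
results = dask.compute(*list_of_delayed_computations)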

Best,