Client.scatter with broadcast behaving unexpectedly

matteosimone · December 20, 2023, 11:38am

Hi,

I have a Dask distributed cluster made up of many Dask workers deployed in Docker containers. In one workflow, I run multiple XGBoost predictions (XGBoost.dask.predict), pre-scattering the trained model through the scatter API with broadcast=True (as suggested here).
This has worked fine for many, many runs executed as a Prefect flow.
However, it sometimes hangs indefinitely in this scatter operation, I believe, and I also found this issue . So I considered skipping the model pre-scattering as a workaround, but XGBoost.dask.predict scatters the model internally anyway.

Because this could be quite hard to reproduce and share, I tried the same approach with just a NumPy array in a Jupyter Notebook, using the same Dask cluster:

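```python
import numpy as np
from dask.distributed import Client

# Connect to the existing (Docker-deployed) cluster
client = Client("tcp://my_cluster:8786")

b = np.arange(1000000)
# Send a copy of the array to every worker
b_f = client.scatter(b, broadcast=True)
```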

Right after the scatter call, the dashboard shows ONE ‘ndarray’ task in the Graph, but the call keeps running and only returns after many minutes (something like 30 to 60), while the dashboard still shows the same single task.
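Once the call finally returns, one way to check whether the broadcast actually reached every worker is to compare the workers holding the scattered key with the full worker list. A minimal sketch, assuming `client.who_has(b_f)` returns a mapping from key to a list of worker addresses and that `client.scheduler_info()["workers"]` is keyed by worker address; `workers_missing_key` is a hypothetical helper, not part of Dask:

```python
from typing import Iterable, List, Mapping, Sequence


def workers_missing_key(
    who_has: Mapping[str, Sequence[str]],
    all_workers: Iterable[str],
    key: str,
) -> List[str]:
    """Return, sorted, the worker addresses that do NOT hold `key`.

    `who_has` is assumed to have the shape {key: [worker_address, ...]}.
    """
    holders = set(who_has.get(key, ()))
    return sorted(set(all_workers) - holders)


# Hypothetical usage against the live cluster (not executed here):
#   who_has = client.who_has(b_f)
#   all_workers = client.scheduler_info()["workers"].keys()
#   print(workers_missing_key(who_has, all_workers, b_f.key))
```

An empty list would suggest every worker received a copy; a non-empty one points at the workers the broadcast did not reach.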

According to the docs, the scatter API has a ‘timeout’ parameter, the number of seconds to wait before raising a TimeoutError, so I tried this:

```python
b_f = client.scatter(b, broadcast=True, timeout=30)
```

but the behavior is the same: the call keeps running well past 30 seconds.
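Since the timeout argument apparently does not bound the whole call here, a possible stopgap is to enforce a deadline on the caller's side. A minimal sketch in plain Python, assuming the synchronous Client can be called from a helper thread and that it is acceptable to leave a hung scatter running in the background; `call_with_deadline` is a hypothetical helper, not part of Dask, and it does not cancel anything on the cluster:

```python
import threading
from typing import Any, Callable, Dict


def call_with_deadline(fn: Callable[..., Any], *args: Any, deadline: float, **kwargs: Any) -> Any:
    """Call fn(*args, **kwargs), but stop waiting after `deadline` seconds.

    The call runs in a daemon thread. On timeout the caller gets a
    TimeoutError, but the underlying call is NOT cancelled: it keeps
    running in the background until it finishes on its own.
    """
    outcome: Dict[str, Any] = {}

    def target() -> None:
        try:
            outcome["value"] = fn(*args, **kwargs)
        except BaseException as exc:  # hand any error back to the caller
            outcome["error"] = exc

    worker = threading.Thread(target=target, daemon=True)
    worker.start()
    worker.join(deadline)
    if worker.is_alive():
        raise TimeoutError(f"call did not finish within {deadline} seconds")
    if "error" in outcome:
        raise outcome["error"]
    return outcome["value"]


# Hypothetical usage with the cluster from the post (not executed here):
#   b_f = call_with_deadline(client.scatter, b, broadcast=True, deadline=30)
```

A daemon thread is used rather than a ThreadPoolExecutor so that a call that never returns does not block interpreter shutdown.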

Here are my versions:
dask = “2023.3.2”
distributed = “2023.3.2.1”

guillaumeeb · December 22, 2023, 2:44pm

Hi @matteosimone,

Do you always get this behavior with the code sample you gave?

I just tried it on a LocalCluster with dask 2023.6.0 and couldn’t reproduce it.
