Hi,
I have a Dask Distributed cluster made up of many Dask workers deployed in Docker containers. In one workflow, I run multiple XGBoost predictions (xgboost.dask.predict), pre-scattering the trained model with broadcast=True (as suggested here) through the scatter API.
This worked fine for many runs executed as Prefect flows.
However, it sometimes hangs indefinitely, I believe in this scattering operation, and I also found this issue. So I thought of skipping the model pre-scattering as a workaround, but xgboost.dask.predict does the scattering internally anyway.
Because this could be quite hard to reproduce and share, I tried the same approach on just a NumPy array in a Jupyter Notebook, using the same Dask cluster:
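import numpy as np
from dask.distributed import Client
# Connect to the scheduler of the existing cluster
client = Client("tcp://my_cluster:8786")
# Build a 1,000,000-element array and scatter a copy to every worker
b = np.arange(1000000)
b_f = client.scatter(b, broadcast=True)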
Just after the scatter call, I see ONE 'ndarray' task in the Graph view of the dashboard, but the call keeps running and only returns after a very long time, around 30 to 60 minutes, while the dashboard still shows the same single task.
According to the docs, the scatter API has a 'timeout' parameter, the number of seconds to wait before a TimeoutError is raised, so I tried this:
b_f = client.scatter(b, broadcast=True, timeout=30)
but the behavior is the same: the call keeps running well past 30 seconds.
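Since the timeout argument does not seem to take effect here, one possible stopgap is a client-side deadline around the blocking call. The following is only a minimal sketch: call_with_deadline and deadline_s are hypothetical names of my own, not part of Dask, and I am assuming it is acceptable to call client.scatter from a helper thread. It does not cancel anything on the cluster; it only stops the notebook from waiting.

```python
import threading
import time


def call_with_deadline(fn, *args, deadline_s=30.0, **kwargs):
    """Run fn(*args, **kwargs) in a daemon thread and stop waiting after deadline_s.

    Hypothetical helper: the underlying call is NOT cancelled, it keeps
    running in the background thread until it returns or the process exits.
    """
    outcome = {}

    def runner():
        try:
            outcome["value"] = fn(*args, **kwargs)
        except BaseException as exc:  # hand any error back to the caller
            outcome["error"] = exc

    worker = threading.Thread(target=runner, daemon=True)
    worker.start()
    worker.join(deadline_s)  # wait at most deadline_s seconds

    if worker.is_alive():
        raise TimeoutError(f"call did not return within {deadline_s} s")
    if "error" in outcome:
        raise outcome["error"]
    return outcome["value"]


# Intended use against the cluster (not executed here):
# b_f = call_with_deadline(client.scatter, b, broadcast=True, deadline_s=30)

# Stand-in demonstration with a call that is slower than the deadline
try:
    call_with_deadline(time.sleep, 5, deadline_s=0.2)
except TimeoutError as exc:
    print("gave up waiting:", exc)
```

This would at least let a flow fail fast and retry instead of hanging, but it does not explain why the scatter itself stalls.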
Here are my versions:
dask = "2023.3.2"
distributed = "2023.3.2.1"