How can I recreate a reference to a scattered object, without rescattering?

Let's say that, from a Jupyter notebook, one scatters an object to a cluster of workers (using GatewayCluster in my case), then shuts down and restarts the Jupyter kernel. I can reconnect to the cluster and see from the dashboard that the scattered object is still present.

Is it also possible to recreate the reference to the object?

Something like:

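import numpy as np
from dask_gateway import GatewayCluster, Gateway
from distributed import Client

# Start a new cluster through the gateway and attach a client to it
cluster = GatewayCluster()
client = Client(cluster)

# Scale up and wait until all workers have joined
n_workers = 4
cluster.scale(n_workers)
client.wait_for_workers(n_workers)

# Send the array to the workers; only a future is kept locally
data_scattered = client.scatter(np.empty((10,)))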

Followed by restarting the local Jupyter kernel and running the following:

from dask_gateway import GatewayCluster, Gateway 
g = Gateway()
cluster = g.connect(g.list_clusters()[0].name)
from distributed import Client
client = Client(cluster)
### Recreate reference to data_scattered here?

Best,

When your client quits, the scattered data is no longer “wanted” by anyone and will be cleaned up. You may want to “publish” it to make sure it survives; see “Publish Datasets” in the Dask.distributed 2024.3.1 documentation.


Okay, great, that looks interesting. I will check it out! :+1: