Heavy object distribution among the worker pool

I am trying to design a mechanism to distribute an asset (e.g. a model) across a large worker pool.
Directly providing the heavy object in the argument list of a map operation leads to significant delays, as the scheduler seems to be distributing it to all the workers. Alternatives like centralized remote storage accessible to all the workers can quickly become a bottleneck for large-scale worker pools.
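
For context, here is a minimal sketch of the slow pattern, using a NumPy array as a stand-in for the real model and a placeholder `process` function:

```python
import numpy as np
from dask.distributed import Client

client = Client()  # local cluster, for illustration only

# Stand-in for a much larger asset (e.g. model weights).
model = np.ones((2_000, 2_000))

def process(weights, x):
    # Placeholder for the real per-item work.
    return float(weights[0, 0]) * x

# The heavy object is captured in every task's arguments, so it is
# serialized into the graph and shipped through the scheduler.
futures = client.map(lambda x: process(model, x), range(100))
results = client.gather(futures)
```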

I was curious whether anyone has attempted a mechanism like distributing the object among the workers using a topology such as a spanning tree, where workers that have received the asset start serving it to other workers. If not, would any of you have suggestions for implementing such a distribution mechanism?


NOT A CONTRIBUTION

The problem with this approach (depending on what you are really doing) is that your object is put inside the task graph, so it gets serialized and shipped along with the tasks, which can make things really bad.

Have you tried looking at the scatter function? There is also some documentation around it: Futures — Dask documentation.
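
A minimal sketch of how that could look, adapting the hypothetical example above (`process` is still a placeholder):

```python
import numpy as np
from dask.distributed import Client

client = Client()

model = np.ones((2_000, 2_000))  # stand-in for the heavy asset

# scatter() uploads the object to worker memory once and returns a
# Future; passing that Future to map() keeps the object out of the
# task graph entirely.
model_future = client.scatter(model)

def process(x, weights):
    return float(weights[0, 0]) * x

futures = client.map(process, range(100), weights=model_future)
results = client.gather(futures)
```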

This useful Stack Overflow answer expands on mine a bit.

I’m not sure exactly how data moves between workers once it’s scattered to at least one of them, but I would say it doesn’t go through the Scheduler anymore.
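
If you want copies pushed out ahead of time rather than fetched on demand, Client.replicate might also be worth a look; if I remember correctly it spreads copies worker-to-worker in a tree-like pattern (it has a branching_factor parameter), which sounds close to the spanning-tree idea you describe. Something like:

```python
# Continuing from the scatter sketch above: ask the cluster to copy
# the scattered object out to all workers, worker-to-worker.
client.replicate([model_future])
```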