Per-worker (i.e., process) numpy array

Hello!

Let’s say I create a new Dask array via map_blocks, i.e., the chunks are generated on the fly. If I want to pass a NumPy array to every process only once, so that it is reused for all chunks (or better yet, although I think this is not possible, shared among the processes), how would I do this?

If I add a kwarg containing the NumPy array to map_blocks, is it passed only once to each worker and then reused across the chunks created on that worker? Or will the array be passed again to a process every time it generates a new chunk?

Hi @ilan-gold,

I think the correct answer is to wrap your NumPy array in dask.delayed (preferred) or to use client.scatter().

You should then be able to pass the Delayed or Future as an input to your function. The underlying array will then only be loaded once per worker process.
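Here is a minimal sketch of the Delayed approach, in case it helps. The names `lookup`, `make_chunk` and `table` are made up for illustration, and the small `indices` array just stands in for whatever drives your chunks. I pass `meta` explicitly so that Dask does not try to call the function on the Delayed object while building the graph.

```python
import numpy as np
import dask
import dask.array as da

# Hypothetical lookup table standing in for the NumPy array from the question.
lookup = np.arange(10, dtype="float64") * 2.0

# Wrap the array once, so the graph refers to a single task for it
# rather than embedding the data separately for every chunk.
lookup_delayed = dask.delayed(lookup)


def make_chunk(block, table=None):
    # By the time this runs, `table` is a plain NumPy array, not a Delayed.
    return table[block]


# Small integer array split into two chunks of four elements each.
indices = da.from_array(np.arange(8), chunks=4)

result = da.map_blocks(
    make_chunk,
    indices,
    table=lookup_delayed,
    # Explicit meta: skips calling make_chunk during graph construction.
    meta=np.array((), dtype=lookup.dtype),
)

computed = result.compute()
```

This runs on the default local scheduler. With a distributed cluster, the alternative would be to create a Future with `client.scatter(lookup, broadcast=True)` and pass that in place of `lookup_delayed`; I have not shown that variant because it needs a running `Client`.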