I declared one NumPy array “A” on four MPI processes at once, and each process stored different data in its own copy of the array. Do you know how I can convert this NumPy array to a Dask array directly? Or do I have to create another NumPy array “B”, gather all the data into “B” on one process, and then convert “B” to a Dask array? Thanks!
There’s a from_array option: see dask.array.from_array in the Dask documentation.
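For reference, a minimal sketch of what from_array does on a single process. The array contents and the chunk size here are made up for illustration; the point is that from_array only wraps the NumPy array that lives in the memory of the process calling it.

```python
import numpy as np
import dask.array as da

# Hypothetical local data; in an MPI program each rank holds its own "A".
A = np.arange(16).reshape(4, 4)

# Wrap the in-memory NumPy array as a Dask array split into 2x2 chunks.
x = da.from_array(A, chunks=(2, 2))

# Computation is lazy until .compute() is called.
total = x.sum().compute()
```

Called inside an MPI program, this would run once per rank, and each rank would get a Dask array backed only by its own local data.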
This from_array doesn’t work: it still uses the data from only one process, not from all 4 processes.
You’re talking about MPI. Did you use dask-mpi to start a Dask cluster on top of this MPI program?
Your question is hard to answer as it stands, because we’re missing some details about your workflow. Could you describe it in more detail?
From what I understand, I would recommend either:
- Just use a Dask cluster from the start: create your initial NumPy arrays through the delayed API, and then generate a Dask array from there.
- If you really need MPI, then have each process write its array to a temporary file on disk, then start a Dask cluster, read the files back, and concatenate the arrays into one big Dask array.
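The two options above can be sketched roughly as follows. This is only an illustration under assumptions: make_part, N_PARTS, PART_SHAPE and the file names are hypothetical stand-ins for whatever each MPI rank actually computes, and the four "ranks" are simulated with a plain loop so the snippet runs without MPI or a distributed cluster.

```python
import os
import tempfile

import numpy as np
import dask
import dask.array as da

N_PARTS = 4          # stands in for the four MPI ranks (assumption)
PART_SHAPE = (3, 5)  # hypothetical shape of the block held by each rank


def make_part(rank):
    """Hypothetical stand-in for the data each rank produces."""
    return np.full(PART_SHAPE, rank, dtype=np.float64)


# Option 1: create each block lazily with dask.delayed, then wrap it.
lazy_parts = [dask.delayed(make_part)(r) for r in range(N_PARTS)]
blocks = [
    da.from_delayed(p, shape=PART_SHAPE, dtype=np.float64) for p in lazy_parts
]
from_delayed_array = da.concatenate(blocks, axis=0)

# Option 2: each rank writes its block to disk; Dask reads the files later.
tmpdir = tempfile.mkdtemp()
paths = []
for r in range(N_PARTS):  # in a real MPI run, each rank would save only its own file
    path = os.path.join(tmpdir, f"part_{r}.npy")
    np.save(path, make_part(r))
    paths.append(path)

file_blocks = [
    da.from_delayed(dask.delayed(np.load)(p), shape=PART_SHAPE, dtype=np.float64)
    for p in paths
]
from_files_array = da.concatenate(file_blocks, axis=0)
```

In both cases the result is a single Dask array of shape (12, 5) whose blocks are only materialized when something like from_files_array.sum().compute() is called. For the second option, the files need to be on storage that every Dask worker can read.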