How to convert a numpy array to a dask array

halehawk · September 13, 2022, 10:21pm

I declared a NumPy array “A” on four MPI processes at once, and each process stores different data in its own copy of the array. Do you know how I can convert this NumPy array to a Dask array directly? Or do I have to create another NumPy array “B”, gather all the data into “B” on one process, and then convert “B” to a Dask array? Thanks!

myarbrou-rh · September 15, 2022, 7:03pm

There’s a from_array option: dask.array.from_array (see the Dask documentation).
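For reference, a minimal sketch of what from_array does with a single in-memory array. The array contents and the chunk sizes are made up for illustration. Note that from_array wraps only the array held by the process that calls it, so on its own it does not collect data from other MPI ranks.

```python
import numpy as np
import dask.array as da

# Hypothetical local data standing in for the array "A" on one process.
a = np.arange(16, dtype=np.float64).reshape(4, 4)

# Wrap the in-memory array; chunks=(2, 4) splits it into two row blocks.
d = da.from_array(a, chunks=(2, 4))

# Nothing is computed until .compute() is called.
total = d.sum().compute()
print(d.chunks, total)
```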

halehawk · September 20, 2022, 4:48pm

from_array doesn’t work for this: it still uses data from only one process, not from all 4 processes.

guillaumeeb · September 28, 2022, 12:11pm

Hi @halehawk,

You’re talking about MPI. Did you use Dask-MPI to start a Dask cluster on top of this MPI program?

It is hard to answer your question right now, as we’re missing some information about your workflow. Could you describe it in more detail?

From what I understand, I would recommend either:

Just use a Dask cluster from the start: create your initial NumPy arrays through the delayed API, and then build a Dask array from them.
If you really need MPI: have each process write a temporary array to disk, then start a Dask cluster, read the data, and concatenate the arrays into one big Dask array.
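
Both options can be sketched roughly as follows. This is only an illustration under assumptions, not the original poster's workflow: make_part, N_PARTS, PART_SHAPE and the part_<rank>.npy file names are hypothetical, and the code runs on the default local scheduler instead of a real cluster. With a distributed cluster, the temporary files in the second option would need to live on storage that every worker can read.

```python
import os
import tempfile

import numpy as np
import dask
import dask.array as da

N_PARTS = 4          # hypothetical: one part per former MPI process
PART_SHAPE = (3, 5)  # hypothetical: shape of each part


def make_part(rank):
    """Hypothetical stand-in for the data one process would produce."""
    return np.full(PART_SHAPE, rank, dtype=np.float64)


# Option 1: build each part lazily with the delayed API, then stitch
# the parts together into one Dask array.
lazy_parts = [dask.delayed(make_part)(rank) for rank in range(N_PARTS)]
blocks = [
    da.from_delayed(part, shape=PART_SHAPE, dtype=np.float64)
    for part in lazy_parts
]
big = da.concatenate(blocks, axis=0)

# Option 2: each process writes its part to disk (simulated here in a
# loop), and Dask later loads the files lazily and concatenates them.
tmpdir = tempfile.mkdtemp()
paths = []
for rank in range(N_PARTS):
    path = os.path.join(tmpdir, f"part_{rank}.npy")
    np.save(path, make_part(rank))
    paths.append(path)

disk_blocks = [
    da.from_delayed(dask.delayed(np.load)(p), shape=PART_SHAPE, dtype=np.float64)
    for p in paths
]
big_from_disk = da.concatenate(disk_blocks, axis=0)

result = big.compute()
result_from_disk = big_from_disk.compute()
print(big.shape, big.chunks)
```

In both cases each part becomes one chunk of the final array, so no single process has to hold all the data before the Dask array exists.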
