I have a data processing pipeline that uses Dask for the early stages, but I then need to get the data into a large NumPy array for further processing with existing code. The array is around 250k x 25k complex64. I have tried using futures and slices from chunks to parallelize the copy, but it is still slow. Any thoughts on the best way to go from a Dask array to a NumPy array?
import dask.array as da
from dask.distributed import as_completed
from tqdm import tqdm

def dask_to_numpy(dask_array, numpy_array):
    # `client` is the existing dask.distributed Client for the cluster.
    # Submit one compute per chunk and remember which slice of the output it fills.
    future_map = {}
    for _slice in da.core.slices_from_chunks(dask_array.chunks):
        future = client.compute(dask_array[_slice])
        future_map[future] = _slice
    # Copy each chunk into the pre-allocated numpy array as it finishes.
    for future in tqdm(as_completed(future_map.keys()),
                       desc="Copying chunks to numpy array",
                       total=len(future_map)):
        _slice = future_map[future]
        numpy_array[_slice] = future.result()
        future = None  # drop the reference so the worker can release the chunk
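For context, this is roughly how I call it; the client setup and array allocation here are just illustrative stand-ins for what the real pipeline does:

import numpy as np
from dask.distributed import Client

client = Client()  # illustrative; the real cluster setup lives elsewhere
# `dask_array` is the ~250k x 25k complex64 array produced by the earlier Dask stages.
numpy_array = np.empty(dask_array.shape, dtype=dask_array.dtype)  # pre-allocated target
dask_to_numpy(dask_array, numpy_array)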
My chunk size is 1024 x 25k. Maybe that’s too large and is killing performance?
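If the chunks are the problem, I could rechunk before the copy, something like this (the 256-row chunk size is just a guess, not something I have benchmarked):

# Split each 1024 x 25k chunk into smaller row blocks before computing,
# in case smaller per-future transfers behave better.
smaller_chunks = dask_array.rechunk((256, dask_array.shape[1]))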
This is also slow:
from dask.distributed import progress

def dask_to_numpy(dask_array, numpy_array):
    # Persist the whole array on the cluster, show progress, then copy it back.
    persisted = dask_array.persist()
    progress(persisted)
    numpy_array[:] = persisted.compute()
What I have left out is the routine that pre-allocates numpy_array as either an in-memory array or a memmap, depending on the array size and available memory.
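That routine looks roughly like the sketch below; the name allocate_output, the 50%-of-available-memory threshold, and the temporary-file handling are simplified placeholders for what the real code does:

import tempfile
import numpy as np
import psutil

def allocate_output(shape, dtype=np.complex64):
    # Use an in-memory array when it fits comfortably in RAM,
    # otherwise fall back to a temporary memory-mapped file.
    nbytes = int(np.prod(shape)) * np.dtype(dtype).itemsize
    if nbytes < 0.5 * psutil.virtual_memory().available:
        return np.empty(shape, dtype=dtype)
    tmp = tempfile.NamedTemporaryFile(suffix=".dat", delete=False)
    return np.memmap(tmp.name, dtype=dtype, mode="w+", shape=tuple(shape))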