Create a NumPy array from a Dask DataFrame

Hi everyone,
I am totally new to Dask, and this is my first post.
I am trying to create a 2D array from a Dask DataFrame containing two columns, x and y.
In pandas, I loop over the rows to fill the matrix like so:

    def _fill_frame(self, frame: pd.DataFrame):
        image = np.zeros((self._nb_rows, self._nb_cols), dtype=np.uint16)
        for idx in frame.index:
            image[int(frame.loc[idx, "y_px"]), int(frame.loc[idx, "x_px"])] += 1
        return image

How could I accomplish the same thing with Dask?
I tried to adapt the previous example to Dask:

    def _fill_frame_dask(self, frame: pd.DataFrame):
        d_frame = dd.from_pandas(frame, npartitions=4)
        image = np.zeros((self._nb_rows, self._nb_cols), dtype=np.uint16)
        for idx in d_frame.index:
            image[int(d_frame.loc[idx, "y_px"]), int(d_frame.loc[idx, "x_px"])] += 1
        return image

But I don’t know how to compute the DAG and fill the matrix.
I'm also not sure whether this will ever run properly :)

Best regards
LF

@ludicludo Welcome! Maybe something like the following:


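    import dask.dataframe as dd

    # Build a small example Dask DataFrame with two columns, split into 2 partitions.
    ddf = dd.DataFrame.from_dict(
        {
            "x": range(10, 20),
            "y": range(20, 30),
        },
        npartitions=2,
    )

    # Convert to a Dask array, then compute it into an in-memory NumPy array.
    numpy_array = ddf.to_dask_array().compute()
    numpy_array
    # Output:
    #
    # array([[10, 20],
    #        [11, 21],
    #        [12, 22],
    #        [13, 23],
    #        [14, 24],
    #        [15, 25],
    #        [16, 26],
    #        [17, 27],
    #        [18, 28],
    #        [19, 29]])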