How to explode a dask dataframe column horizontally instead of vertically

Damilola · November 22, 2024, 7:47am

I am trying to emulate the below behavior using Dask Dataframe to explode the columns horizontally instead of vertically

I am able to achieve it using the below code in Pandas dataframe

import pandas as pd

data = {‘list_column’: [[1, 2, 3], [4, 5, 6], [7, 8, 9]]}

df = pd.DataFrame(data)

df_exploded = df.explode(‘list_column’)

df_horizontal = df_exploded.pivot_table(index=df_exploded.index, columns=df_exploded.groupby(df_exploded.index).cumcount(), values=‘list_column’)

df_horizontal

guillaumeeb · November 22, 2024, 1:19pm

Hi @Damilola,

If you know before hand the number of int in the list, you could go with map_partitions:

import dask
import pandas as pd
import dask.dataframe as ddf
dask.config.set({"dataframe.convert-string": False})

data = {'list_column': [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12], [13, 14, 15], [16, 17, 18]]}
df = pd.DataFrame(data)
dask_df = ddf.from_pandas(df, npartitions=3)

def explode_horizontal(df):
    df_exploded = df.explode('list_column')
    df_horizontal = df_exploded.pivot_table(index=df_exploded.index, columns=df_exploded.groupby(df_exploded.index).cumcount(), values='list_column')
    return df_horizontal

result = dask_df.map_partitions(explode_horizontal, meta={0:'int', 1: 'int', 2: 'int'})#meta is key here

result.compute()

Damilola · November 22, 2024, 8:44pm

Hi @guillaumeeb

Thanks for the response.

It appears you are exploding the Pandas dataframe and not the Dask dataframe.

I also want a solution that will work if the length of the list values is arbitrary or not known in advance as that could impact how the meta parameter is constructed

Please note that all list entries in a column will have the same length. However, this length can vary and cannot always be deterministic in advance.

This is the current Pandas dataframe that works

Pandas DataFrame Approach

import pandas as pd

data = {‘list_column’: [[1, 2, 3], [4, 5, 6], [7, 8, 9]]}

df = pd.DataFrame(data)

df_exploded = df.explode(‘list_column’)

df_horizontal = df_exploded.pivot_table(index=df_exploded.index, columns=df_exploded.groupby(df_exploded.index).cumcount(), values=‘list_column’)

df_horizontal

However, from the pure Dask Dataframe point of view, this is what I am actually looking for which I have not been able to get to work as I am always getting a ValueError: Grouper and axis must be same length

Dask DataFrame Approach

import pandas as pd
import dask.dataframe as dd

data = {‘list_column’: [[1, 2, 3], [4, 5, 6], [7, 8, 9]]}

pdf = pd.DataFrame(data)

ddf = dd.from_pandas(pdf, npartitions=1)

ddf_exploded = ddf.explode(‘list_column’)

ddf_horizontal = ddf_exploded.pivot_table(index=ddf_exploded.index, columns=ddf_exploded.groupby(df_exploded.index).cumcount(), values=‘list_column’)

ddf_horizontal.compute()

Thank you so much for your help in this regard.

guillaumeeb · November 27, 2024, 9:36am

Yes I am, because I’m using the Pandas method on each partition of the Dask DataFrame, using map_partitions.

I understand, but I don’t think this is possible, you’ve got to specify meta kwarg to Dask. Could you just read a few lines of data to determine the length of the list values to build the meta kwarg?

Topic		Replies	Views
Turn an array column in a dask dataframe into multiple columns Dask DataFrame	0	273	August 31, 2022
How to append to a Dask Dataframe Dask DataFrame	11	3078	March 14, 2023
Pivot_table doesnt work same as pandas Dask DataFrame	1	380	May 30, 2023
How to get the maximum value from dask dataframe column of list values Dask DataFrame	2	43	April 27, 2025
Create an numpy array from dask dataframe Dask DataFrame	1	1645	August 31, 2022

How to explode a dask dataframe column horizontally instead of vertically

Pandas DataFrame Approach

Dask DataFrame Approach

Related topics