Inconsistent built-in function behavior after groupby

Damilola · June 7, 2025, 1:24am

Hi,

I am running the code below to compute a sum after a groupby operation:

import pandas as pd
import dask.dataframe as dd

# Create the DataFrame
data = {
    'group_col': ['A', 'A', 'B', 'B', 'A', 'C', 'C'],
    'value_col': [3, 1, 5, 2, 4, 7, 6],
    'target_col': [10, 20, 30, 40, 50, 60, 70]
}

pdf = pd.DataFrame(data)

ddf = dd.from_pandas(pdf, npartitions=1)

df_grouped = ddf.groupby('group_col')[['value_col', 'target_col']].sum()

df_grouped.compute()

I get the following output:

           value_col  target_col
group_col
A                  8          80
B                  7          70
C                 13         130

If I perform a slight variation of the sum operation (calling sum() first and selecting the columns afterwards), I still get the same output:

import pandas as pd
import dask.dataframe as dd

# Create the DataFrame
data = {
    'group_col': ['A', 'A', 'B', 'B', 'A', 'C', 'C'],
    'value_col': [3, 1, 5, 2, 4, 7, 6],
    'target_col': [10, 20, 30, 40, 50, 60, 70]
}

pdf = pd.DataFrame(data)

ddf = dd.from_pandas(pdf, npartitions=1)

df_grouped = ddf.groupby('group_col').sum()[['value_col', 'target_col']]

df_grouped.compute()

           value_col  target_col
group_col
A                  8          80
B                  7          70
C                 13         130

If I compute a mean after the groupby operation, selecting the columns first, I get the output below:

import pandas as pd
import dask.dataframe as dd

# Create the DataFrame
data = {
    'group_col': ['A', 'A', 'B', 'B', 'A', 'C', 'C'],
    'value_col': [3, 1, 5, 2, 4, 7, 6],
    'target_col': [10, 20, 30, 40, 50, 60, 70]
}

pdf = pd.DataFrame(data)

ddf = dd.from_pandas(pdf, npartitions=1)

df_grouped = ddf.groupby('group_col')[['value_col', 'target_col']].mean()

df_grouped.compute()

           value_col  target_col
group_col
A           2.666667   26.666667
B           3.500000   35.000000
C           6.500000   65.000000

However, if I perform the same slight variation with mean (calling mean() first and selecting the columns afterwards):

import pandas as pd
import dask.dataframe as dd

# Create the DataFrame
data = {
    'group_col': ['A', 'A', 'B', 'B', 'A', 'C', 'C'],
    'value_col': [3, 1, 5, 2, 4, 7, 6],
    'target_col': [10, 20, 30, 40, 50, 60, 70]
}

pdf = pd.DataFrame(data)

ddf = dd.from_pandas(pdf, npartitions=1)

df_grouped = ddf.groupby('group_col').mean()[['value_col', 'target_col']]

df_grouped.compute()

I get the following error message:

TypeError: agg function failed [how->mean,dtype->object]

Why is there a discrepancy between the sum and the mean built-in functions in this case?
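
For context, here is a minimal pandas-only sketch of one known way sum and mean can diverge after a groupby. It assumes a hypothetical extra object-dtype column (label_col), which the frame above does not have, so it may not be what triggers the error in the Dask case. On string data, sum concatenates, while mean has no meaning and, in recent pandas 2.x releases, raises a TypeError; older pandas versions may silently drop the column instead.

```python
import pandas as pd

# Small frame with a hypothetical non-numeric column (label_col).
# This column is an assumption for illustration; it is not in the original data.
df = pd.DataFrame({
    "group_col": ["A", "A", "B"],
    "label_col": pd.Series(["x", "y", "z"], dtype=object),
    "value_col": [3, 1, 5],
})

# sum() works on every column: strings are concatenated per group.
summed = df.groupby("group_col").sum()

# mean() over all columns may fail on the object column, depending on
# the pandas version, so the outcome is captured rather than assumed.
try:
    df.groupby("group_col").mean()
    mean_error = None
except TypeError as exc:
    mean_error = exc

# Selecting the numeric columns before aggregating avoids the issue.
safe_mean = df.groupby("group_col")[["value_col"]].mean()

print(summed)
print("mean() over all columns raised:", mean_error)
print(safe_mean)
```

Selecting the columns before calling mean(), as in the third snippet above, keeps the aggregation restricted to numeric data regardless of what else ends up in the frame.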

guillaumeeb · June 13, 2025, 3:25pm

Hi @Damilola,

Which version of Dask are you using? I quickly tested your code on a slightly older version (2024.12.1) and did not get any error.
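
A quick way to report the installed versions is sketched below. It uses only the standard library; the helper name installed_version is made up for this example, and including pandas alongside dask is an assumption that its version may matter here too.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of a package, or None if absent.

    Hypothetical helper for this example, not part of Dask or pandas.
    """
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Print the versions worth including in a bug report.
for name in ("dask", "pandas"):
    print(name, installed_version(name) or "not installed")
```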
