Inconsistent built-in function behavior after groupby

Hi,

I am running the code below to compute a sum after a groupby operation:

import pandas as pd
import dask.dataframe as dd

# Create the DataFrame
data = {
    'group_col': ['A', 'A', 'B', 'B', 'A', 'C', 'C'],
    'value_col': [3, 1, 5, 2, 4, 7, 6],
    'target_col': [10, 20, 30, 40, 50, 60, 70]
}

pdf = pd.DataFrame(data)

ddf = dd.from_pandas(pdf, npartitions=1)

df_grouped = ddf.groupby('group_col')[['value_col', 'target_col']].sum()

df_grouped.compute()

I get the following output:

           value_col  target_col
group_col
A                  8          80
B                  7          70
C                 13         130

If I perform a slight variation of the sum operation, I still get the same output:

import pandas as pd
import dask.dataframe as dd

# Create the DataFrame
data = {
    'group_col': ['A', 'A', 'B', 'B', 'A', 'C', 'C'],
    'value_col': [3, 1, 5, 2, 4, 7, 6],
    'target_col': [10, 20, 30, 40, 50, 60, 70]
}

pdf = pd.DataFrame(data)

ddf = dd.from_pandas(pdf, npartitions=1)

df_grouped = ddf.groupby('group_col').sum()[['value_col', 'target_col']]

df_grouped.compute()

           value_col  target_col
group_col
A                  8          80
B                  7          70
C                 13         130

If I compute a mean after the groupby operation, I get the output below:

import pandas as pd
import dask.dataframe as dd

# Create the DataFrame
data = {
    'group_col': ['A', 'A', 'B', 'B', 'A', 'C', 'C'],
    'value_col': [3, 1, 5, 2, 4, 7, 6],
    'target_col': [10, 20, 30, 40, 50, 60, 70]
}

pdf = pd.DataFrame(data)

ddf = dd.from_pandas(pdf, npartitions=1)

df_grouped = ddf.groupby('group_col')[['value_col', 'target_col']].mean()

df_grouped.compute()

           value_col  target_col
group_col
A           2.666667   26.666667
B           3.500000   35.000000
C           6.500000   65.000000

However, if I perform a slight variation of the mean after the groupby operation, as below:

import pandas as pd
import dask.dataframe as dd

# Create the DataFrame
data = {
    'group_col': ['A', 'A', 'B', 'B', 'A', 'C', 'C'],
    'value_col': [3, 1, 5, 2, 4, 7, 6],
    'target_col': [10, 20, 30, 40, 50, 60, 70]
}

pdf = pd.DataFrame(data)

ddf = dd.from_pandas(pdf, npartitions=1)

df_grouped = ddf.groupby('group_col').mean()[['value_col', 'target_col']]

df_grouped.compute()

I get the following error message:

TypeError: agg function failed [how->mean,dtype->object]

Why is there a discrepancy between the sum and the mean built-in functions in this case?
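
For context, the error text has the same shape as the message pandas itself raises when mean is applied to an object-dtype column. Whether that is what actually happens inside Dask here is not established in this thread, so the following is only a pandas-only sketch of the general sum/mean asymmetry. It assumes pandas 2.x, and the toy frame and its `label` column are hypothetical, not part of the example above:

```python
import pandas as pd

# Hypothetical toy frame: 'label' is deliberately object dtype (strings).
toy = pd.DataFrame(
    {
        "group_col": ["A", "A", "B"],
        "value_col": [3, 1, 5],
        "label": pd.Series(["x", "y", "z"], dtype=object),
    }
)

grouped = toy.groupby("group_col")

# sum is defined for strings (concatenation), so the object column survives.
summed = grouped.sum()

# mean is not defined for strings; pandas 2.x is assumed to raise TypeError.
try:
    grouped.mean()
    mean_error = None
except TypeError as exc:
    mean_error = str(exc)

# Restricting the aggregation to numeric columns avoids the problem.
numeric_mean = grouped.mean(numeric_only=True)

print(summed)
print(mean_error)
print(numeric_mean)
```

If something similar is going on, selecting the numeric columns before aggregating (as in the working `[['value_col', 'target_col']].mean()` form) sidesteps it, since the aggregation never sees a non-numeric column.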

Hi @Damilola,

Which version of Dask are you using? I quickly tested your code on a slightly older version (2024.12.1) and did not get any error.
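
In case it helps, here is a small sketch for reporting the installed versions. It uses only the standard library, and the helper name `installed_versions` is made up for this example; pandas is included on the assumption that its version may matter as well:

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages=("dask", "pandas")):
    """Return a {package: version-or-None} mapping for the given distributions."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            # The package is not installed in this environment.
            found[name] = None
    return found


print(installed_versions())
```

Pasting the printed dictionary into the thread should be enough to compare environments.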