Hi,
I am running the code below to compute a sum after a groupby operation:
import pandas as pd
import dask.dataframe as dd
# Create the DataFrame
data = {
    'group_col': ['A', 'A', 'B', 'B', 'A', 'C', 'C'],
    'value_col': [3, 1, 5, 2, 4, 7, 6],
    'target_col': [10, 20, 30, 40, 50, 60, 70]
}
pdf = pd.DataFrame(data)
ddf = dd.from_pandas(pdf, npartitions=1)
df_grouped = ddf.groupby('group_col')[['value_col', 'target_col']].sum()
df_grouped.compute()
I get the following output:
           value_col  target_col
group_col
A                  8          80
B                  7          70
C                 13         130
If I instead call sum() first and select the columns afterwards, I still get the same output:
import pandas as pd
import dask.dataframe as dd
# Create the DataFrame
data = {
    'group_col': ['A', 'A', 'B', 'B', 'A', 'C', 'C'],
    'value_col': [3, 1, 5, 2, 4, 7, 6],
    'target_col': [10, 20, 30, 40, 50, 60, 70]
}
pdf = pd.DataFrame(data)
ddf = dd.from_pandas(pdf, npartitions=1)
df_grouped = ddf.groupby('group_col').sum()[['value_col', 'target_col']]
df_grouped.compute()
           value_col  target_col
group_col
A                  8          80
B                  7          70
C                 13         130
If I compute a mean after the groupby operation, selecting the columns before calling mean(), I get the output below:
import pandas as pd
import dask.dataframe as dd
# Create the DataFrame
data = {
    'group_col': ['A', 'A', 'B', 'B', 'A', 'C', 'C'],
    'value_col': [3, 1, 5, 2, 4, 7, 6],
    'target_col': [10, 20, 30, 40, 50, 60, 70]
}
pdf = pd.DataFrame(data)
ddf = dd.from_pandas(pdf, npartitions=1)
df_grouped = ddf.groupby('group_col')[['value_col', 'target_col']].mean()
df_grouped.compute()
           value_col  target_col
group_col
A           2.666667   26.666667
B           3.500000   35.000000
C           6.500000   65.000000
However, if I call mean() first and select the columns afterwards, as below:
import pandas as pd
import dask.dataframe as dd
# Create the DataFrame
data = {
    'group_col': ['A', 'A', 'B', 'B', 'A', 'C', 'C'],
    'value_col': [3, 1, 5, 2, 4, 7, 6],
    'target_col': [10, 20, 30, 40, 50, 60, 70]
}
pdf = pd.DataFrame(data)
ddf = dd.from_pandas(pdf, npartitions=1)
df_grouped = ddf.groupby('group_col').mean()[['value_col', 'target_col']]
df_grouped.compute()
I get the following error message:
TypeError: agg function failed [how->mean,dtype->object]
Why do the built-in sum and mean functions behave differently in this case, when only the order of column selection and aggregation changes?
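For comparison, here is a pandas-only sketch of one situation where sum and mean are known to diverge: aggregating a string column that is not the grouping key. This is only an illustration under stated assumptions, not a diagnosis of what Dask does internally. The extra column label_col is hypothetical and does not exist in the data above, and the TypeError branch assumes pandas 2.x (older versions may silently drop the column instead).

```python
import pandas as pd

# Hypothetical frame: same columns as in the question, plus an extra
# string column (label_col) that is NOT the grouping key.
pdf = pd.DataFrame({
    "group_col": ["A", "A", "B", "B", "A", "C", "C"],
    "value_col": [3, 1, 5, 2, 4, 7, 6],
    "target_col": [10, 20, 30, 40, 50, 60, 70],
    "label_col": ["p", "q", "r", "s", "t", "u", "v"],
})

grouped = pdf.groupby("group_col")

# sum() is defined for strings (concatenation), so aggregating every
# column and selecting afterwards succeeds.
summed = grouped.sum()[["value_col", "target_col"]]

# mean() is not defined for strings; on pandas 2.x this is expected to
# raise a TypeError before the column selection is ever reached.
try:
    grouped.mean()
    mean_error = None
except TypeError as exc:
    mean_error = str(exc)

# Two ways to keep the string column out of the mean:
# select the numeric columns first, or pass numeric_only=True.
mean_selected = grouped[["value_col", "target_col"]].mean()
mean_numeric = grouped.mean(numeric_only=True)[["value_col", "target_col"]]

print(summed)
print("mean() error:", mean_error)
print(mean_selected)
```

If something similar applies here, selecting the columns before aggregating (as in the working mean example) or passing numeric_only=True would be the workaround to try first.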