Is it possible to use custom Aggregation over entire ddf/column?

JacobHayes · May 2, 2022, 6:16pm

Hi, is it possible to use a custom Aggregation (dd.Aggregation) across an entire dataframe or column? Custom Aggregations work with ddf.groupby(...).agg(...), but pandas also supports df.agg(...) without a groupby, and dask already supports whole-frame reductions such as ddf.max().

Looking into the code for ddf.max(), it appears to use the undocumented ddf.reduction method (via _reduction_agg). That looks like a close match (chunk -> chunk, agg -> aggregate, finalize -> combine?), but the parameters aren't directly compatible. For example, an Aggregation's functions receive SeriesGroupBy objects while the .reduction functions receive Series objects, and aggregate and combine are supposed to return the same kind of output. I may be able to work around these differences, but I'm wondering whether this is the right approach. Thanks!

pavithraes · May 9, 2022, 3:55pm

@JacobHayes Welcome to Discourse!

I don’t think this is implemented yet, but it’s a reasonable feature to support.

There is an open feature request for Series here: "Suggestion: Series.agg" (dask/dask issue #3527 on GitHub), but I couldn't find one for DataFrame. Please feel free to open one or chime in on the Series issue, and let us know if you'd like to work on this!
