Hi, is it possible to use custom `Aggregation`s across an entire dataframe or column? They work with `ddf.groupby(...).agg(...)`, but pandas also supports `df.agg(...)`, and dask even supports full aggregations with things like `ddf.max()`.
Looking into the code for `ddf.max()`, it appears to use the undocumented `ddf.reduction` function (via `_reduction_agg`), which appears very close (chunk->chunk, agg->aggregate, finalize->combine?), but the parameters aren't exactly compatible (e.g. `Aggregation` gets `SeriesGroupBy` objects while `.reduction` funcs get `Series` objects; `aggregate` and `combine` are supposed to return the same output, etc.). I may be able to work around these, but I'm just wondering if this is the right approach. Thanks!
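
To make the chunk/combine/aggregate shape I'm describing concrete, here is a library-free sketch of how I understand the contract. It does not call Dask at all: `tree_reduce`, its `split_every` argument, and the (sum, count) mean are illustrative assumptions on my part, not Dask's actual implementation. The point is only that `combine` and `aggregate` both receive a collection of `chunk` outputs, while `chunk` alone sees the raw data.

```python
# Toy model of a tree reduction; plain lists stand in for partitions.
# All names here are hypothetical and chosen to mirror the terms in the post.

def chunk(values):
    """Per-partition step: reduce raw values to a small (sum, count) pair."""
    return (sum(values), len(values))


def combine(pairs):
    """Intermediate step: merge several (sum, count) pairs into one pair."""
    return (sum(p[0] for p in pairs), sum(p[1] for p in pairs))


def aggregate(pairs):
    """Final step: merge the remaining pairs, then compute the mean."""
    total, count = combine(pairs)
    return total / count


def tree_reduce(partitions, chunk, combine, aggregate, split_every=2):
    """Apply chunk to each partition, combine in groups, then aggregate."""
    if split_every < 2:
        raise ValueError("split_every must be at least 2")
    level = [chunk(p) for p in partitions]
    # Keep merging groups of intermediate results until few enough remain.
    while len(level) > split_every:
        level = [
            combine(level[i:i + split_every])
            for i in range(0, len(level), split_every)
        ]
    return aggregate(level)


partitions = [[1, 2, 3], [4, 5], [6], [7, 8, 9, 10]]
result = tree_reduce(partitions, chunk, combine, aggregate)
print(result)
```

In this model `combine` has to return something shaped like a `chunk` output so it can be fed back in, which is the compatibility constraint I mentioned, and there is no separate finalize step: the final division lives inside `aggregate`.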