How to implement groupby sampling in dask?

Hello,

I would like to group by a column and then sample from each group, just like in pandas. However, it seems that sample is not implemented for Dask groupby objects yet. Is there an easy way to do this with another method?

Pandas documentation: pandas.core.groupby.DataFrameGroupBy.sample — pandas 2.2.2 documentation

Thanks!

Update: Solved with the following:

df.groupby('group_variable').apply(lambda x: x.sample(n=num_samples)).reset_index(drop=True).compute()
