This is a niche case, but I’m sure there’s an obvious route that I just don’t know!
How can I create a dask series from a list of dask expressions without computing?
So say I have something like this:
avg = df["a"].mean()
sum = df["a"].sum()
I can easily make a dask series from a pandas series like so:
pd.Series([sum.compute(), avg.compute()]).pipe(dd.from_pandas, npartitions=1)
But is there a way to generate a dask series from the individual parts without computing the values?
Hvuj
Yes, it is possible.
First, though, a best-practice fix for your code:
call compute once for both values, so Dask can share the work between them:
sum, avg = dd.compute(sum, avg)
pd.Series([sum, avg]).pipe(dd.from_pandas, npartitions=1)
[Dask Best Practices — Dask documentation]
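Now, to answer your question:
import pandas as pd
import dask.dataframe as dd

data = {"a": [1, 2, 3, 4, 5]}
ddf = dd.from_pandas(pd.DataFrame(data), npartitions=2)
avg = ddf["a"].mean()
sum = ddf["a"].sum()
# to_series() turns each lazy scalar into a one-element dask series
dd.concat([avg.to_series(), sum.to_series()]).compute()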
That’s awesome, I didn’t realise that to_series was a method on results. Thanks @Hvuj!