Making a Series from Dask Expressions

This is a niche case, but I’m sure there’s an obvious route that I just don’t know!

How can I create a dask series from a list of dask expressions without computing?

So say I have something like this:

avg = df["a"].mean()
sum = df["a"].sum()

I can easily make a dask series from a pandas series like so:

pd.Series([sum.compute(), avg.compute()]).pipe(dd.from_pandas, npartitions=1)

But is there a way to generate a dask series from the individual parts without computing the values?

Yes, it is possible.

First, though, a best-practice fix for your code: call compute once for both values instead of once per value, so that any work the two results share is only done once:

sum, avg = dd.compute(sum, avg)
pd.Series([sum, avg]).pipe(dd.from_pandas, npartitions=1)

[Dask Best Practices — Dask documentation]

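Now, to answer your question:

import pandas as pd
import dask.dataframe as dd

data = {"a": [1, 2, 3, 4, 5]}
ddf = dd.from_pandas(pd.DataFrame(data), npartitions=2)

# Both reductions are lazy scalars; nothing is computed yet.
avg = ddf["a"].mean()
sum = ddf["a"].sum()

# to_series() wraps each lazy scalar in a dask series, and concat stays lazy.
# The final .compute() is only there to show the result.
dd.concat([avg.to_series(), sum.to_series()]).compute()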

That’s awesome, I didn’t realise that to_series was a method on results. Thanks @Hvuj!
