Delayed argument in map_partition

Hello,

Delayed arguments support has been dropped in new map_partition implementation (daskexpr).
I used to use it in order to share some data across all partitions.

What is the best practice now ?

Thanks for your help,

Regards,

Hi @faulaire, welcome to Dask community!

Do you have some link stating this? Or did you just observe the problem in your use case?

Do you think you could build a MVCE for this?

I see you also opened a github issue, and found in the documentation as you say, that this should be possible. So this might be just a bug.

Hi @guillaumeeb !

I just observed the problem in my case, I didn’t find any link stating that the support has been explicitly dropped.

Here is the MVCE:

import dask
from dask.dataframe import from_pandas
import pandas as pd

@dask.delayed
def delayed_input():
    return "Hello World !"

def func(df, delayed_obj):
    assert delayed_obj == "Hello World !"
    return df

df = pd.Series([1., 2., 3.])
dd = from_pandas(df, npartitions=2)

dd2 = dd.map_partitions(func, delayed_input(), meta=df.head(0))
df2 = dd2.compute()

This code works just fine with legacy implementation and raises with daskexpr. In this case delayed_obj in func is still a delayed object.

Thanks for your help,

I’m able to reproduce the issue. But it looks more complicated than I thought.

If I add a .compute() in the func function, it does not work either, and I don’t understand the error.

I will add this information into the issue you opened.

Fixed by Map_partitions again accepts delayed objects by fjetter · Pull Request #11907 · dask/dask · GitHub

1 Like