Hello,
Delayed arguments support has been dropped in new map_partition implementation (daskexpr).
I used to use it in order to share some data across all partitions.
What is the best practice now ?
Thanks for your help,
Regards,
Hello,
Delayed arguments support has been dropped in new map_partition implementation (daskexpr).
I used to use it in order to share some data across all partitions.
What is the best practice now ?
Thanks for your help,
Regards,
Hi @faulaire, welcome to Dask community!
Do you have some link stating this? Or did you just observe the problem in your use case?
Do you think you could build a MVCE for this?
I see you also opened a github issue, and found in the documentation as you say, that this should be possible. So this might be just a bug.
Hi @guillaumeeb !
I just observed the problem in my case, I didn’t find any link stating that the support has been explicitly dropped.
Here is the MVCE:
import dask
from dask.dataframe import from_pandas
import pandas as pd
@dask.delayed
def delayed_input():
return "Hello World !"
def func(df, delayed_obj):
assert delayed_obj == "Hello World !"
return df
df = pd.Series([1., 2., 3.])
dd = from_pandas(df, npartitions=2)
dd2 = dd.map_partitions(func, delayed_input(), meta=df.head(0))
df2 = dd2.compute()
This code works just fine with legacy implementation and raises with daskexpr. In this case delayed_obj in func is still a delayed object.
Thanks for your help,
I’m able to reproduce the issue. But it looks more complicated than I thought.
If I add a .compute() in the func function, it does not work either, and I don’t understand the error.
I will add this information into the issue you opened.