def function(ddf: dask.dataframe, my_dict: dict):
for key in my_dict.keys():
ddf['Column A'] = ddf['Column A'].apply(lambda x: x.replace(x, key) if x in my_dict[key] else x)
return ddf
I rewrote the lambda function to a regular function (def function:), wrapped it with @dask.delayed, and then put it into .apply(), but it didn’t work. How do I parallel process this function?
Not sure if I got your problem well, but using apply on a Dask DataFrame is parallelized.
What I mean is, if you chain several apply calls on the same DataFrame, the calls will be chained and applied in parallel on each partition of the DataFrame. You don’t need dask.delayed.
Could you be a bit more precise about what didn’t work? The best would be to have a reproducible example.
Note, I see you’re not using the axis=1 kwarg which is mandatory to use apply with Dask.