How to parallel process .apply with a lambda function within a for loop?

I am trying to parallelize the function below.

def function(ddf: dask.dataframe.DataFrame, my_dict: dict):
    for key in my_dict.keys():
        # Bind key as a default argument so each lazily-evaluated lambda
        # keeps its own key instead of sharing the loop variable
        ddf['Column A'] = ddf['Column A'].apply(
            lambda x, key=key: key if x in my_dict[key] else x
        )
    return ddf

I rewrote the lambda as a regular function (def function:), wrapped it with @dask.delayed, and then passed it to .apply(), but it didn’t work. How can I parallelize this function?

Hi @farmd,

I’m not sure I fully understood your problem, but apply on a Dask DataFrame is already parallelized.

What I mean is, if you chain several apply calls on the same DataFrame, the calls are composed lazily and executed in parallel across the partitions of the DataFrame. You don’t need dask.delayed.

Could you be a bit more precise about what didn’t work? The best would be to have a reproducible example.

Note: I see you’re not using the axis=1 kwarg. With Dask, DataFrame.apply only supports axis=1 (row-wise); Series.apply, as in your snippet, takes no axis argument.

Hi @guillaumeeb,

Got it! Thank you!
