Recommended way to enforce method update after modification

apatlpo · May 22, 2025, 9:30am

I am wondering what the recommended way is to pass a modified (redefined) function to map_partitions.
I am working in Jupyter with Dask 2025.4.1. Here is an illustrative example:

import pandas as pd
import dask.dataframe as dd

df = pd.DataFrame({'x': [1, 2, 3, 4, 5],
                   'y': [1., 2., 3., 4., 5.]})
ddf = dd.from_pandas(df, npartitions=2)

def myadd(df, a, b=1):
    return df.x + df.y + a + b
res = ddf.map_partitions(myadd, 1, b=2)
res.compute()

output:

0     5.0
1     7.0
2     9.0
3    11.0
4    13.0
dtype: float64

Redefining myadd as follows:

def myadd(df, a, b=1):
    return df.x + df.y + a + b + 1 
res = ddf.map_partitions(myadd, 1, b=2)
res.compute()

outputs:

0     5.0
1     7.0
2     9.0
3    11.0
4    13.0
dtype: float64

So the updated myadd was not picked up, and the result is unchanged.
On the other hand, wrapping the function with @staticmethod seems to work:

# A module-level staticmethod object is directly callable only in Python 3.10+
@staticmethod
def myadd(df, a, b=1):
    return df.x + df.y + a + b + 1 
res = ddf.map_partitions(myadd, 1, b=2)
res.compute()

outputs:

0     6.0
1     8.0
2    10.0
3    12.0
4    14.0
dtype: float64
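For reference, myadd works row by row, so applying both versions directly to the pandas frame gives the values map_partitions should return regardless of partitioning. This plain-pandas sketch confirms that 6.0 ... 14.0 is the expected output after the update, i.e. the second output above is stale (myadd_updated is just an illustrative name for the redefined function):

```python
import pandas as pd

df = pd.DataFrame({'x': [1, 2, 3, 4, 5],
                   'y': [1., 2., 3., 4., 5.]})

def myadd(df, a, b=1):
    return df.x + df.y + a + b

def myadd_updated(df, a, b=1):
    # Hypothetical name: same body as the redefined myadd above
    return df.x + df.y + a + b + 1

# Row-wise functions give the same values on the whole frame as on
# its partitions, so these are the reference results.
print(myadd(df, 1, b=2).tolist())          # [5.0, 7.0, 9.0, 11.0, 13.0]
print(myadd_updated(df, 1, b=2).tolist())  # [6.0, 8.0, 10.0, 12.0, 14.0]
```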

Is this the recommended way to do this?

I am fairly sure older Dask versions handled this correctly and returned the expected output without the staticmethod wrapper.
In any case, I did not find any mention of this in the Dask documentation or on the web.
In my opinion, this behavior deserves more visibility.

guillaumeeb · May 23, 2025, 1:52pm

Hi @apatlpo, welcome to Dask community!

Even though I’m not sure about the use case (you could also give your function a different name), I agree this is probably not expected behavior, and I didn’t find anything about it in the documentation either.

I reproduced the issue with Dask 2025.3.0, but not with version 2024.12.1. I suspect it comes from the recent dask-expr changes.

So I agree with you, and I would recommend opening an issue that describes this behavior, with version details, to ask for either clarification or a fix.
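A quick sketch for collecting the basic version details to include in such a report (not an official Dask helper; add any other relevant packages as needed):

```python
import platform

import dask
import pandas as pd

# Version details worth pasting into a bug report.
versions = {
    "python": platform.python_version(),
    "dask": dask.__version__,
    "pandas": pd.__version__,
}
for name, version in versions.items():
    print(f"{name}: {version}")
```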

apatlpo · May 23, 2025, 2:13pm

Hi, thanks for getting back to me, and nice to hear from you again!

I have opened a Dask issue for this: “Recommended way to enforce method update after modification” (dask/dask issue #11964 on GitHub).
