Importing dask.dataframe broke pandas code

Hi there!

I found a very strange problem and prepared a reproducer for it:

import pandas
import numpy as np
import dask.dataframe as dd
from distributed import Client

size = 10
TEST_DATA = {i: [i * size + j for j in range(size)] for i in range(size)}

def dask_pipeline():
    with Client() as client:
        df = pandas.DataFrame(TEST_DATA)
        df[1][:4] = np.nan
        df[3][-4:] = np.nan

        assert np.isnan(df[1][3])
        assert np.isnan(df[3][7])

        # dask_df = dd.from_pandas(df)
        df.fillna(value=0, inplace=True)

        df[1][:4] = np.nan
        df[3][-4:] = np.nan

        assert np.isnan(df[1][3])
        assert np.isnan(df[3][7])


if __name__ == "__main__":
    dask_pipeline()

If I run this code without importing dask.dataframe, it completes successfully. Once I import it, the behavior of the pandas DataFrame changes unexpectedly.

Please tell me, is it a bug or a very strange feature?

Hi @KSuvorov,

What difference do you see in the result with and without importing dask.dataframe? I just tried the code and did not see anything special. Did you change anything other than the dask.dataframe import?

Hi @guillaumeeb,

Thanks for your fast reply.

The second block of assertions fails and raises an AssertionError:

Traceback (most recent call last):
  File "/localdisk/ksuvorov/git/modin/test.py", line 31, in <module>
    dask_pipeline()
  File "/localdisk/ksuvorov/git/modin/test.py", line 16, in dask_pipeline
    assert np.isnan(df[1][3])
AssertionError

I use Python=3.9.18 and my environment includes the following libraries:

# Name                    Version                   Build    Channel
dask                      2024.3.0           pyhd8ed1ab_1    conda-forge
dask-core                 2024.3.0           pyhd8ed1ab_0    conda-forge
dask-expr                 1.0.1              pyhd8ed1ab_0    conda-forge
distributed               2024.3.0           pyhd8ed1ab_0    conda-forge
pandas                    2.2.1            py39hddac248_0    conda-forge

Okay, I can reproduce the problem with your environment. It is probably caused by dask/dask issue #10996 on GitHub, "importing dask.dataframe changes pandas behaviour in 2024.3.0".

This has been fixed in the next dask-expr release, but be aware that, as stated in the comments on the issue above, you will encounter this problem again with pandas 3.0. It already triggers a lot of warning messages!
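In case it helps in the meantime: the reproducer writes through chained indexing (df[1][:4] = ...), which the pandas documentation says will not update the original DataFrame under copy-on-write, the default from pandas 3.0. Below is a minimal sketch of the same steps using a single .loc call per assignment, which should not depend on that setting. The blank_out helper and the dtype="float64" argument are my own additions for illustration (float columns can hold NaN without upcasting), and I left out the Client since it does not seem to be needed to show the pattern.

```python
import numpy as np
import pandas as pd

size = 10
TEST_DATA = {i: [i * size + j for j in range(size)] for i in range(size)}

# Assumption: float64 columns, so NaN can be stored without changing dtype.
df = pd.DataFrame(TEST_DATA, dtype="float64")


def blank_out(frame):
    """Hypothetical helper: set the same cells as the reproducer to NaN."""
    # One indexing call per assignment, instead of frame[col][rows] = value.
    frame.loc[frame.index[:4], 1] = np.nan   # first four rows of column 1
    frame.loc[frame.index[-4:], 3] = np.nan  # last four rows of column 3


blank_out(df)
assert np.isnan(df.loc[3, 1])
assert np.isnan(df.loc[7, 3])

df.fillna(value=0, inplace=True)

# The second round of writes goes straight to df as well.
blank_out(df)
assert np.isnan(df.loc[3, 1])
assert np.isnan(df.loc[7, 3])
```

Because each write is a single indexing operation on df itself, no intermediate Series is involved, so the result should be the same whether or not dask.dataframe has been imported.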

Thanks, it resolves my problem!