Unable to get head of a CSV read dask dataframe


I am forwarding here a message posted on Stack Overflow.

Here is the failing code:

import dask.dataframe as dd
from dask.distributed import Client, LocalCluster
import pandas as pd

local_file = 'example.csv'
df0 = pd.DataFrame({'id': [0, 1, 3], 'model': ['A', 'B', 'C']})
df0.to_csv(local_file, index=False)  # write the sample data so read_csv has a file

if __name__ == '__main__':
    with LocalCluster(processes=False) as cluster, Client(cluster) as client:
        df = dd.read_csv(local_file)
        print('df :')
        print(df.head())  # this is the call that fails

Inside the local cluster/client context, df.compute() does not return a pandas DataFrame with the values in it, but rather a “Serialize” object, and df.head() then raises an error. This does not happen outside the client context.

Can anyone fix this apparent bug?

Hi @Francois,

I saw your Stack Overflow question but did not have time to really look into it before. I have now reproduced the issue.

It seems to me that there is a problem with the LocalCluster or Client context manager, because just doing:

client = Client()
df = dd.read_csv(local_file)
print('df :')
print(df.head())
client.close()

works as expected.

I think you should open an issue in the distributed GitHub repo.

Thanks for your test and your advice. I’ll do as you suggest.