DDF is converting a column of lists/dicts to strings

I was trying to apply a function that returns a list of dicts, and found that the returned column was turned into strings. When I tried to replicate this, I saw the same behavior when simply converting a pandas DataFrame to a Dask DataFrame.

import dask.dataframe as dd  # also tried moving this import below the dask.config.set call
import dask
dask.config.set({"dask.dataframe.convert-string": False})

import pandas as pd
a = pd.DataFrame({"a":[1,2,3],"b":[{1:2},{2:2},{3:3}]})
> a
   a       b
0  1  {1: 2}
1  2  {2: 2}
2  3  {3: 3}
> dd.from_pandas(a,npartitions=1)
                   a       b
npartitions=1               
0              int64  string
2                ...     ...
Dask Name: to_pyarrow_string, 2 graph layers
> a.dtypes
a     int64
b    object
dtype: object
> dd.from_pandas(a,npartitions=1).compute().dtypes
a              int64
b    string[pyarrow]
dtype: object

pd.__version__ : '2.1.4'
dask.__version__ : '2024.1.0'

Am I missing something?

Hi @pramodhrachuri, welcome to Dask discourse!

The problem comes only from the configuration key you are using. If you want to stop Dask from automatically converting object columns to PyArrow strings, the correct code is:

dask.config.set({"dataframe.convert-string": False})

That is, without "dask." at the beginning.

Thank you so much! Such a small, silly mistake. I had copied the line from the GitHub issue "DataFrame.apply(meta=(None, 'object')) converts to pyarrow's string" (dask/dask issue #10628).

I am just getting started with Dask and find it extremely useful for large datasets. It would be great if Dask reported that there is no such key as "dask.dataframe.convert-string".
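Dask config keys are dotted paths into a nested dictionary, and as this thread shows, the misspelled key was accepted without any complaint. Below is a minimal pure-Python sketch of the kind of check being asked for. DEFAULTS is a hypothetical stand-in for Dask's defaults, and lookup_dotted / checked_set are illustrative helpers, not Dask API. Unlike Dask, the sketch does not treat underscores and hyphens as equivalent. In real code, calling dask.config.get(key) without a default raises KeyError for a key that is not defined, so it could serve as the existence check before dask.config.set.

```python
def lookup_dotted(config: dict, key: str):
    """Walk a dotted key like "dataframe.convert-string" through nested dicts.

    Raises KeyError naming the first missing segment, so a typo such as a
    stray "dask." prefix is reported instead of silently creating an entry.
    """
    node = config
    path = []
    for part in key.split("."):
        path.append(part)
        if not isinstance(node, dict) or part not in node:
            raise KeyError(f"unknown config key {'.'.join(path)!r} (from {key!r})")
        node = node[part]
    return node


def checked_set(config: dict, key: str, value) -> None:
    """Set an existing dotted key; refuse keys that are not already defined."""
    lookup_dotted(config, key)  # raises KeyError on unknown keys
    *parents, leaf = key.split(".")
    node = config
    for part in parents:
        node = node[part]
    node[leaf] = value


# Hypothetical stand-in for Dask's nested defaults; only the branch
# relevant to this thread is included.
DEFAULTS = {"dataframe": {"convert-string": None}}

checked_set(DEFAULTS, "dataframe.convert-string", False)
print(DEFAULTS["dataframe"]["convert-string"])  # False

try:
    checked_set(DEFAULTS, "dask.dataframe.convert-string", False)
except KeyError as err:
    print(err)  # the stray "dask." segment is reported, nothing is created
```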