Dask .to_parquet() errors when saving lists of integers (object types) with convert-string: False

ave4000 · January 25, 2024, 1:41pm

Hi,

The Related section helped me find a solution.

In case somebody else struggles with this, the following works fine. The key is to pass an explicit pyarrow schema for the list columns to to_parquet:

import dask
import dask.dataframe as dd
import pandas as pd
import pyarrow as pa  # needed for the schema types below

# Keep object columns as-is instead of converting them to pyarrow strings
dask.config.set({"dataframe.convert-string": False})

a = pd.DataFrame({"int_array_column1": [[1, 2, 3]], "int_array_column2": [[1, 2, 3]]})
ddf = dd.from_pandas(a, npartitions=1)

# Declare the list columns explicitly so their type does not have to be inferred
ddf.to_parquet("testdd.parquet", schema={
    "int_array_column1": pa.list_(pa.int64()),
    "int_array_column2": pa.list_(pa.int64()),
    })

Please feel free to close the thread, apologies for the mess, hopefully somebody can use this.

Related topics

Topic	Category	Replies	Views	Activity
How to write and read DataFrame with vector column (e.g. list(float64))?	Dask DataFrame	2	1048	September 4, 2023
How to upload dataframe with numpy array column using to_parquet in dask.dataframe?	Dask DataFrame	2	807	August 29, 2023
Still cannot get rid of string conversion for blob	Dask DataFrame	3	69	August 30, 2024
Error when creating pyarrow schema from dask dataframe	Dask DataFrame (tags: parquet, pyarrow)	2	1737	June 1, 2023
DDF is converting column of lists/dicts to strings	Dask DataFrame	2	1024	January 18, 2024