How to upload dataframe with numpy array column using to_parquet in dask.dataframe?

Hi @hjlee9182, welcome to Dask Discourse forum!

As indicated in the `to_parquet` documentation for the `schema` kwarg:

> Global schema to use for the output dataset. Defaults to “infer”, which will infer the schema from the dask dataframe metadata. This is usually sufficient for common schemas, but notably will fail for object dtype columns that contain things other than strings. These columns will require an explicit schema be specified.

So you need to specify a schema explicitly in `to_parquet`. I’m no pyarrow expert, but I’ve been able to make it work with:

import pyarrow as pa

df.to_parquet('/tmp/arrayparquet', engine='pyarrow', schema={"float_array_column": pa.list_(pa.float64())})