I want to upload a DataFrame with an array column to remote storage, but the code below raises a type error. Any suggestions on how to troubleshoot further, how to fix this, or where else I could ask for help would be greatly appreciated!
import numpy as np
import pandas as pd
from dask import dataframe as dd  # dask version == 2023.5.0
data = {
'float_array_column': [
np.array([1.1, 2.2, 3.3]),
np.array([4.4, 5.5]),
np.array([6.6, 7.7, 8.8, 9.9])
]
}
path = {storage_path}
storage_option = {storage_option}
df = pd.DataFrame(data)
df = dd.from_pandas(df, chunksize=5368709)
dd.to_parquet(df, path, engine='pyarrow', storage_options=storage_option)
Changing the data from NumPy arrays to plain Python lists also errors:
data = {
'float_array_column': [
[1.1, 2.2, 3.3],
[4.4, 5.5],
[6.6, 7.7, 8.8, 9.9]
]
}
ValueError: Failed to convert partition to expected pyarrow schema:
    `ArrowTypeError("Expected bytes, got a 'list' object", 'Conversion failed for column float_array_column with type object')`

Expected partition schema:
    float_array_column: string
    __null_dask_index__: int64

Received partition schema:
    float_array_column: list<item: double>
        child 0, item: double
    __null_dask_index__: int64

This error *may* be resolved by passing in schema information for
the mismatched column(s) using the `schema` keyword in `to_parquet`.
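The error's hint points at the `schema` keyword. Here is a minimal sketch of what I understand that to mean (assuming the column should be stored as list<double>; I haven't confirmed this is the right fix):

import pyarrow as pa

# Hypothetical explicit schema: declare the array column as list<double>
# instead of letting dask infer it as string from the object-dtype meta.
schema = pa.schema({
    'float_array_column': pa.list_(pa.float64()),
    '__null_dask_index__': pa.int64(),  # index column dask writes by default
})
dd.to_parquet(df, path, engine='pyarrow', schema=schema,
              storage_options=storage_option)

Is that what the hint intends, or is there a better way to store ragged float arrays?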
Wrapping each array in dd.from_array also errors:
data = {
'float_array_column': [
dd.from_array(np.array([1.1, 2.2, 3.3])),
dd.from_array(np.array([4.4, 5.5])),
dd.from_array(np.array([6.6, 7.7, 8.8, 9.9]))
]
}
ArrowTypeError: ("Expected bytes, got a 'Series' object", 'Conversion failed for column float_array_column with type object')
During handling of the above exception, another exception occurred:
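In case it matters, this is the round-trip check I plan to run once the write succeeds (just a sketch; the read arguments are assumed to mirror the write):

# Read the dataset back and confirm the ragged arrays survive the trip.
out = dd.read_parquet(path, engine='pyarrow', storage_options=storage_option)
print(out['float_array_column'].compute())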