(a) I find that when I write a Dask DataFrame to Parquet files, the divisions are lost. How can I overcome this? I am using the default engine (i.e., pyarrow).
(b) If the DataFrame already has an index but the divisions are absent, how do I assign divisions? I can do the following to assign divisions, but that code assumes the index column has not yet been assigned.
I find that when writing dask dataframe as parquet files, the divisions are lost.
Would you be able to share a minimal example? It’ll allow me to help you better.
A few notes:
You can set calculate_divisions=True in read_parquet to get the divisions while reading your data back (this will work only if the global metadata file exists).
If the metadata file isn’t written, you can set write_metadata_file=True in to_parquet so that it is created alongside the data files.
Relevant docs which have some additional information+notes: