Is it necessary to call compute() before calling to_parquet()?

amp123 · March 29, 2024, 10:22am

So calling to_parquet() directly on the Dask dataframe, without calling compute(), writes a number of parquet files inside a folder (one per partition), whereas calling compute() first writes a single file (presumably because it is saving a Pandas dataframe in that case and not a Dask one).

Apologies, I didn’t notice the folders that were being created because I was only looking at the files.
