How to append to a Dask Dataframe

As noted above, if your input DataFrame doesn't fit into memory, there isn't much you can do, except calling persist before the for loop if the DataFrame does fit in the distributed cluster's memory. This would avoid reading the data back from the source for every drop_duplicates call!

The thing is, drop_duplicates already runs in parallel, so it would be a bit difficult and dangerous to also try to run the for loop in parallel.