OutOfMemory, when merging multiple dataframes! Help me optimize!

Hi @devLeitner, welcome to Dask discourse!

Could you elaborate on that, or provide a sample of the dataset? I’m not sure I understand the setup yet, but if many rows share the same merge id, you might run into this problem:

In some cases, you may see a MemoryError if the merge operation requires an internal shuffle, because shuffling places all rows that have the same index in the same partition. To avoid this error, make sure all rows with the same on-column value can fit on a single partition.
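
To check whether this applies to your data, here is a minimal sketch in plain Python (no Dask required) that estimates how many rows an inner merge would produce for each key. The function names and the toy ids are hypothetical, chosen only for illustration. For an inner merge, a key that appears n times on the left and m times on the right yields n * m output rows, and all of them end up in one partition after the shuffle.

```python
from collections import Counter


def merge_rows_per_key(left_keys, right_keys):
    """Map each key present on both sides to the number of rows
    an inner merge would produce for it (left count * right count)."""
    left_counts = Counter(left_keys)
    right_counts = Counter(right_keys)
    shared = left_counts.keys() & right_counts.keys()
    return {key: left_counts[key] * right_counts[key] for key in shared}


def heaviest_keys(left_keys, right_keys, top=3):
    """Return the `top` keys with the most merged rows, largest first."""
    rows = merge_rows_per_key(left_keys, right_keys)
    return sorted(rows.items(), key=lambda item: item[1], reverse=True)[:top]


# Toy merge ids, purely illustrative.
left_ids = [1, 1, 1, 2, 3]
right_ids = [1, 1, 2, 2, 4]

print(heaviest_keys(left_ids, right_ids))  # [(1, 6), (2, 2)]
```

On real Dask dataframes you would get the per-key counts from something like `ddf["id"].value_counts().compute()` on each side instead of Python lists (assuming the merge column is called `id`). If a single key accounts for a very large number of merged rows, that key alone may not fit in one partition.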

See the discussion here too: Memory Leakage on single worker on merged DataFrame (after task completion).