Shuffling large data at constant memory in Dask

Summary

Dask release 2023.2.1 introduced a new shuffling method for dask.dataframe called P2P (peer-to-peer), which makes sorts, merges, and joins faster and runs in constant memory. This article describes the problem, the new solution, and the impact on performance.
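To make the term concrete: a shuffle moves every row to the output partition chosen by its key, so all rows that share a key end up together. Sorts, merges, and joins depend on that guarantee. The sketch below is a conceptual pure-Python illustration of a hash-partitioned all-to-all shuffle, not Dask's implementation. The functions `output_partition`, `split_partition`, and `shuffle` are hypothetical names chosen for this example. It consumes input partitions one at a time and appends their shards to per-destination buffers. In a real P2P shuffle those buffers would sit on the worker that owns each output partition and could be spilled to disk; here they are plain in-memory lists (a simplifying assumption).

```python
import zlib
from collections import defaultdict
from typing import Any, Dict, Iterable, List, Tuple

Row = Tuple[Any, Any]  # (key, value) pair; a stand-in for a DataFrame row


def output_partition(key: Any, n_out: int) -> int:
    """Map a key to an output partition with a deterministic hash.

    crc32 is used instead of hash() because Python randomizes str hashes
    per process, and every sender must agree on the destination.
    """
    return zlib.crc32(repr(key).encode()) % n_out


def split_partition(rows: Iterable[Row], n_out: int) -> Dict[int, List[Row]]:
    """Split one input partition into shards, one per destination partition."""
    shards: Dict[int, List[Row]] = defaultdict(list)
    for key, value in rows:
        shards[output_partition(key, n_out)].append((key, value))
    return shards


def shuffle(input_partitions: Iterable[List[Row]], n_out: int) -> List[List[Row]]:
    """Hash-partition rows so each key lands in exactly one output partition.

    Input partitions are processed one at a time (streaming); only the
    per-destination buffers accumulate state. These buffers are in-memory
    lists here, a simplification of buffering that could spill to disk.
    """
    buffers: List[List[Row]] = [[] for _ in range(n_out)]
    for partition in input_partitions:
        for dest, shard in split_partition(partition, n_out).items():
            buffers[dest].extend(shard)
    return buffers


if __name__ == "__main__":
    parts = [[(1, "a"), (2, "b")], [(1, "c"), (3, "d")], [(2, "e")]]
    for i, part in enumerate(shuffle(parts, 2)):
        print(i, part)
```

Running it prints two output partitions; every row with a given key appears in only one of them, and no rows are lost or duplicated. The split-then-route structure is what lets each input partition be released as soon as its shards are sent, which is the intuition behind bounding memory during a shuffle.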
