This release includes contributions from 28 people, including 7 new contributors! You can see the complete list of changes in the Changelog. Highlights include:
- Dask Bag now supports out-of-memory sampling using reservoir sampling, thanks to Daniel Mesejo-León.
- There have been a number of improvements to the documentation. There is a new page documenting the interactive Dask dashboard, thanks to @ncclementi.
- Other doc improvements include better API docs for the string, categorical, and datetime accessors on DataFrames, and updated explanations of how Dask’s distributed scheduler assigns tasks to workers. Thanks to @scharlottej13, @jcristharif, @keewis, and @gjoseph92!
- The Dask distributed scheduler can now dump its cluster state to arbitrary fsspec-compatible filesystems (like S3 or GCS), allowing for easier debugging.
- In Dask DataFrame, you can now pass a Dask Index to set_index(), thanks to @phobson.
- Finally, Dask has deprecated reading bcolz tables, as the Blosc Development Team is no longer able to maintain the bcolz package. Thanks to @pavithraes for adding the deprecation warning.
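
For readers unfamiliar with reservoir sampling, the idea behind the new Dask Bag feature can be sketched in plain Python. This is a minimal illustration of the classic single-pass technique (often called Algorithm R), not Dask's actual implementation; the function name `reservoir_sample` and its parameters are hypothetical and chosen only for this example.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(stream: Iterable[T], k: int, seed: Optional[int] = None) -> List[T]:
    """Pick up to k items uniformly at random from a stream of unknown length.

    Hypothetical helper for illustration only; it is not part of Dask.
    Only k items are held in memory at any time.
    """
    if k < 0:
        raise ValueError("k must be non-negative")
    rng = random.Random(seed)
    reservoir: List[T] = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Choose a slot in [0, i]; the new item replaces an old one
            # only if the slot falls inside the reservoir (probability k / (i + 1)).
            j = rng.randint(0, i)
            if j < k:
                reservoir[j] = item
    return reservoir


# Sample 5 values from a generator without materializing it as a list.
sample = reservoir_sample((x * x for x in range(100_000)), k=5, seed=0)
```

Because each item is examined once and then discarded unless it lands in the reservoir, memory use depends on `k` rather than on the size of the input, which is what makes this approach suitable for data that does not fit in memory.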
Thanks to @jsignell for managing this release, and to all Dask contributors who had a hand in it!