The Dask team released version 2022.04.2 last week, with many exciting improvements! It includes contributions from 31 people, including 8 new contributors.
Check out the complete changelog here. Some release highlights:
- There are a lot of improvements to how Dask reads and writes parquet files, thanks to @rjzamora, @bryanweber, @ian, and @jcristharif! Note that this includes breaking changes to some default keyword arguments; learn more in the changelog.
- Thank you, @rjzamora, for updating how the pyarrow parquet engine's pre_buffer option works with fsspec precache. A small change with a big performance boost!
- We have new documentation that explains some best practices for working with parquet files in Dask! Thank you, @jcristharif!
- This release includes many updates that make Dask Array more compatible with the Python Array API Standard. Thank you, Tom White and @jsignell, for working on this!
- Speaking of compatibility, Dask DataFrame is also now more compatible with the upcoming pandas 1.5.0 release! Thanks, @ian!
- The team made a variety of stability improvements to dask/distributed in this release, including a fix for this major deadlock that caused workers to rapidly run out of memory.
- This release also brings mypy support to dask/dask! Thanks, @phobson and @crusaderky!
Finally, thank you @jrbourbeau for managing this release.