Any technical papers on how dask works?

Hi,

I would like to follow up on my question from four years ago, where I asked whether a Dask textbook exists. I'm curious whether any technical papers on how Dask works have been published since then. I know technical papers are a niche medium, as most communication happens in blogs and talks. I'm looking for a single technical document that gives an overview of what makes Dask fast (the Coiled documentation page "Dask DataFrame is Fast Now" probably fits this bill).

I recently enjoyed going through the Photon paper (https://people.eecs.berkeley.edu/~matei/papers/2022/sigmod_photon.pdf) and the DuckDB paper (https://mytherin.github.io/papers/2019-duckdbdemo.pdf). I'm looking for Dask material as I think about what to include in a blog post I'm writing on choosing a dataframe library.

There was a publication in the SciPy conference proceedings many years ago.

Really, though, the best source for this is the dask.distributed docs (distributed.dask.org), which get pretty technical.

Found the paper: http://conference.scipy.org.s3-website-us-east-1.amazonaws.com/proceedings/scipy2015/pdfs/matthew_rocklin.pdf. Thanks!

Agreed on the docs as well. Dask has always had great docs.
