Distributed runner for Beam

Folks in the Pangeo community have been experimenting with the Apache Beam programming model for distributed computing. There is growing interested in implementing a Dask Distributed Runner for Beam (e.g. Support for a Dask runner · Issue #18962 · apache/beam · GitHub). This would make it a lot easier for people with existing Dask infrastructure to use Beam. Beam can already run on many different distributed computing backends (Spark, Flink, Dataflow, etc.) There is an extensive Beam Runner Authoring Guide (Google it, Discourse won’t let me post more than 2 links). So implementing a Dask runner is a tightly scoped and completely feasible engineering task.

In order to kickstart the discussion of implementing a Dask Beam runner, I propose we meet during the week of June 13-17. I have created a When2Meet Poll here - Dask Beam Runner Discussion - When2meet . If you are interested in attending, please give your availability. Hope to see you at the meeting!

1 Like

It would be great if we could convince a Dask maintainer to attend this meeting. Optimistically tagging @gjoseph92 and @jrbourbeau.

Thanks to all who replied! We have scheduled the call for Wed June 15, 1:30 pm ET. The zoom link is Launch Meeting - Zoom

Looking forward to the discussion!

1 Like