Benchmarking Bodo and Dask

Scott_Routledge · February 14, 2025, 7:50pm

We recently ran a benchmark comparing Bodo and Dask for distributed data processing in Python. In this particular test, Bodo showed a significant speedup (~50x) over Dask, which we believe is due to our auto-parallelizing compiler and MPI-based backend. We encourage others to reproduce our results and let us know how we can make our benchmarks more informative and useful. Check out the full write-up here, and our GitHub repo here.

Scott_Routledge · February 14, 2025, 7:50pm

There is also a discussion about this benchmark in GitHub Issues here, which provides additional context useful for understanding the results.

guillaumeeb · February 14, 2025, 8:21pm

Hi @Scott_Routledge, welcome to the Dask Discourse forum,

I see you already found the discussion between some of the Dask maintainers, but thanks for sharing this benchmark here.

cc @jacobtomlinson @fjetter @martindurant @Patrick
