Taskqueue library written entirely on Dask

Hello everybody,

I have been working on a distributed task queue library entirely written on top of Dask. I hit a performance wall sometime back when trying to schedule a simple embarrassingly parallel workflow on Dask. The dask scheduler is an amazing piece of software but has its limits when trying to schedule millions of simple tasks.
daskqueue is a python library built on top of Dask and Dask Distributed that implements a very lightweight Distributed Task queue and distributed dask workers.

Think of this library as a simpler version of Celery built entirely on Dask. Running Celery requires installing a one or multiple backend which can be tricky in HPC environnment (for instance) is usually very tricky whereas spawning a Dask Cluster is a lot easier to manage, debug and cleanup.

The library is still rough but is stable enough to do some testing with. I would love to get some feedback on this work and probe your interest on such library. Any contribution is more than welcome, I am planning on building some cool features like a consensus algorithm for consistent state etc… :smile_cat:

Link to github : https://github.com/AmineDiro/daskqueue
Pypi package : daskqueue · PyPI

2 Likes

Really cool, thanks for sharing!