Dask on k8s - Best computing practices

I generally recommend that you treat Dask clusters as lightweight and ephemeral. They are very cheap to start up and shut down, especially on Kubernetes.

I would advise against sharing a single cluster between many users in the way you described, because the Dask scheduler doesn’t have any intelligence around having multiple clients connected to it. Work is run on a first-come, first-served basis as your users submit it, so it is very possible for a single person to use up all the memory on the cluster and cause other users’ jobs to fail. Sharing a cluster in this way is fine for development, but I wouldn’t recommend it in production.

My recommendation today is to use dask-kubernetes. Once you install its operator on your Kubernetes cluster, you can create Dask clusters very easily, either in Python or with kubectl.
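To make the "one short-lived cluster per user or job" idea concrete, here is a minimal sketch using `KubeCluster` from `dask_kubernetes.operator`. It assumes the operator is already installed and that your local Kubernetes credentials can create resources in the target namespace. The helper names (`cluster_name_for`, `ephemeral_cluster`), the image tag, and the 2Gi memory figures are placeholders I made up for illustration, so adjust them to your setup.

```python
import re
from contextlib import contextmanager


def cluster_name_for(user: str, job: str) -> str:
    """Build a Kubernetes-safe name (lowercase letters, digits, '-') for one user's job.

    Hypothetical helper: the naming scheme is an assumption, not a dask-kubernetes rule.
    """
    raw = f"dask-{user}-{job}".lower()
    safe = re.sub(r"[^a-z0-9-]+", "-", raw).strip("-")
    # Keep the name short, since Kubernetes derives other resource names from it.
    return safe[:50].rstrip("-")


@contextmanager
def ephemeral_cluster(user: str, job: str, n_workers: int = 2):
    """Start a dedicated Dask cluster, yield a client, and tear the cluster down afterwards."""
    # Imported lazily so this module loads even where dask-kubernetes is not installed.
    from dask.distributed import Client
    from dask_kubernetes.operator import KubeCluster

    cluster = KubeCluster(
        name=cluster_name_for(user, job),
        image="ghcr.io/dask/dask:latest",  # assumption: use the image your workloads need
        n_workers=n_workers,
        # Assumption: example memory request/limit so one job cannot starve the node.
        resources={"requests": {"memory": "2Gi"}, "limits": {"memory": "2Gi"}},
    )
    try:
        with Client(cluster) as client:
            yield client
    finally:
        # Closing the cluster object removes the Dask cluster from Kubernetes.
        cluster.close()
```

A job would then wrap its work in `with ephemeral_cluster("alice", "nightly-etl") as client:` and submit tasks through `client`, so each user's memory usage stays isolated to their own cluster.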

Another option is dask-gateway, but I tend to only recommend it if you care a lot about abstracting Kubernetes away from your users and not giving them access to Kubernetes via kubectl. The project is less well maintained than dask-kubernetes and has fewer features.
