Dask on k8s - Best computing practices

Hi everyone,

I am planning to set up a Kubernetes cluster with 4 nodes (4 VMs) and deploy Dask on top of it. I am curious about the best practices for resource management at the pod (worker) level. Specifically, I want to determine the ideal resource allocation for each pod in terms of RAM, vCPU, and storage.

This information will help me understand how much total RAM, vCPU, and storage I need to allocate to ensure a stable and efficient deployment of the cluster. I will be processing large datasets (around 30 million rows) and training machine learning models. Given the limited amount of memory available from my virtualization provider, I am looking for guidance on the minimum resource requirements for Dask workers. My initial thought was to have 4 nodes with around 48-64 GB RAM, 10-12 vCPUs, and ~150-200 GB storage each, but I was wondering, at the pod level, how much I can scale and what the minimum requirements of a pod are.

I understand this is a use-case specific question, but I was wondering if there are any standard setups that have proven to be effective in general? Any advice or best practices would be greatly appreciated.

Hi @Lolomgrofl, welcome to Dask Discourse!

Well, this question is not really specific to Kubernetes; it depends much more on your use case and workflow.

That said, there are a few answers available:

Regarding your infrastructure, you've got about 5 GiB per core, which sounds quite normal and well suited to a computing cluster. So there are two big questions: is your workload affected by the GIL, and do you need more than 4-5 GiB per process to compute your dataset?

Within Kubernetes, it's common to have only one Worker process per pod, generally with 1 or 2 threads. You could start with pods of 1 process and 2 threads, so requesting 2 vCPU and about 10 GiB of memory each. This will give you roughly 16-24 Workers, which is a small Dask cluster, but probably enough for what you need. You can also try 1 process and 1 thread (with half the memory per pod), doubling the number of Workers.
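As a rough back-of-the-envelope sketch (this helper is purely illustrative, not a Dask API), you can estimate how many such worker pods fit on your nodes by taking the minimum of the CPU-bound and RAM-bound counts:

```python
# Illustrative sizing helper: how many 1-process worker pods fit on the
# cluster, given node specs and per-pod requests (all numbers assumed).
def max_worker_pods(nodes, vcpus_per_node, ram_gib_per_node,
                    pod_vcpu=2, pod_ram_gib=10):
    by_cpu = (vcpus_per_node // pod_vcpu) * nodes   # limited by vCPU
    by_ram = (ram_gib_per_node // pod_ram_gib) * nodes  # limited by RAM
    return min(by_cpu, by_ram)

# Low end: 4 nodes, 10 vCPU and 48 GiB each -> RAM-bound, 16 pods
print(max_worker_pods(4, 10, 48))
# High end: 4 nodes, 12 vCPU and 64 GiB each -> 24 pods
print(max_worker_pods(4, 12, 64))
```

Note that in practice Kubernetes system pods and the Dask scheduler also consume some of each node's capacity, so the real count will be slightly lower.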

Small pods are easier for Kubernetes to fit on various node sizes.

2 Likes

Hi @guillaumeeb,

Thanks a lot for the very detailed explanations and examples, you hit the :dart: .

This was really really helpful.

Cheers!

1 Like

Hey @guillaumeeb ,

I have a small follow-up on this topic. We have a team of 8 people doing different tasks: data preprocessing, model training, and standard ML optimization work. We typically perform daily tasks using notebooks for research and prototyping, but we want to transition this work to a more robust framework for production.

Our goal is to set up a Dask cluster that everyone on the team can easily connect to and use from their individual machines. Additionally, we aim to establish an automated pipeline that runs on a regular schedule, such as once a week, incorporating MLOps best practices.

What would be the best approach to deploying and using Dask on k8s (out of all the possible Dask-on-k8s deployment options) for this specific use case?

Thanks!

I generally recommend that you treat Dask clusters as lightweight and ephemeral. They are very cheap to start up and shut down, especially on Kubernetes.

I would advise against sharing a single cluster between many users in the way you described, as the Dask scheduler doesn't have any intelligence around having multiple clients connected to it. As your users submit work, it will be run on a first-come, first-served basis, and it is very possible for a single person to use up all the memory on the cluster and cause other jobs to fail. Sharing a cluster in this way is fine for development, but I wouldn't recommend it in production.

My recommendation today is to use dask-kubernetes. Once you install the operator on your Kubernetes cluster you can create Dask clusters very easily either in Python or with kubectl.
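For the kubectl route, a minimal `DaskCluster` manifest might look something like the sketch below (based on the dask-kubernetes operator docs; the name, image, and sizes are illustrative and would need tuning for your workload):

```yaml
# cluster.yaml -- illustrative DaskCluster resource for the Dask operator
apiVersion: kubernetes.dask.org/v1
kind: DaskCluster
metadata:
  name: team-cluster
spec:
  worker:
    replicas: 4
    spec:
      containers:
        - name: worker
          image: "ghcr.io/dask/dask:latest"
          args: [dask-worker, --name, $(DASK_WORKER_NAME), --nthreads, "2"]
  scheduler:
    spec:
      containers:
        - name: scheduler
          image: "ghcr.io/dask/dask:latest"
          args: [dask-scheduler]
          ports:
            - name: tcp-comm
              containerPort: 8786
    service:
      type: ClusterIP
      selector:
        dask.org/cluster-name: team-cluster
        dask.org/component: scheduler
      ports:
        - name: tcp-comm
          protocol: TCP
          port: 8786
          targetPort: "tcp-comm"
```

You would create it with `kubectl apply -f cluster.yaml` and delete it just as cheaply when the job is done, which fits the ephemeral-cluster pattern above.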

Another option is dask-gateway but I tend to only recommend it if you care a lot about abstracting Kubernetes away from your users and not giving them access to Kubernetes via kubectl. The project is less well maintained than dask-kubernetes and has fewer features.

2 Likes

Thanks @jacobtomlinson :muscle:t3: This is super helpful!

1 Like