Hi @Lolomgrofl, welcome to Dask Discourse!
Well, this question is not really specific to Kubernetes; it depends much more on your use case and workflow.
That said, there are a few answers available:
- Dask Best Practices — Dask documentation
- How to configure Dask cluster based on my workload?
- how do we choose --nthreads and --nprocs per worker in dask distributed? - Stack Overflow
Regarding your infrastructure, you've got about 5 GiB per core, which sounds quite normal and well suited to a computing cluster. So there are two big questions: is your workload affected by the GIL, and do you need more than 4-5 GiB per process to compute your dataset?
Within Kubernetes, it's common to have only one Worker process per pod, generally with 1 or 2 threads. You could start with pods running 1 process with 2 threads, requesting 2 vCPU and about 10 GiB of memory each. This would give you 30-38 Workers, which is a small Dask cluster, but probably enough for what you need. You could also try 1 process with 1 thread, doubling the number of Workers.
Small pods are also easier for Kubernetes to fit onto various node sizes.
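If you're launching the cluster from Python, here is a minimal sketch of that configuration using dask-kubernetes' operator-based `KubeCluster` (the cluster name and image below are placeholders, and this assumes the Dask operator is already installed in your Kubernetes cluster):

```python
# Minimal sketch, assuming the Dask Kubernetes operator is installed;
# adjust name and image to your setup.
from dask_kubernetes.operator import KubeCluster
from dask.distributed import Client

cluster = KubeCluster(
    name="example",                    # placeholder cluster name
    image="ghcr.io/dask/dask:latest",  # placeholder image
    n_workers=30,                      # one Worker process per pod
    resources={
        "requests": {"cpu": "2", "memory": "10Gi"},
        "limits": {"cpu": "2", "memory": "10Gi"},
    },
    # 1 process with 2 threads per pod, matching the 2 vCPU request
    worker_command="dask-worker --nthreads 2 --memory-limit 10GiB",
)

client = Client(cluster)
```

To answer the GIL question empirically, you could compare this against pods with `--nthreads 1`, 1 vCPU and 5 GiB each (so twice as many Workers), and see which layout computes your workload faster.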