Dask Cluster on k8s - Managing Multiple Users Submitting Jobs Concurrently

Hi everyone.

We’ve deployed a Dask cluster on top of k8s (5 nodes, each with 200 GB of RAM and 50 vCPUs), following an example from this link. When a single developer is using the cluster, everything works perfectly. However, I’m thinking about a team of 3-5 people who will need to run jobs on the cluster simultaneously. My concern is how to efficiently manage multiple users submitting jobs at the same time. For instance, if Developer A submits a job that uses about 60% of the available RAM, and then Developer B submits a job that may require 45% of the RAM, this could lead to resource contention.

Is there a way to implement a queue or similar mechanism to check resource availability before submitting a job to the scheduler? If resources are insufficient, the job would wait in the queue until there’s enough capacity. Essentially, I’m looking for the best approach (the most Dasky approach if I can say like that) to handle day-to-day development using Dask on k8s. Any ideas or feedback would be greatly appreciated. Thanks!

We used to do this and switched to ephemeral clusters due to resource overload issues and making the scheduler and workers work too hard.

I don’t recommend this: in general it will require more work on the k8s side than on the Dask side. Yes, it can be done, but it isn’t worth the trouble.

You would need to implement the logic yourself using futures or async/await.

https://distributed.dask.org/en/latest/asynchronous.html
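For reference, a minimal sketch of what that could look like, following the asynchronous API from the page above. For self-containment the client here starts a local in-process cluster; against a shared cluster you would pass the scheduler address instead (the address below is a placeholder, and `submit_when_ready` is just an illustrative name):

```python
import asyncio
from dask.distributed import Client

async def submit_when_ready(func, *args):
    # With no address, Client starts a local in-process cluster; on a
    # shared deployment you would connect to the scheduler instead,
    # e.g. Client("tcp://<scheduler>:8786", asynchronous=True).
    async with Client(asynchronous=True, processes=False) as client:
        future = client.submit(func, *args)
        # Awaiting the future yields control while the job runs, so one
        # process can coordinate many users' submissions without
        # blocking a thread per job.
        return await future

result = asyncio.run(submit_when_ready(lambda x: x + 1, 41))
print(result)  # 42
```

Any queueing or "wait until enough RAM is free" policy would have to be layered on top of this by your own code, which is where the extra work mentioned above comes in.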


Thanks for the feedback, @Hvuj . Just to clarify: are you suggesting that, by implementing futures or async on a static cluster that everyone can connect to, we can address the issues of team members working on the same cluster and efficiently manage the memory for whatever they submit?

Yes, it’s possible. We did it, but it created too much trouble.
It’s easier and cheaper to create ephemeral clusters with fewer resources and/or dynamic allocation of resources.


As @Hvuj is saying, the most Dasky approach here is ephemeral clusters, one for each user. The resources priority or sharing would have to be handled by the resource orchestration system, Kubernetes here. You might want to limit every user, or to have an autoscaling approach. Do you have this K8S system on premise?
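The ephemeral pattern could be sketched like this. `LocalCluster` is used as a stand-in so the example runs anywhere; on Kubernetes you would swap in `KubeCluster` from the dask-kubernetes operator, which follows the same lifecycle. The worker counts and memory limit are illustrative:

```python
from dask.distributed import LocalCluster, Client

def run_job(func, *args, n_workers=2):
    # Each user (or each job) spins up a short-lived cluster, computes,
    # and tears everything down on exit, so there is no contention on a
    # shared scheduler. On k8s, replace LocalCluster with
    # dask_kubernetes.operator.KubeCluster.
    with LocalCluster(n_workers=n_workers, threads_per_worker=1,
                      memory_limit="1GiB") as cluster:
        with Client(cluster) as client:
            return client.submit(func, *args).result()

total = run_job(sum, range(100))
print(total)  # 4950
```

Using context managers this way guarantees the workers are released even if the job fails, which also helps with cleanup discipline.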


Hi @guillaumeeb

Yes, my k8s system is on-prem, so I have a limited amount of resources.

That will make things harder, but I’ll try to aim for ephemeral clusters anyway. That also means users have to clean things up properly…


Thank you guys for the comments and suggestions! Ephemeral clusters will do the job for now.

Please do not hesitate to get back to us and describe the K8S configuration you put in place for that!