So, we have implemented a layer on top of the classic KubeCluster that lets us run a heterogeneous adaptive scheduler backed by several different KubeClusters.
One of the cool things it supports is scheduling jobs in AWS and on-prem (or on another K8s cluster) within the same Dask scheduler (for example, easy and flexible access to fancy GPUs or to large RAM, or …).
It seems that the new operator API will be a nice way to support multiple groups, and maybe with a bit of customisation, adaptive scaling as well, but I’m not sure of what the best way would be to handle cross-cluster scheduling.
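As a rough illustration of what such a layer has to decide, here is a minimal sketch in plain Python of per-pool adaptive scaling. It is not the implementation described above and it uses no dask-kubernetes API: `WorkerPool`, `desired_workers`, the pool names, and the capacities are all hypothetical, and it assumes tasks are tagged with a single required resource.

```python
from dataclasses import dataclass
from math import ceil


@dataclass(frozen=True)
class WorkerPool:
    """One homogeneous group of workers (hypothetical; e.g. one KubeCluster)."""
    name: str               # illustrative label such as "onprem-cpu" or "aws-gpu"
    resources: frozenset    # resource tags that workers in this pool provide
    tasks_per_worker: int   # assumed number of queued tasks one worker absorbs
    max_workers: int        # upper bound for this pool


def desired_workers(pools, pending):
    """Return {pool name: target worker count} for the pending demand.

    `pending` maps a required resource tag (e.g. "GPU") to the number of
    queued tasks that need it. Each tag is routed to the first pool that
    provides it; pools with no demand get a target of zero, so nothing is
    paid for while idle. Tags that no pool provides are ignored here.
    """
    targets = {pool.name: 0 for pool in pools}
    for tag, n_tasks in pending.items():
        for pool in pools:
            if tag in pool.resources:
                needed = ceil(n_tasks / pool.tasks_per_worker)
                targets[pool.name] = min(pool.max_workers, targets[pool.name] + needed)
                break
    return targets


# Hypothetical setup: a CPU pool on-prem and a GPU pool on AWS.
pools = [
    WorkerPool("onprem-cpu", frozenset({"CPU"}), tasks_per_worker=4, max_workers=20),
    WorkerPool("aws-gpu", frozenset({"GPU"}), tasks_per_worker=1, max_workers=8),
]

print(desired_workers(pools, {"CPU": 10, "GPU": 3}))
print(desired_workers(pools, {"CPU": 10}))  # GPU pool scales to zero
```

In a real deployment, each target would be passed to whatever scales the corresponding pool, and the demand would come from the scheduler's view of queued tasks and their resource requirements rather than from a hand-written dictionary.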
Hi @champialex, welcome to this forum!
So if I understand correctly, you have at least two separate Kubernetes clusters, and you developed a layer to address both clusters with a single Dask scheduler.
I don’t think the new operator API can support multiple clusters currently. I think we will need @jacobtomlinson’s advice here.
Yup, that’s correct. And with adaptive scheduling, so that we only pay for EC2 hardware when needed (and get good resource utilisation in general).
It does seem a bit hard to get that to work with the operator.
Sorry for the delay in responding here. This is not a use case I had considered before.
Most folks create/destroy KubeCluster instances as and when they need them. I’m curious what the benefit of spanning one Dask cluster over multiple Kubernetes clusters is, compared with having many Dask clusters, each in a single Kubernetes cluster?