Is there a way to deploy Dask on an existing Kubernetes cluster where we only have access to a namespace and no ability to create CRDs? Both the dask-gateway and dask-kubernetes Helm charts look like they require the ability to create CRDs, which we have confirmed we do not have.
If there isn’t an existing way, what level of effort would it take to work around this constraint?
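For anyone else hitting this, one way to confirm the constraint is with `kubectl auth can-i` (the `daskclusters.kubernetes.dask.org` resource below is the custom resource the dask-kubernetes operator defines; substitute your own namespace):

```shell
# Can this account create CRDs at all? (cluster-scoped, usually "no" for
# namespace-only tenants)
kubectl auth can-i create customresourcedefinitions

# Even without CRD-creation rights, the operator's resources may already be
# installed cluster-wide; check whether instances can be created in-namespace.
kubectl auth can-i create daskclusters.kubernetes.dask.org -n <namespace>
```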
Generally, folks in very constrained Kubernetes environments tend to use the Dask Helm chart. Installing this chart creates a single Dask cluster as a Deployment. You don’t get features like autoscaling, but you don’t need an operator or CRDs.
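For reference, a minimal install of that chart looks roughly like this (release and namespace names here are placeholders; values such as the worker count can be tuned via `--set` or a values file):

```shell
# Add the Dask Helm repository and refresh the index
helm repo add dask https://helm.dask.org
helm repo update

# Install a single Dask cluster (scheduler, workers, and optionally Jupyter)
# as plain Deployments in your namespace -- no operator or CRDs required.
helm install my-dask dask/dask \
  --namespace <namespace> \
  --set worker.replicas=3
```

Scaling then means re-running `helm upgrade` with a different `worker.replicas`, rather than the cluster autoscaling itself.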
I’d love to hear more about the use case and what your users need.
We are talking to a department within a larger enterprise organization. The organization runs an on-prem OpenShift Kubernetes cluster. The department has a provisioned namespace and no ability to change anything outside of it.
The department wants to use Dask but needs buy-in from leadership.
We need to run a small POC comparing Dask on Kubernetes against Spark on Kubernetes on their on-prem cluster, by whatever means necessary, to get leadership buy-in and show that Dask’s performance is good enough for their target use cases.
After that there will be a larger project to make everything work properly. They are essentially building an internal platform that will let data analysts and data scientists write code and push it to run via Dask. I think they are currently prototyping with dask.delayed a lot. They will probably need some of the management features dask-gateway has. We might be able to get funding to improve dask-kubernetes or dask-gateway through the larger project, but the no-CRD constraint seems like it might be a blocker there.
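For context on the prototyping style, this is the kind of dask.delayed pattern I mean (a toy sketch, not their actual workload):

```python
import dask


@dask.delayed
def inc(x):
    # Placeholder for a real processing step
    return x + 1


@dask.delayed
def add(x, y):
    # Placeholder for a real combine step
    return x + y


# Calling delayed functions builds a lazy task graph instead of executing
a = inc(1)
b = inc(2)
total = add(a, b)

# .compute() runs the graph on whatever cluster the client is connected to
result = total.compute()
print(result)  # -> 5
```

The same code runs unchanged against a local scheduler or a cluster deployed in the namespace, which is what makes the deployment question largely independent of their application code.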
Autoscaling was one of the things they were excited about, so it is unfortunate that it won’t work.
I would encourage pushing back on the no-CRDs requirement. Most tools use CRDs these days; it’s a very standard Kubernetes pattern. I’d ask how they intend to run Spark on Kubernetes, as the Spark Operator uses CRDs, as does the Ray Operator, as does Kubeflow, etc.
But if there is no movement at all, and they are happy to fund feature development and ongoing maintenance of those features, then we could explore adding an option to store state in an external database instead of Kubernetes CRDs.