Is there a way to deploy Dask on an existing Kubernetes cluster where we only have access to a namespace and no ability to create CRDs? Both the dask-gateway and dask-kubernetes Helm charts look like they require the ability to create CRDs, which we have confirmed we do not have.
If there isn’t an existing way, what level of effort would it take to work around this constraint?
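For anyone else hitting this, one way to confirm the constraint is with `kubectl auth can-i` (the `daskclusters.kubernetes.dask.org` resource below is the custom resource the dask-kubernetes operator defines; substitute your own namespace):

```shell
# Can this account create CRDs at all? (cluster-scoped, usually "no" for
# namespace-only tenants)
kubectl auth can-i create customresourcedefinitions

# Even without CRD-creation rights, the operator's resources may already be
# installed cluster-wide; check whether instances can be created in-namespace.
kubectl auth can-i create daskclusters.kubernetes.dask.org -n <namespace>
```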
Generally, folks in very constrained Kubernetes environments tend to use the Dask Helm chart. Installing this chart creates a single Dask cluster as a Deployment. You don’t get features like autoscaling, but you don’t need an operator or CRDs.
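For reference, a minimal install of that chart looks roughly like this (release and namespace names here are placeholders; values such as the worker count can be tuned via `--set` or a values file):

```shell
# Add the Dask Helm repository and refresh the index
helm repo add dask https://helm.dask.org
helm repo update

# Install a single Dask cluster (scheduler, workers, and optionally Jupyter)
# as plain Deployments in your namespace -- no operator or CRDs required.
helm install my-dask dask/dask \
  --namespace <namespace> \
  --set worker.replicas=3
```

Scaling then means re-running `helm upgrade` with a different `worker.replicas`, rather than the cluster autoscaling itself.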
I’d love to hear more about the use case and what your users need.
We are talking to a department within a larger enterprise organization. The organization runs an on-prem OpenShift Kubernetes cluster. The department has a provisioned namespace and no ability to change anything outside of it.
The department wants to use Dask but needs buy-in from leadership.
We need to run a small POC comparing Dask on Kubernetes against Spark on Kubernetes on their on-prem cluster, by whatever means necessary, to get leadership buy-in and show that Dask’s performance is good enough for their target use cases.
After that there will be a larger project to make everything work properly. They are essentially building an internal platform that will let data analysts and data scientists write code and push it to run via Dask. I think they are currently prototyping with dask.delayed a lot. They will probably need some of the management features dask-gateway has. We might be able to get funding to improve dask-kubernetes or dask-gateway through the larger project, but the no-CRD constraint seems like it might be a blocker there.
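For context on the prototyping style, this is the kind of dask.delayed pattern I mean (a toy sketch, not their actual workload):

```python
import dask


@dask.delayed
def inc(x):
    # Placeholder for a real processing step
    return x + 1


@dask.delayed
def add(x, y):
    # Placeholder for a real combine step
    return x + y


# Calling delayed functions builds a lazy task graph instead of executing
a = inc(1)
b = inc(2)
total = add(a, b)

# .compute() runs the graph on whatever cluster the client is connected to
result = total.compute()
print(result)  # -> 5
```

The same code runs unchanged against a local scheduler or a cluster deployed in the namespace, which is what makes the deployment question largely independent of their application code.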
Autoscaling was one of the things they were excited about, so it is unfortunate that it won’t work.
I would encourage pushing back on the no-CRDs requirement. Most tools use CRDs these days; it’s a very standard Kubernetes pattern. I’d ask how they intend to run Spark on Kubernetes, as the Spark Operator uses CRDs, as does the Ray Operator, as does Kubeflow, etc.
But if there is no movement at all, and they are happy to fund feature development and ongoing maintenance of those features, then we could explore adding an option to store state in an external database instead of Kubernetes CRDs.