avpaap
July 10, 2024, 8:19am
I have deployed the Dask Kubernetes Operator, a DaskCluster, and a DaskAutoscaler. From what I understand of the documentation, the autoscaler polls the scheduler for the number of workers required and spawns them. I don't see the workers being scaled, and the DaskCluster continues to run with a single worker as a long-running job. In the operator logs I don't see any messages related to autoscaling while the job runs.
This is a permanently running cluster, so the client only submits jobs to the address of the already-running scheduler and does not create the cluster. Where can I look to see what is going on?
Hi @avpaap, welcome to the Dask community!
Would you be able to share your configuration and the commands you used to create the cluster? You mention a Helm cluster in the title but dask-kubernetes in the post; which one are you using?
cc @jacobtomlinson
Can you share the events from the DaskCluster and DaskAutoscaler objects?
avpaap
July 15, 2024, 7:06am
Here you go @jacobtomlinson
[dask-kubernetes]$ kubectl describe daskcluster dask -n $n
Name: dask
Namespace: dev-apps
Labels: app.kubernetes.io/instance=dask-cluster
Annotations: argocd.argoproj.io/tracking-id: dask-cluster:kubernetes.dask.org/DaskCluster:dev-apps/dask
kopf.zalando.org/last-handled-configuration:
{"spec":{"idleTimeout":0,"scheduler":{"service":{"ports":[{"name":"tcp-comm","port":8786,"protocol":"TCP","targetPort":"tcp-comm"},{"name"...
API Version: kubernetes.dask.org/v1
Kind: DaskCluster
Metadata:
Creation Timestamp: 2024-07-08T18:14:36Z
Generation: 1
Managed Fields:
API Version: kubernetes.dask.org/v1
Fields Type: FieldsV1
fieldsV1:
f:metadata:
f:annotations:
f:argocd.argoproj.io/tracking-id:
f:labels:
f:app.kubernetes.io/instance:
f:spec:
f:scheduler:
f:service:
f:ports:
k:{"port":8786,"protocol":"TCP"}:
.:
f:name:
f:port:
f:protocol:
f:targetPort:
k:{"port":8787,"protocol":"TCP"}:
.:
f:name:
f:port:
f:protocol:
f:targetPort:
f:selector:
f:dask.org/cluster-name:
f:dask.org/component:
f:type:
f:spec:
f:containers:
f:imagePullSecrets:
f:worker:
f:replicas:
f:spec:
f:containers:
f:imagePullSecrets:
Manager: argocd-controller
Operation: Apply
Time: 2024-07-08T18:14:36Z
API Version: kubernetes.dask.org/v1
Fields Type: FieldsV1
fieldsV1:
f:metadata:
f:annotations:
f:kopf.zalando.org/last-handled-configuration:
Manager: kopf
Operation: Update
Time: 2024-07-08T18:14:36Z
Resource Version: 2759291914
UID:
Spec:
Idle Timeout: 0
Scheduler:
Service:
Ports:
Name: tcp-comm
Port: 8786
Protocol: TCP
Target Port: tcp-comm
Name: http-dashboard
Port: 8787
Protocol: TCP
Target Port: http-dashboard
Selector:
dask.org/cluster-name: dask
dask.org/component: scheduler
Type: ClusterIP
Spec:
Containers:
Args:
dask-scheduler
Image: dask:1.1
Image Pull Policy: IfNotPresent
Liveness Probe:
Http Get:
Path: /health
Port: http-dashboard
Initial Delay Seconds: 15
Period Seconds: 20
Name: scheduler
Ports:
Container Port: 8786
Name: tcp-comm
Protocol: TCP
Container Port: 8787
Name: http-dashboard
Protocol: TCP
Readiness Probe:
Http Get:
Path: /health
Port: http-dashboard
Initial Delay Seconds: 5
Period Seconds: 10
Resources:
Limits:
Cpu: 2
Memory: 4G
Image Pull Secrets:
Name: image-pull-secret
Worker:
Replicas: 1
Spec:
Containers:
Args:
dask-worker
--name
$(DASK_WORKER_NAME)
--dashboard
--dashboard-address
8788
Image: dask:1.1
Image Pull Policy: IfNotPresent
Name: worker
Ports:
Container Port: 8788
Name: http-dashboard
Protocol: TCP
Resources:
Limits:
Cpu: 1
Memory: 2G
Image Pull Secrets:
Name: image-pull-secret
Events: <none>
[dask-kubernetes]$
[dask-kubernetes]$ kubectl describe daskautoscaler dask -n $n
Name: dask
Namespace: dev-apps
Labels: app.kubernetes.io/instance=dask-autosclaer
Annotations: argocd.argoproj.io/tracking-id: dask-autosclaer:kubernetes.dask.org/DaskAutoscaler:dev-apps/dask
API Version: kubernetes.dask.org/v1
Kind: DaskAutoscaler
Metadata:
Creation Timestamp: 2024-07-08T12:08:16Z
Generation: 3
Managed Fields:
API Version: kubernetes.dask.org/v1
Fields Type: FieldsV1
fieldsV1:
f:metadata:
f:annotations:
f:argocd.argoproj.io/tracking-id:
f:labels:
f:app.kubernetes.io/instance:
f:spec:
f:cluster:
f:maximum:
Manager: argocd-controller
Operation: Apply
Time: 2024-07-10T08:37:44Z
API Version: kubernetes.dask.org/v1
Fields Type: FieldsV1
fieldsV1:
f:metadata:
f:annotations:
f:kubectl.kubernetes.io/last-applied-configuration:
f:spec:
f:minimum:
Manager: kubectl-client-side-apply
Operation: Update
Time: 2024-07-10T12:46:33Z
Resource Version: 2779447296
UID:
Spec:
Cluster: dask
Maximum: 25
Minimum: 3
Events: <none>
[dask-kubernetes]$
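For anyone reading the two outputs side by side, here is a minimal Python sketch of the bounds check implied by the DaskAutoscaler spec above. It is not the operator's code. It assumes the autoscaler takes the worker count the scheduler asks for and clamps it to the `Minimum`/`Maximum` range. The function name `clamp_worker_target` and the sample "desired" values are hypothetical; only the numbers 1, 3 and 25 come from the output above.

```python
def clamp_worker_target(desired: int, minimum: int, maximum: int) -> int:
    """Clamp the worker count the scheduler asks for to the autoscaler bounds.

    Hypothetical helper for illustration; not part of dask-kubernetes.
    """
    if minimum > maximum:
        raise ValueError("minimum must not exceed maximum")
    return max(minimum, min(desired, maximum))


# Values copied from the `kubectl describe` output above.
current_replicas = 1       # DaskCluster: Worker -> Replicas
minimum, maximum = 3, 25   # DaskAutoscaler: Spec -> Minimum / Maximum

# Hypothetical worker counts the scheduler might ask for.
for desired in (0, 1, 10, 100):
    target = clamp_worker_target(desired, minimum, maximum)
    change = target - current_replicas
    print(f"scheduler wants {desired:>3} -> target {target:>2} ({change:+d} workers)")
```

Under that assumption the target never drops below 3, even when the scheduler is idle. So if the cluster really stays at one worker, and both objects show `Events: <none>`, it may be that the autoscaler is not being reconciled at all, rather than the scheduler asking for too few workers.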