HelmCluster + Autoscaling doesn't seem to scale the workers

I have deployed the Dask Kubernetes Operator, a DaskCluster, and a DaskAutoscaler. From what I understand of the documentation, the autoscaler polls the scheduler for the number of workers required and spawns them. However, I don't see the workers being scaled: the DaskCluster stays at one worker throughout a long-running job. When I check the operator logs, I don't see any messages related to autoscaling while the job is running.

This is a permanently running cluster, so the client only submits jobs to the address of the already-running scheduler and does not create the cluster. Where can I look to see what is going on?

Hi @avpaap, welcome to the Dask community!

Would you be able to share your configuration and the commands you used to create the cluster? You mention a Helm cluster in the title but dask-kubernetes in the post; which one are you using?

cc @jacobtomlinson

Can you share the events from the DaskCluster and DaskAutoscaler objects?

Here you go, @jacobtomlinson:

[dask-kubernetes]$ kubectl describe daskcluster dask -n $n
Name:         dask
Namespace:    dev-apps
Labels:       app.kubernetes.io/instance=dask-cluster
Annotations:  argocd.argoproj.io/tracking-id: dask-cluster:kubernetes.dask.org/DaskCluster:dev-apps/dask
              kopf.zalando.org/last-handled-configuration:
                {"spec":{"idleTimeout":0,"scheduler":{"service":{"ports":[{"name":"tcp-comm","port":8786,"protocol":"TCP","targetPort":"tcp-comm"},{"name"...
API Version:  kubernetes.dask.org/v1
Kind:         DaskCluster
Metadata:
  Creation Timestamp:  2024-07-08T18:14:36Z
  Generation:          1
  Managed Fields:
    API Version:  kubernetes.dask.org/v1
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          f:argocd.argoproj.io/tracking-id:
        f:labels:
          f:app.kubernetes.io/instance:
      f:spec:
        f:scheduler:
          f:service:
            f:ports:
              k:{"port":8786,"protocol":"TCP"}:
                .:
                f:name:
                f:port:
                f:protocol:
                f:targetPort:
              k:{"port":8787,"protocol":"TCP"}:
                .:
                f:name:
                f:port:
                f:protocol:
                f:targetPort:
            f:selector:
              f:dask.org/cluster-name:
              f:dask.org/component:
            f:type:
          f:spec:
            f:containers:
            f:imagePullSecrets:
        f:worker:
          f:replicas:
          f:spec:
            f:containers:
            f:imagePullSecrets:
    Manager:      argocd-controller
    Operation:    Apply
    Time:         2024-07-08T18:14:36Z
    API Version:  kubernetes.dask.org/v1
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          f:kopf.zalando.org/last-handled-configuration:
    Manager:         kopf
    Operation:       Update
    Time:            2024-07-08T18:14:36Z
  Resource Version:  2759291914
  UID:               
Spec:
  Idle Timeout:  0
  Scheduler:
    Service:
      Ports:
        Name:         tcp-comm
        Port:         8786
        Protocol:     TCP
        Target Port:  tcp-comm
        Name:         http-dashboard
        Port:         8787
        Protocol:     TCP
        Target Port:  http-dashboard
      Selector:
        dask.org/cluster-name:  dask
        dask.org/component:     scheduler
      Type:                     ClusterIP
    Spec:
      Containers:
        Args:
          dask-scheduler
        Image:              dask:1.1
        Image Pull Policy:  IfNotPresent
        Liveness Probe:
          Http Get:
            Path:                 /health
            Port:                 http-dashboard
          Initial Delay Seconds:  15
          Period Seconds:         20
        Name:                     scheduler
        Ports:
          Container Port:  8786
          Name:            tcp-comm
          Protocol:        TCP
          Container Port:  8787
          Name:            http-dashboard
          Protocol:        TCP
        Readiness Probe:
          Http Get:
            Path:                 /health
            Port:                 http-dashboard
          Initial Delay Seconds:  5
          Period Seconds:         10
        Resources:
          Limits:
            Cpu:     2
            Memory:  4G
      Image Pull Secrets:
        Name:  image-pull-secret
  Worker:
    Replicas:  1
    Spec:
      Containers:
        Args:
          dask-worker
          --name
          $(DASK_WORKER_NAME)
          --dashboard
          --dashboard-address
          8788
        Image:              dask:1.1
        Image Pull Policy:  IfNotPresent
        Name:               worker
        Ports:
          Container Port:  8788
          Name:            http-dashboard
          Protocol:        TCP
        Resources:
          Limits:
            Cpu:     1
            Memory:  2G
      Image Pull Secrets:
        Name:  image-pull-secret
Events:        <none>
[dask-kubernetes]$ 



[dask-kubernetes]$ kubectl describe daskautoscaler dask -n $n
Name:         dask
Namespace:    dev-apps
Labels:       app.kubernetes.io/instance=dask-autosclaer
Annotations:  argocd.argoproj.io/tracking-id: dask-autosclaer:kubernetes.dask.org/DaskAutoscaler:dev-apps/dask
API Version:  kubernetes.dask.org/v1
Kind:         DaskAutoscaler
Metadata:
  Creation Timestamp:  2024-07-08T12:08:16Z
  Generation:          3
  Managed Fields:
    API Version:  kubernetes.dask.org/v1
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          f:argocd.argoproj.io/tracking-id:
        f:labels:
          f:app.kubernetes.io/instance:
      f:spec:
        f:cluster:
        f:maximum:
    Manager:      argocd-controller
    Operation:    Apply
    Time:         2024-07-10T08:37:44Z
    API Version:  kubernetes.dask.org/v1
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          f:kubectl.kubernetes.io/last-applied-configuration:
      f:spec:
        f:minimum:
    Manager:         kubectl-client-side-apply
    Operation:       Update
    Time:            2024-07-10T12:46:33Z
  Resource Version:  2779447296
  UID:               
Spec:
  Cluster:  dask
  Maximum:  25
  Minimum:  3
Events:     <none>
[dask-kubernetes]$
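
One thing that stands out in the output above: the DaskAutoscaler spec has a minimum of 3 and a maximum of 25, yet the cluster reportedly stays at one worker. As a rough sketch, assuming the autoscaler bounds the scheduler's desired worker count between `minimum` and `maximum` (this is an assumption about how the operator behaves, not code taken from it, and `clamp_workers` is a hypothetical helper), the expected worker count would look like this:

```python
def clamp_workers(desired: int, minimum: int, maximum: int) -> int:
    """Bound the scheduler's desired worker count by the autoscaler limits.

    Hypothetical helper for illustration only; not part of dask-kubernetes.
    """
    if minimum > maximum:
        raise ValueError("minimum must not exceed maximum")
    return max(minimum, min(maximum, desired))


# Limits copied from the DaskAutoscaler spec above.
MINIMUM, MAXIMUM = 3, 25

# The desired counts below are made-up inputs, not values from a real scheduler.
for desired in (0, 1, 10, 100):
    print(desired, "->", clamp_workers(desired, MINIMUM, MAXIMUM))
```

Under that assumption, even an idle cluster should settle at no fewer than 3 workers. Staying at 1, together with the empty Events and the lack of autoscaling messages in the operator logs, would suggest the DaskAutoscaler is not being acted on at all, rather than the scheduler simply asking for few workers.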