dask_kubernetes.operator.KubeCluster hard codes scheduler address

Hi - we’re trying to set up Dask so we can have ephemeral distributed compute, but we’re struggling because our k8s cluster doesn’t use the default ‘cluster.local’ internal DNS suffix (this will get fixed at some point, but not likely soon). The address appears to be hard coded, I assume here, and I can’t override this value with the env input.
I have tried to work around this by modifying a local copy of dask_kubernetes, but I can’t seem to override the env var DASK_SCHEDULER_ADDRESS at all.

Any help appreciated, thanks

Hi @stephen-bias, welcome to Dask community!

So, what have you tried so far?

Did you try specifying your own scheduler address in the cluster.yaml, or in whichever other yaml file you are using? I’m thinking of something like:

spec:
  worker:
    replicas: 2
    spec:
      containers:
      - name: worker
        image: "ghcr.io/dask/dask:latest"
        imagePullPolicy: "IfNotPresent"
        args:
          - dask-worker
          - --name
          - $(DASK_WORKER_NAME)
          - --dashboard
          - --dashboard-address
          - "8788"
        ports:
          - name: http-dashboard
            containerPort: 8788
            protocol: TCP
        env:
          - name: DASK_SCHEDULER_ADDRESS
            value: "tcp://mycluster-scheduler.mynamespace.svc.mydnssuffix:8786"

Can you show the code you’ve modified?

Other than that, I don’t know Kubernetes well enough to judge whether this is a required feature. Maybe @jacobtomlinson has some thoughts.

I’ve never seen a cluster that doesn’t use svc.cluster.local but if you have one we should definitely make this configurable. This feels like a pretty quick change that we could make. Would you mind opening an issue on GitHub?

As for overriding the address yourself, we are aware of a bug that stops that from being applied. There is a fix here that is awaiting some further testing. Prepend user env vars by jacobtomlinson · Pull Request #837 · dask/dask-kubernetes · GitHub
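For context, the intended behaviour is presumably that user-supplied env vars take precedence over the operator’s defaults. A minimal sketch of such a merge (hypothetical helper, not the actual dask-kubernetes code), where an operator default is only appended when the user hasn’t already set that name:

```python
def merge_env(user_env, default_env):
    """Sketch: append operator defaults only for names the user hasn't set.

    Both arguments are Kubernetes-style env lists:
    [{"name": ..., "value": ...}, ...]
    """
    user_names = {entry["name"] for entry in user_env}
    return user_env + [e for e in default_env if e["name"] not in user_names]


# User-provided override (hypothetical values mirroring the yaml above)
user = [{"name": "DASK_SCHEDULER_ADDRESS",
         "value": "tcp://mycluster-scheduler.mynamespace.svc.mydnssuffix:8786"}]
# Defaults the operator would otherwise inject
defaults = [
    {"name": "DASK_WORKER_NAME", "value": "test-default-worker-aa9d67e1c2"},
    {"name": "DASK_SCHEDULER_ADDRESS",
     "value": "tcp://test-scheduler.test.svc.cluster.local:8786"},
]
merged = merge_env(user, defaults)
print(merged)
```

With this merge order, the user’s custom DNS suffix survives and only the missing default (the worker name) is added.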

Hi - Thanks for the prompt response, and apologies for my delayed one.

The change I tried to make is here (diff with dask:main), but I can’t seem to get it to do the right thing. The envs are propagated with the env arg, but DASK_SCHEDULER_ADDRESS is not overwritten when set. To test this I installed the package into the base Dask image with conda and pip, but I recognise the limits of my knowledge as I’m not a conda/mamba user, so I may have done it wrong (I also tried the env var EXTRA_PIP_PACKAGES). Of course, the version of dask-kubernetes I’m using locally has the change as well.

The other thing I’ve noticed is that the workers are created with duplicate env vars anyway: if I create a cluster with 3 workers, the third worker created will have the following env vars:

    - name: DASK_WORKER_NAME
      value: test-default-worker-aa9d67e1c2
    - name: DASK_SCHEDULER_ADDRESS
      value: tcp://test-scheduler.external.svc.cluster.local:8786
    - name: DASK_WORKER_NAME
      value: test-default-worker-99c159ba99
    - name: DASK_SCHEDULER_ADDRESS
      value: tcp://test-scheduler.external.svc.cluster.local:8786
    - name: DASK_WORKER_NAME
      value: test-default-worker-bebdd11981
    - name: DASK_SCHEDULER_ADDRESS
      value: tcp://test-scheduler.external.svc.cluster.local:8786

The first worker has a normal set, the second has two copies, and so on.
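To illustrate what deduplication would mean here, collapsing that list so each name appears once (keeping the last occurrence, on the assumption that the last duplicate is the one that takes effect) could look like this sketch — not the operator’s actual code:

```python
def dedup_env(env):
    """Collapse a Kubernetes-style env list so each name appears once.

    Keeps the last value seen for each name (dicts preserve insertion
    order, and re-assigning a key only updates its value).
    """
    merged = {}
    for entry in env:
        merged[entry["name"]] = entry["value"]
    return [{"name": n, "value": v} for n, v in merged.items()]


# The duplicated env list observed on the third worker
env = [
    {"name": "DASK_WORKER_NAME", "value": "test-default-worker-aa9d67e1c2"},
    {"name": "DASK_SCHEDULER_ADDRESS",
     "value": "tcp://test-scheduler.external.svc.cluster.local:8786"},
    {"name": "DASK_WORKER_NAME", "value": "test-default-worker-99c159ba99"},
    {"name": "DASK_SCHEDULER_ADDRESS",
     "value": "tcp://test-scheduler.external.svc.cluster.local:8786"},
    {"name": "DASK_WORKER_NAME", "value": "test-default-worker-bebdd11981"},
    {"name": "DASK_SCHEDULER_ADDRESS",
     "value": "tcp://test-scheduler.external.svc.cluster.local:8786"},
]
deduped = dedup_env(env)
print(deduped)
```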

I have tried using a custom cluster spec too:

from dask_kubernetes.operator import KubeCluster, make_cluster_spec
spec = make_cluster_spec(name="test")
spec["spec"]["worker"]["spec"]["containers"][0]["env"] = [{"name": "DASK_SCHEDULER_ADDRESS", "value": "anything"}]
cluster = KubeCluster(custom_cluster_spec=spec, namespace="test")

I’m testing all of this using both Docker Desktop (with its included k8s) and our sandbox cluster, which for some reason uses that DNS suffix while our prod and dev clusters do not.

I have opened two github issues:

  • the first, the issue (I’ve filed it as a feature request) about setting a custom scheduler address from the env argument: /dask/dask-kubernetes/issues/842
  • the second, the duplicate env var issue, which I believe is related but don’t want to assume: /dask/dask-kubernetes/issues/841

(having issues with posting links so have omitted the URLs)