Dask-Kubernetes not cleaning up properly

Hi,

I’m creating ephemeral Dask Clusters on K8S (AWS EKS) to run some ETL jobs.
Unfortunately, the “dask-root” pods are never stopped and deleted.

Is there a config flag I can set for a full cleanup?

At the moment, my cluster accumulates one idle dask-root pod per finished ETL run:

Code:

```python
# see https://kubernetes.dask.org/en/latest/
import dask_kubernetes
from dask_kubernetes import make_pod_spec  # classic (pre-operator) dask-kubernetes API
from kubernetes.client import V1Pod
from prefect_dask import DaskTaskRunner  # Prefect 2.0 betas: shipped in the prefect-dask collection


def dask_pod_spec() -> V1Pod:
    return make_pod_spec(
        image="ghcr.io/dask/dask:latest",
        memory_limit="1G",
        memory_request="1G",
        cpu_limit=1,
        cpu_request=1,
        extra_pod_config={"serviceAccountName": "prefect-agent-service-account"},
    )


# https://kubernetes.dask.org/en/latest/kubecluster.html
dask_runner = DaskTaskRunner(
    cluster_class=dask_kubernetes.KubeCluster,
    adapt_kwargs={"minimum": 1, "maximum": 10},
    cluster_kwargs={
        "pod_template": dask_pod_spec(),
        "env": {"EXTRA_PIP_PACKAGES": "prefect==2.0b11"},
    },
)
```

@rahloff Welcome to Discourse!

I’m not familiar with this, but maybe @guillaumeeb or @jacobtomlinson have some thoughts. :smile:


Hi pavithraes, thanks for making the introductions!
@guillaumeeb / @jacobtomlinson, any ideas or pointers on where I should look?

Hi there,

Sorry, I don’t have much experience with dask-kubernetes; I’m not even sure what the dask-root pod is supposed to run. It might help to get the logs from one of these pods to see what’s running on it and why it isn’t terminated at the end of the task.
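To follow up on the logs suggestion, a minimal sketch using the official `kubernetes` Python client could list the leftover pods and print the tail of each one's log (roughly what `kubectl logs <pod>` does). It assumes your kubeconfig points at the EKS cluster, that the pods live in the `default` namespace, and that they can be matched by the `dask-root` name prefix from the question; `select_pods` and `dump_logs` are hypothetical helper names.

```python
def select_pods(pods, prefix="dask-root"):
    """Return (name, phase) for every pod whose name starts with `prefix`.

    `pods` is any iterable of objects shaped like kubernetes.client.V1Pod,
    i.e. with .metadata.name and .status.phase attributes.
    """
    return [
        (p.metadata.name, p.status.phase)
        for p in pods
        if p.metadata.name.startswith(prefix)
    ]


def dump_logs(namespace="default", prefix="dask-root", tail_lines=50):
    """Print the last `tail_lines` log lines of each matching pod.

    Assumption: the `kubernetes` client is installed and a kubeconfig is
    available; inside a pod, use config.load_incluster_config() instead.
    """
    # Imported lazily so select_pods() works without the client installed.
    from kubernetes import client, config

    config.load_kube_config()
    v1 = client.CoreV1Api()
    pods = v1.list_namespaced_pod(namespace).items
    for name, phase in select_pods(pods, prefix):
        print(f"=== {name} ({phase}) ===")
        print(v1.read_namespaced_pod_log(name, namespace, tail_lines=tail_lines))
```

Calling `dump_logs()` after a finished ETL run should show whether the leftover pods are still running a worker or scheduler process, and their phase shows whether Kubernetes considers them alive at all.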

Since you’re using Prefect to launch Dask, could Prefect be causing a side effect here?

The KubeCluster class should delete the pods when it gets deleted. If this isn’t happening there is either a bug in dask-kubernetes or Prefect is doing something to break this functionality.

I’m not an expert on Prefect, so I can’t comment there, but ideally the DaskTaskRunner should call cluster.close() when it is finished.
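One way to take Prefect out of the picture is to own the cluster lifecycle yourself and guarantee that `close()` runs even when the ETL code fails. A minimal sketch, assuming the classic `KubeCluster(pod_template)` constructor from the dask-kubernetes releases used in the question; `ephemeral_cluster` and `run_etl` are hypothetical helpers, and the `submit(sum, ...)` call only stands in for real ETL work.

```python
from contextlib import contextmanager


@contextmanager
def ephemeral_cluster(cluster_factory, **adapt_kwargs):
    """Create a cluster, optionally make it adaptive, and always close it.

    Only relies on the cluster having .adapt() and .close(), which
    dask-kubernetes' KubeCluster provides.
    """
    cluster = cluster_factory()
    try:
        if adapt_kwargs:
            cluster.adapt(**adapt_kwargs)
        yield cluster
    finally:
        # Runs even if the body raises, so the pods should get torn down.
        cluster.close()


def run_etl(pod_template):
    # Lazy imports: only needed when actually talking to Kubernetes.
    from dask.distributed import Client
    from dask_kubernetes import KubeCluster

    with ephemeral_cluster(
        lambda: KubeCluster(pod_template), minimum=1, maximum=10
    ) as cluster:
        with Client(cluster) as client:
            return client.submit(sum, [1, 2, 3]).result()
```

`KubeCluster` is itself a context manager, so `with KubeCluster(...) as cluster:` alone gives the same close-on-exit guarantee; the helper just keeps the `adapt` call in one place and is easy to test with a fake cluster.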

If you’d be interested in opening an issue on dask-kubernetes with a minimal reproducer (i.e. without Prefect), that would be great.
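A Prefect-free reproducer could look roughly like the sketch below: it builds the same pod spec as in the question (minus the Prefect service account and `EXTRA_PIP_PACKAGES`; add `extra_pod_config` back if your cluster needs that service account), runs one trivial task, closes the cluster, waits, and lists any pods that still match the `dask-root` prefix. It assumes the classic `KubeCluster`/`make_pod_spec` API and the official `kubernetes` client; the `default` namespace, the 30-second grace period, and the helper names `leftover_pods`/`reproduce` are assumptions for illustration.

```python
import time


def leftover_pods(pod_names, prefix="dask-root"):
    """Sorted names that still match `prefix` after the cluster was closed."""
    return sorted(n for n in pod_names if n.startswith(prefix))


def reproduce(namespace="default", grace_seconds=30):
    # Lazy imports so leftover_pods() can be used on its own.
    from dask.distributed import Client
    from dask_kubernetes import KubeCluster, make_pod_spec
    from kubernetes import client, config

    pod_spec = make_pod_spec(
        image="ghcr.io/dask/dask:latest",
        memory_limit="1G",
        memory_request="1G",
        cpu_limit=1,
        cpu_request=1,
    )
    cluster = KubeCluster(pod_spec, namespace=namespace)
    cluster.adapt(minimum=1, maximum=10)
    with Client(cluster) as dask_client:
        print(dask_client.submit(lambda x: x + 1, 1).result())
    cluster.close()

    # Give Kubernetes some time to terminate the pods before checking.
    time.sleep(grace_seconds)
    config.load_kube_config()
    names = [
        p.metadata.name
        for p in client.CoreV1Api().list_namespaced_pod(namespace).items
    ]
    print("leftover pods:", leftover_pods(names))
```

If `reproduce()` prints an empty list, the cleanup works in dask-kubernetes on its own and the Prefect integration is the more likely culprit; a non-empty list would be a good starting point for the issue.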