Dask-Kubernetes not cleaning up properly

rahloff · July 22, 2022, 9:57am

Hi,

I’m creating ephemeral Dask Clusters on K8S (AWS EKS) to run some ETL jobs.
Unfortunately, the “dask-root” pods are never stopped and deleted.

Is there a config flag I can set for a full cleanup?

At the moment, my cluster accumulates one idle dask-root pod per finished ETL run.

Code:

# see https://kubernetes.dask.org/en/latest/
import dask_kubernetes
from dask_kubernetes import make_pod_spec
from kubernetes.client import V1Pod
# Import path assumed; it depends on the installed Prefect 2 beta.
from prefect_dask import DaskTaskRunner


def dask_pod_spec() -> V1Pod:
    # Worker pod template: 1 CPU and 1G of memory per worker.
    return make_pod_spec(
        image="ghcr.io/dask/dask:latest",
        memory_limit="1G",
        memory_request="1G",
        cpu_limit=1,
        cpu_request=1,
        extra_pod_config={"serviceAccountName": "prefect-agent-service-account"},
    )


# https://kubernetes.dask.org/en/latest/kubecluster.html
dask_runner = DaskTaskRunner(
    cluster_class=dask_kubernetes.KubeCluster,
    adapt_kwargs={"minimum": 1, "maximum": 10},
    cluster_kwargs={
        "pod_template": dask_pod_spec(),
        "env": {
            "EXTRA_PIP_PACKAGES": "prefect==2.0b11"
        }
    },
)

pavithraes · July 27, 2022, 1:19pm

@rahloff Welcome to Discourse!

I’m not familiar with this, but maybe @guillaumeeb or @jacobtomlinson have some thoughts.

rahloff · August 8, 2022, 8:25am

Hi pavithraes, thanks for making the introductions!
@guillaumeeb / @jacobtomlinson, any idea or pointers where I might need to look?

guillaumeeb · August 11, 2022, 9:30am

Hi there,

Sorry, I don’t have much experience with dask-kubernetes, and I’m not even sure what the dask-root pod is supposed to run. Could you get the logs from one of these pods to see what is running on it and why it isn’t terminated at the end of the task?

Since you’re using Prefect to launch Dask, maybe there is some side effect?
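
One way to pull those logs is with the official `kubernetes` Python client. This is only a sketch: the `dask-root` name prefix comes from the pod names mentioned in the question, while the `default` namespace, the helper names and the 50-line tail are assumptions to adjust for your cluster.

```python
def leftover_pod_names(pods, prefix="dask-root"):
    """Return the names of pods whose name starts with `prefix`."""
    return [p.metadata.name for p in pods if p.metadata.name.startswith(prefix)]


def print_leftover_logs(namespace="default", tail_lines=50):
    """Print the last log lines of every leftover dask-root pod."""
    # Imported here so the filter above can be used without the client installed.
    from kubernetes import client, config

    config.load_kube_config()  # uses your local kubeconfig (assumption)
    api = client.CoreV1Api()
    pods = api.list_namespaced_pod(namespace).items
    for name in leftover_pod_names(pods):
        print(f"--- {name} ---")
        print(api.read_namespaced_pod_log(name, namespace, tail_lines=tail_lines))
```

Calling `print_leftover_logs()` after an ETL run should show whether the leftover pod is a scheduler or a worker and what it was doing when the run finished.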

jacobtomlinson · August 17, 2022, 10:04am

The KubeCluster class should delete the pods when it gets deleted. If this isn’t happening there is either a bug in dask-kubernetes or Prefect is doing something to break this functionality.

I am not an expert on Prefect, so I can’t comment there. Ideally, though, the DaskTaskRunner should call cluster.close() when it is finished.

If you are interested in opening an issue on dask-kubernetes with a minimal reproducer (i.e. no Prefect), that would be great.
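
A Prefect-free reproducer could look roughly like the sketch below. It assumes the classic `KubeCluster` API with the same pod spec and adapt settings as in the question; `run_once` and `main` are hypothetical helper names, not part of dask-kubernetes. The point is that `cluster.close()` is always called, so any pod that survives it would point at dask-kubernetes rather than Prefect.

```python
def run_once(cluster_factory, client_factory, work):
    """Create a cluster, run `work(client)`, and always close both."""
    cluster = cluster_factory()
    try:
        client = client_factory(cluster)
        try:
            return work(client)
        finally:
            client.close()
    finally:
        # This is the call the task runner is expected to make when it finishes.
        cluster.close()


def main():
    # Imported here so run_once can be exercised without a Kubernetes cluster.
    from dask.distributed import Client
    from dask_kubernetes import KubeCluster, make_pod_spec

    pod_spec = make_pod_spec(
        image="ghcr.io/dask/dask:latest",
        memory_limit="1G",
        memory_request="1G",
        cpu_limit=1,
        cpu_request=1,
    )

    def make_cluster():
        cluster = KubeCluster(pod_template=pod_spec)
        cluster.adapt(minimum=1, maximum=10)
        return cluster

    # Trivial task, just to make the cluster do some work before closing.
    result = run_once(make_cluster, Client, lambda c: c.submit(sum, [1, 2, 3]).result())
    print(result)
```

Run `main()` against the cluster and then list the pods. If a dask-root pod is still there after the script exits, that output plus this script would be a good basis for the issue.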
