I’m trying to use KubeCluster to start a Dask cluster on a remote K3s cluster. The code below does provision a worker pod, but creating the cluster with cluster = KubeCluster('worker-spec.yml') times out before returning the cluster object. The timeout error is shown below. Should it be connecting to localhost? Am I right in thinking that is the culprit?
OSError: Timed out during handshake while connecting to tcp://localhost:52729 after 10 s
This is the test code I’m starting with from the KubeCluster docs. As you can see, I’ve tried altering some of the timeout parameters, but that didn’t help.
import dask
from dask_kubernetes import KubeCluster, KubeConfig, make_pod_spec
auth = KubeConfig(config_file="~/.kube/remote")
# dask.config.set({"kubernetes.scheduler-service-wait-timeout": 300})
# dask.config.set({"distributed.comm.timeouts.connect": 300})
cluster = KubeCluster('worker-spec.yml')
cluster.scale(3)
P.S. I’m using the worker-spec.yml from the example in the docs (the KubeCluster page of the Dask Kubernetes 2021.03.0+100.g4a69e3a documentation), updated with resource limits and requests that match my nodes’ CPU and RAM.
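To narrow down whether the localhost address in the error is reachable at all, a small standard-library probe like the one below may help. This is only a sketch: the helper name port_is_open is hypothetical, and the port number is copied from the error message above, so it would need to be replaced with the port from the current run if it changes between runs. It does not touch any Dask or Kubernetes API; it only tries to open a TCP connection.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to (host, port) can be opened.

    Hypothetical helper for illustration; not part of dask or dask_kubernetes.
    """
    try:
        # create_connection raises OSError (refused, timed out, unreachable)
        # when nothing accepts the connection within the timeout.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Port taken from the error message in this post; adjust as needed.
    host, port = "localhost", 52729
    state = "open" if port_is_open(host, port) else "closed or unreachable"
    print(f"{host}:{port} is {state}")
```

If the port is closed while KubeCluster is still waiting, that would suggest nothing is listening on the local side; if it is open, the handshake itself is more likely where things stall.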