Deploying Dask on an rke2 custom cluster

Hello

I have been trying for several months now to install Dask on a K8s cluster.

I first tried on an OpenStack Magnum managed cluster, but my Kubernetes version was too old and I could not install Dask Gateway:

#Last helm chart version (2024.1.0) requires kubeVersion: >=1.25.0-0 which is incompatible with Kubernetes v1.23.16

To bypass this constraint, we finally decided to install it on an rke2 custom cluster (which also keeps us much closer to our actual dev/prod platform).

So for now, I’m trying to deploy Dask Gateway (which seems to be the current recommended way to deploy Dask) on a custom rke2 cluster on OpenStack (https://docs.rke2.io/).

I use this script:

#!/bin/bash

kubectl delete namespace dask-gateway

kubectl create namespace dask-gateway

RELEASE=dask-gateway
NAMESPACE=dask-gateway

#version 2023.1.1 is the maximum possible with Kubernetes 1.23.16 => error 404
#Last helm chart version (2024.1.0) requires kubeVersion: >=1.25.0-0 which is incompatible with Kubernetes v1.23.16

helm upgrade --install $RELEASE dask-gateway \
    --version 2024.1.0 \
    --repo=https://helm.dask.org \
    --namespace $NAMESPACE \
    --values ./config.yaml

kubectl apply -f rolebindingpsp.yaml -n dask-gateway

With this config.yaml:

## Provide a name to partially substitute for the full names of resources (will maintain the release name)
##
nameOverride: ""

## Provide a name to substitute for the full names of resources
##
fullnameOverride: ""

# gateway nested config relates to the api Pod and the dask-gateway-server
# running within it, the k8s Service exposing it, as well as the schedulers
# (gateway.backend.scheduler) and workers (gateway.backend.worker) created by the
# controller when a DaskCluster k8s resource is registered.
gateway:
  # Number of instances of the gateway-server to run
  replicas: 1

  # Annotations to apply to the gateway-server pods.
  annotations: {}

  # Resource requests/limits for the gateway-server pod.
  resources: {}

  # Path prefix to serve dask-gateway api requests under
  # This prefix will be added to all routes the gateway manages
  # in the traefik proxy.
  prefix: /

  # The gateway server log level
  loglevel: INFO

  # The image to use for the dask-gateway-server pod (api pod)
  image:
    name: ghcr.io/dask/dask-gateway-server
    tag: "set-by-chartpress"
    pullPolicy:

  # Add additional environment variables to the gateway pod
  # e.g.
  # env:
  # - name: MYENV
  #   value: "my value"
  # env: []

  # Image pull secrets for gateway-server pod
  imagePullSecrets: []

  # Configuration for the gateway-server service
  service:
    annotations: {}

  auth:
    # The auth type to use. One of {simple, kerberos, jupyterhub, custom}.
    type: simple

    simple:
      # A shared password to use for all users.
      password:

    kerberos:
      # Path to the HTTP keytab for this node.
      keytab:

    jupyterhub:
      # A JupyterHub api token for dask-gateway to use. See
      # https://gateway.dask.org/install-kube.html#authenticating-with-jupyterhub.
      apiToken:

      # The JupyterHub Helm chart will automatically generate a token for a
      # registered service. If you don't specify an apiToken explicitly as
      # required in dask-gateway version <=2022.6.1, the dask-gateway Helm chart
      # will try to look for a token from a k8s Secret created by the JupyterHub
      # Helm chart in the same namespace. A failure to find this k8s Secret and
      # key will cause a MountFailure for when the api-dask-gateway pod is
      # starting.
      apiTokenFromSecretName: hub
      apiTokenFromSecretKey: hub.services.dask-gateway.apiToken

      # JupyterHub's api url. Inferred from JupyterHub's service name if running
      # in the same namespace.
      apiUrl:

    custom:
      # The full authenticator class name.
      class:

      # Configuration fields to set on the authenticator class.
      config: {}

  livenessProbe:
    # Enables the livenessProbe.
    enabled: true
    # Configures the livenessProbe.
    initialDelaySeconds: 5
    timeoutSeconds: 2
    periodSeconds: 10
    failureThreshold: 6
  readinessProbe:
    # Enables the readinessProbe.
    enabled: true
    # Configures the readinessProbe.
    initialDelaySeconds: 5
    timeoutSeconds: 2
    periodSeconds: 10
    failureThreshold: 3

  # nodeSelector, affinity, and tolerations for the `api` pod running dask-gateway-server
  nodeSelector: {}
  affinity: {}
  tolerations: []

  # Any extra configuration code to append to the generated `dask_gateway_config.py`
  # file. Can be either a single code-block, or a map of key -> code-block
  # (code-blocks are run in alphabetical order by key, the key value itself is
  # meaningless). The map version is useful as it supports merging multiple
  # `values.yaml` files, but is unnecessary in other cases.
  extraConfig: {}

  # backend nested configuration relates to the scheduler and worker resources
  # created for DaskCluster k8s resources by the controller.
  backend:
    # The image to use for both schedulers and workers.
    image:
      name: ghcr.io/dask/dask-gateway
      tag: "set-by-chartpress"
      pullPolicy:

    # Image pull secrets for a dask cluster's scheduler and worker pods
    imagePullSecrets: []

    # The namespace to launch dask clusters in. If not specified, defaults to
    # the same namespace the gateway is running in.
    namespace:

    # A mapping of environment variables to set for both schedulers and workers.
    environment: {}

    scheduler:
      # Any extra configuration for the scheduler pod. Sets
      # `c.KubeClusterConfig.scheduler_extra_pod_config`.
      extraPodConfig: {}

      # Any extra configuration for the scheduler container.
      # Sets `c.KubeClusterConfig.scheduler_extra_container_config`.
      extraContainerConfig: {}

      # Cores request/limit for the scheduler.
      cores:
        request:
        limit:

      # Memory request/limit for the scheduler.
      memory:
        request:
        limit:

    worker:
      # Any extra configuration for the worker pod. Sets
      # `c.KubeClusterConfig.worker_extra_pod_config`.
      extraPodConfig: {}

      # Any extra configuration for the worker container. Sets
      # `c.KubeClusterConfig.worker_extra_container_config`.
      extraContainerConfig: {}

      # Cores request/limit for each worker.
      cores:
        request:
        limit:

      # Memory request/limit for each worker.
      memory:
        request:
        limit:

      # Number of threads available for a worker. Sets
      # `c.KubeClusterConfig.worker_threads`
      threads:


# controller nested config relates to the controller Pod and the
# dask-gateway-server running within it that makes things happen when changes to
# DaskCluster k8s resources are observed.
controller:
  # Whether the controller should be deployed. Disabling the controller allows
  # running it locally for development/debugging purposes.
  enabled: true

  # Any annotations to add to the controller pod
  annotations: {}

  # Resource requests/limits for the controller pod
  resources: {}

  # Image pull secrets for controller pod
  imagePullSecrets: []

  # The controller log level
  loglevel: INFO

  # Max time (in seconds) to keep around records of completed clusters.
  # Default is 24 hours.
  completedClusterMaxAge: 86400

  # Time (in seconds) between cleanup tasks removing records of completed
  # clusters. Default is 5 minutes.
  completedClusterCleanupPeriod: 600

  # Base delay (in seconds) for backoff when retrying after failures.
  backoffBaseDelay: 0.1

  # Max delay (in seconds) for backoff when retrying after failures.
  backoffMaxDelay: 300

  # Limit on the average number of k8s api calls per second.
  k8sApiRateLimit: 50

  # Limit on the maximum number of k8s api calls per second.
  k8sApiRateLimitBurst: 100

  # The image to use for the controller pod.
  image:
    name: ghcr.io/dask/dask-gateway-server
    tag: "set-by-chartpress"
    pullPolicy:

  # Settings for nodeSelector, affinity, and tolerations for the controller pods
  nodeSelector: {}
  affinity: {}
  tolerations: []



# traefik nested config relates to the traefik Pod and Traefik running within it
# that is acting as a proxy for traffic towards the gateway or user created
# DaskCluster resources.
traefik:
  # Number of instances of the proxy to run
  replicas: 1

  # Any annotations to add to the proxy pods
  annotations: {}

  # Resource requests/limits for the proxy pods
  resources: {}

  # The image to use for the proxy pod
  image:
    name: traefik
    tag: "2.10.6"
    pullPolicy:
  imagePullSecrets: []

  # Any additional arguments to forward to traefik
  additionalArguments: []

  # The proxy log level
  loglevel: WARN

  # Whether to expose the dashboard on port 9000 (enable for debugging only!)
  dashboard: false

  # Additional configuration for the traefik service
  service:
    type: NodePort
    annotations: {}
    spec: {}
    ports:
      web:
        # The port HTTP(s) requests will be served on
        port: 80
        nodePort: 32222
      tcp:
        # The port TCP requests will be served on. Set to `web` to share the
        # web service port
        port: web
        nodePort:

  # Settings for nodeSelector, affinity, and tolerations for the traefik pods
  nodeSelector: {}
  affinity: {}
  tolerations: []



# rbac nested configuration relates to the choice of creating or replacing
# resources like (Cluster)Role, (Cluster)RoleBinding, and ServiceAccount.
rbac:
  # Whether to enable RBAC.
  enabled: false

  # Existing names to use if ClusterRoles, ClusterRoleBindings, and
  # ServiceAccounts have already been created by other means (leave set to
  # `null` to create all required roles at install time)
  controller:
    serviceAccountName:

  gateway:
    serviceAccountName:

  traefik:
    serviceAccountName:



# global nested configuration is accessible by all Helm charts that may depend
# on each other, but not used by this Helm chart. An entry is created here to
# validate its use and catch YAML typos via this configuration's associated
# JSON schema.
global: {}

And this rolebindingpsp.yaml:


---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: jupyterhub-rolebinding
  # namespace: jhub
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: magnum:podsecuritypolicy:privileged
subjects:
- apiGroup: rbac.authorization.k8s.io
  kind: Group
  name: system:serviceaccounts

And I get this output:

kubectl --namespace=dask-gateway get service traefik-dask-gateway
NAME                   TYPE       CLUSTER-IP     EXTERNAL-IP   PORT(S)        AGE
traefik-dask-gateway   NodePort   10.43.75.137   <none>        80:32222/TCP   15s

As you can see, I cannot get an External-IP.
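
(Note for readers: with type NodePort a Service never gets an External-IP; the gateway is instead reached through any node's address on the exposed nodePort. A minimal client-side sketch, with a placeholder node IP and the 32222 nodePort from the config.yaml above:)

from dask_gateway import Gateway

# <node-ip> is a placeholder for the address of any rke2 node.
gateway = Gateway("http://<node-ip>:32222")
print(gateway.list_clusters())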

I tried the LoadBalancer way, but the external IPs stay in a pending state:

k get svc -n dask

NAME             TYPE           CLUSTER-IP      EXTERNAL-IP   PORT(S)                       AGE
dask-jupyter     LoadBalancer   10.43.158.60    <pending>     80:32629/TCP                  9d
dask-scheduler   LoadBalancer   10.43.169.218   <pending>     8786:31939/TCP,80:30678/TCP   9d

So for now I’m stuck with this problem, and I’m a bit confused about which components I do or do not need to deploy (dask-gateway only? dask-kubernetes-operator? LoadBalancers?).

So, for now, I would just like to know the best, known-working practice for deploying Dask on an rke2 cluster.

Any kind of help or advice would be really appreciated.

Thanks.

Hi @Shepard33, welcome to the Dask community,

Well, it depends on how you want to use Dask. Basically, you’ll want to choose between Dask Gateway and Dask Kubernetes. Dask Gateway is well suited to a JupyterHub deployment, where users typically don’t have the rights to use kubectl, but it might be a bit harder to customize. It is typically deployed as part of the daskhub Helm chart for a multi-user environment. KubeCluster requires privileges on the Kubernetes cluster, but provides a Kubernetes-native experience.

See the documentation for more details.
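
To make the difference concrete, here is a rough client-side sketch of both options (the gateway address, cluster name and worker counts are placeholders, not values from your deployment):

# Option 1: Dask Gateway. The notebook only talks to the gateway server,
# so no Kubernetes credentials are needed on the user side.
from dask_gateway import Gateway

gateway = Gateway("http://<gateway-address>")   # placeholder address
cluster = gateway.new_cluster()
cluster.scale(2)
client = cluster.get_client()

# Option 2: Dask Kubernetes (KubeCluster). The notebook talks to the
# Kubernetes API directly, so it needs credentials and RBAC permissions
# to create DaskCluster custom resources.
from dask_kubernetes.operator import KubeCluster

cluster = KubeCluster(name="example", image="ghcr.io/dask/dask:latest")
cluster.scale(2)
client = cluster.get_client()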

I’m not sure of what you mean by LoadBalancers though.

Hi @guillaumeeb.

Thanks for the welcome and for your answer.

Basically, you’ll want to choose between Dask Gateway and Dask Kubernetes

Ok.

The choice would be Dask Kubernetes.

KubeCluster requires privileges on the Kubernetes cluster, but provides a Kubernetes-native experience.

I have now run into that issue.

When I run this in a Python environment on JupyterLab:

from dask_kubernetes.operator import KubeCluster
cluster = KubeCluster(name="my-dask-cluster", image='ghcr.io/dask/dask:latest')
cluster.scale(10)

I get this output:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[3], line 2
      1 from dask_kubernetes.operator import KubeCluster
----> 2 cluster = KubeCluster(name="my-dask-cluster", image='ghcr.io/dask/dask:latest')
      3 cluster.scale(10)

File /srv/conda/envs/notebook/lib/python3.11/site-packages/dask_kubernetes/operator/kubecluster/kubecluster.py:282, in KubeCluster.__init__(self, name, namespace, image, n_workers, resources, env, worker_command, port_forward_cluster_ip, create_mode, shutdown_on_close, idle_timeout, resource_timeout, scheduler_service_type, custom_cluster_spec, scheduler_forward_port, jupyter, loop, asynchronous, quiet, **kwargs)
    280 if not called_from_running_loop:
    281     self._loop_runner.start()
--> 282     self.sync(self._start)

File /srv/conda/envs/notebook/lib/python3.11/site-packages/distributed/utils.py:358, in SyncMethodMixin.sync(self, func, asynchronous, callback_timeout, *args, **kwargs)
    356     return future
    357 else:
--> 358     return sync(
    359         self.loop, func, *args, callback_timeout=callback_timeout, **kwargs
    360     )

File /srv/conda/envs/notebook/lib/python3.11/site-packages/distributed/utils.py:434, in sync(loop, func, callback_timeout, *args, **kwargs)
    431         wait(10)
    433 if error is not None:
--> 434     raise error
    435 else:
    436     return result

File /srv/conda/envs/notebook/lib/python3.11/site-packages/distributed/utils.py:408, in sync.<locals>.f()
    406         awaitable = wait_for(awaitable, timeout)
    407     future = asyncio.ensure_future(awaitable)
--> 408     result = yield future
    409 except Exception as exception:
    410     error = exception

File /srv/conda/envs/notebook/lib/python3.11/site-packages/tornado/gen.py:767, in Runner.run(self)
    765 try:
    766     try:
--> 767         value = future.result()
    768     except Exception as e:
    769         # Save the exception for later. It's important that
    770         # gen.throw() not be called inside this try/except block
    771         # because that makes sys.exc_info behave unexpectedly.
    772         exc: Optional[Exception] = e

File /srv/conda/envs/notebook/lib/python3.11/site-packages/dask_kubernetes/operator/kubecluster/kubecluster.py:297, in KubeCluster._start(self)
    295 async def _start(self):
    296     if not self.namespace:
--> 297         api = await kr8s.asyncio.api()
    298         self.namespace = api.namespace
    299     try:

File /srv/conda/envs/notebook/lib/python3.11/site-packages/kr8s/asyncio/_api.py:45, in api(url, kubeconfig, serviceaccount, namespace, context, _asyncio)
     42         return await list(_cls._instances[thread_id].values())[0]
     43     return await _cls(**kwargs, bypass_factory=True)
---> 45 return await _f(
     46     url=url,
     47     kubeconfig=kubeconfig,
     48     serviceaccount=serviceaccount,
     49     namespace=namespace,
     50     context=context,
     51 )

File /srv/conda/envs/notebook/lib/python3.11/site-packages/kr8s/asyncio/_api.py:43, in api.<locals>._f(**kwargs)
     37 if (
     38     all(k is None for k in kwargs.values())
     39     and thread_id in _cls._instances
     40     and list(_cls._instances[thread_id].values())
     41 ):
     42     return await list(_cls._instances[thread_id].values())[0]
---> 43 return await _cls(**kwargs, bypass_factory=True)

File /srv/conda/envs/notebook/lib/python3.11/site-packages/kr8s/_api.py:60, in Api.__await__.<locals>.f()
     59 async def f():
---> 60     await self.auth
     61     return self

File /srv/conda/envs/notebook/lib/python3.11/site-packages/kr8s/_auth.py:52, in KubeAuth.__await__.<locals>.f()
     51 async def f():
---> 52     await self.reauthenticate()
     53     return self

File /srv/conda/envs/notebook/lib/python3.11/site-packages/kr8s/_auth.py:65, in KubeAuth.reauthenticate(self)
     63     await self._load_kubeconfig()
     64 if not self.server:
---> 65     raise ValueError("Unable to find valid credentials")

ValueError: Unable to find valid credentials

I guess that Jupyter doesn’t have the permissions to write to the K8s API.

So where can I give it the rights to do that?

So I guess you’ve already deployed the Dask Kubernetes Helm chart?

Where are you running your Jupyter notebook from? I think you need the Kubernetes CLI installed and a kubeconfig file for Dask Kubernetes to work from there.
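
As a quick sanity check, you can try reaching the Kubernetes API with kr8s, the client library dask-kubernetes uses under the hood (a minimal sketch, assuming the kr8s version installed alongside dask-kubernetes exposes these calls):

import kr8s

# kr8s looks for a kubeconfig file or in-cluster service account credentials;
# it raises "Unable to find valid credentials" when neither is available.
api = kr8s.api()
print(api.version())  # simple round trip to the Kubernetes API server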

cc @jacobtomlinson

Yes, I have already deployed the Helm chart.

Where are you running your Jupyter notebook from?

From a public IP pointing to the JupyterHub.

I think you need the Kubernetes CLI installed and a kubeconfig file for Dask Kubernetes to work from there.

I managed to get past this problem: the config.yaml needed more info in it.

Now when I launch:

from dask_kubernetes.operator import KubeCluster
cluster = KubeCluster(name="my-dask-cluster", image='ghcr.io/dask/dask:latest')
cluster.scale(10)

The process launches…

╭── Creating KubeCluster ‘my-dask-cluster’ ──╮
│                                            │
│  DaskCluster            -                  │
│  Scheduler Pod          -                  │
│  Scheduler Service      -                  │
│  Default Worker Group   -                  │
│                                            │
│  ⠋                                         │
╰────────────────────────────────────────────╯

But then it ends with a “Connection Timeout”:

---------------------------------------------------------------------------
ConnectTimeout                            Traceback (most recent call last)
File /srv/conda/envs/notebook/lib/python3.11/site-packages/httpx/_transports/default.py:69, in map_httpcore_exceptions()
     68 try:
---> 69     yield
     70 except Exception as exc:

File /srv/conda/envs/notebook/lib/python3.11/site-packages/httpx/_transports/default.py:373, in AsyncHTTPTransport.handle_async_request(self, request)
    372 with map_httpcore_exceptions():
--> 373     resp = await self._pool.handle_async_request(req)
    375 assert isinstance(resp.stream, typing.AsyncIterable)

File /srv/conda/envs/notebook/lib/python3.11/site-packages/httpcore/_async/connection_pool.py:216, in AsyncConnectionPool.handle_async_request(self, request)
    215     await self._close_connections(closing)
--> 216     raise exc from None
    218 # Return the response. Note that in this case we still have to manage
    219 # the point at which the response is closed.

File /srv/conda/envs/notebook/lib/python3.11/site-packages/httpcore/_async/connection_pool.py:196, in AsyncConnectionPool.handle_async_request(self, request)
    194 try:
    195     # Send the request on the assigned connection.
--> 196     response = await connection.handle_async_request(
    197         pool_request.request
    198     )
    199 except ConnectionNotAvailable:
    200     # In some cases a connection may initially be available to
    201     # handle a request, but then become unavailable.
    202     #
    203     # In this case we clear the connection and try again.

File /srv/conda/envs/notebook/lib/python3.11/site-packages/httpcore/_async/connection.py:99, in AsyncHTTPConnection.handle_async_request(self, request)
     98     self._connect_failed = True
---> 99     raise exc
    101 return await self._connection.handle_async_request(request)

File /srv/conda/envs/notebook/lib/python3.11/site-packages/httpcore/_async/connection.py:76, in AsyncHTTPConnection.handle_async_request(self, request)
     75 if self._connection is None:
---> 76     stream = await self._connect(request)
     78     ssl_object = stream.get_extra_info("ssl_object")

File /srv/conda/envs/notebook/lib/python3.11/site-packages/httpcore/_async/connection.py:122, in AsyncHTTPConnection._connect(self, request)
    121 async with Trace("connect_tcp", logger, request, kwargs) as trace:
--> 122     stream = await self._network_backend.connect_tcp(**kwargs)
    123     trace.return_value = stream

File /srv/conda/envs/notebook/lib/python3.11/site-packages/httpcore/_backends/auto.py:30, in AutoBackend.connect_tcp(self, host, port, timeout, local_address, socket_options)
     29 await self._init_backend()
---> 30 return await self._backend.connect_tcp(
     31     host,
     32     port,
     33     timeout=timeout,
     34     local_address=local_address,
     35     socket_options=socket_options,
     36 )

File /srv/conda/envs/notebook/lib/python3.11/site-packages/httpcore/_backends/anyio.py:114, in AnyIOBackend.connect_tcp(self, host, port, timeout, local_address, socket_options)
    109 exc_map = {
    110     TimeoutError: ConnectTimeout,
    111     OSError: ConnectError,
    112     anyio.BrokenResourceError: ConnectError,
    113 }
--> 114 with map_exceptions(exc_map):
    115     with anyio.fail_after(timeout):

File /srv/conda/envs/notebook/lib/python3.11/contextlib.py:158, in _GeneratorContextManager.__exit__(self, typ, value, traceback)
    157 try:
--> 158     self.gen.throw(typ, value, traceback)
    159 except StopIteration as exc:
    160     # Suppress StopIteration *unless* it's the same exception that
    161     # was passed to throw().  This prevents a StopIteration
    162     # raised inside the "with" statement from being suppressed.

File /srv/conda/envs/notebook/lib/python3.11/site-packages/httpcore/_exceptions.py:14, in map_exceptions(map)
     13     if isinstance(exc, from_exc):
---> 14         raise to_exc(exc) from exc
     15 raise

ConnectTimeout: 

The above exception was the direct cause of the following exception:

ConnectTimeout                            Traceback (most recent call last)
Cell In[5], line 2
      1 from dask_kubernetes.operator import KubeCluster
----> 2 cluster = KubeCluster(name="my-dask-cluster", image='ghcr.io/dask/dask:latest')
      3 cluster.scale(10)

File /srv/conda/envs/notebook/lib/python3.11/site-packages/dask_kubernetes/operator/kubecluster/kubecluster.py:282, in KubeCluster.__init__(self, name, namespace, image, n_workers, resources, env, worker_command, port_forward_cluster_ip, create_mode, shutdown_on_close, idle_timeout, resource_timeout, scheduler_service_type, custom_cluster_spec, scheduler_forward_port, jupyter, loop, asynchronous, quiet, **kwargs)
    280 if not called_from_running_loop:
    281     self._loop_runner.start()
--> 282     self.sync(self._start)

File /srv/conda/envs/notebook/lib/python3.11/site-packages/distributed/utils.py:358, in SyncMethodMixin.sync(self, func, asynchronous, callback_timeout, *args, **kwargs)
    356     return future
    357 else:
--> 358     return sync(
    359         self.loop, func, *args, callback_timeout=callback_timeout, **kwargs
    360     )

File /srv/conda/envs/notebook/lib/python3.11/site-packages/distributed/utils.py:434, in sync(loop, func, callback_timeout, *args, **kwargs)
    431         wait(10)
    433 if error is not None:
--> 434     raise error
    435 else:
    436     return result

File /srv/conda/envs/notebook/lib/python3.11/site-packages/distributed/utils.py:408, in sync.<locals>.f()
    406         awaitable = wait_for(awaitable, timeout)
    407     future = asyncio.ensure_future(awaitable)
--> 408     result = yield future
    409 except Exception as exception:
    410     error = exception

File /srv/conda/envs/notebook/lib/python3.11/site-packages/tornado/gen.py:767, in Runner.run(self)
    765 try:
    766     try:
--> 767         value = future.result()
    768     except Exception as e:
    769         # Save the exception for later. It's important that
    770         # gen.throw() not be called inside this try/except block
    771         # because that makes sys.exc_info behave unexpectedly.
    772         exc: Optional[Exception] = e

File /srv/conda/envs/notebook/lib/python3.11/site-packages/dask_kubernetes/operator/kubecluster/kubecluster.py:306, in KubeCluster._start(self)
    304     show_rich_output_task = asyncio.create_task(self._show_rich_output())
    305 cluster = await DaskCluster(self.name, namespace=self.namespace)
--> 306 cluster_exists = await cluster.exists()
    308 if cluster_exists and self.create_mode == CreateMode.CREATE_ONLY:
    309     raise ValueError(
    310         f"Cluster {self.name} already exists and create mode is '{CreateMode.CREATE_ONLY}'"
    311     )

File /srv/conda/envs/notebook/lib/python3.11/site-packages/kr8s/_objects.py:209, in APIObject.exists(self, ensure)
    207 async def exists(self, ensure=False) -> bool:
    208     """Check if this object exists in Kubernetes."""
--> 209     return await self._exists(ensure=ensure)

File /srv/conda/envs/notebook/lib/python3.11/site-packages/kr8s/_objects.py:214, in APIObject._exists(self, ensure)
    212 """Check if this object exists in Kubernetes."""
    213 try:
--> 214     async with self.api.call_api(
    215         "GET",
    216         version=self.version,
    217         url=f"{self.endpoint}/{self.name}",
    218         namespace=self.namespace,
    219         raise_for_status=False,
    220     ) as resp:
    221         status = resp.status_code
    222 except ValueError:

File /srv/conda/envs/notebook/lib/python3.11/contextlib.py:210, in _AsyncGeneratorContextManager.__aenter__(self)
    208 del self.args, self.kwds, self.func
    209 try:
--> 210     return await anext(self.gen)
    211 except StopAsyncIteration:
    212     raise RuntimeError("generator didn't yield") from None

File /srv/conda/envs/notebook/lib/python3.11/site-packages/kr8s/_api.py:132, in Api.call_api(self, method, version, base, namespace, url, raise_for_status, stream, **kwargs)
    130         yield response
    131 else:
--> 132     response = await self._session.request(**kwargs)
    133     if raise_for_status:
    134         response.raise_for_status()

File /srv/conda/envs/notebook/lib/python3.11/site-packages/httpx/_client.py:1574, in AsyncClient.request(self, method, url, content, data, files, json, params, headers, cookies, auth, follow_redirects, timeout, extensions)
   1559     warnings.warn(message, DeprecationWarning)
   1561 request = self.build_request(
   1562     method=method,
   1563     url=url,
   (...)
   1572     extensions=extensions,
   1573 )
-> 1574 return await self.send(request, auth=auth, follow_redirects=follow_redirects)

File /srv/conda/envs/notebook/lib/python3.11/site-packages/httpx/_client.py:1661, in AsyncClient.send(self, request, stream, auth, follow_redirects)
   1653 follow_redirects = (
   1654     self.follow_redirects
   1655     if isinstance(follow_redirects, UseClientDefault)
   1656     else follow_redirects
   1657 )
   1659 auth = self._build_request_auth(request, auth)
-> 1661 response = await self._send_handling_auth(
   1662     request,
   1663     auth=auth,
   1664     follow_redirects=follow_redirects,
   1665     history=[],
   1666 )
   1667 try:
   1668     if not stream:

File /srv/conda/envs/notebook/lib/python3.11/site-packages/httpx/_client.py:1689, in AsyncClient._send_handling_auth(self, request, auth, follow_redirects, history)
   1686 request = await auth_flow.__anext__()
   1688 while True:
-> 1689     response = await self._send_handling_redirects(
   1690         request,
   1691         follow_redirects=follow_redirects,
   1692         history=history,
   1693     )
   1694     try:
   1695         try:

File /srv/conda/envs/notebook/lib/python3.11/site-packages/httpx/_client.py:1726, in AsyncClient._send_handling_redirects(self, request, follow_redirects, history)
   1723 for hook in self._event_hooks["request"]:
   1724     await hook(request)
-> 1726 response = await self._send_single_request(request)
   1727 try:
   1728     for hook in self._event_hooks["response"]:

File /srv/conda/envs/notebook/lib/python3.11/site-packages/httpx/_client.py:1763, in AsyncClient._send_single_request(self, request)
   1758     raise RuntimeError(
   1759         "Attempted to send an sync request with an AsyncClient instance."
   1760     )
   1762 with request_context(request=request):
-> 1763     response = await transport.handle_async_request(request)
   1765 assert isinstance(response.stream, AsyncByteStream)
   1766 response.request = request

File /srv/conda/envs/notebook/lib/python3.11/site-packages/httpx/_transports/default.py:372, in AsyncHTTPTransport.handle_async_request(self, request)
    358 assert isinstance(request.stream, AsyncByteStream)
    360 req = httpcore.Request(
    361     method=request.method,
    362     url=httpcore.URL(
   (...)
    370     extensions=request.extensions,
    371 )
--> 372 with map_httpcore_exceptions():
    373     resp = await self._pool.handle_async_request(req)
    375 assert isinstance(resp.stream, typing.AsyncIterable)

File /srv/conda/envs/notebook/lib/python3.11/contextlib.py:158, in _GeneratorContextManager.__exit__(self, typ, value, traceback)
    156     value = typ()
    157 try:
--> 158     self.gen.throw(typ, value, traceback)
    159 except StopIteration as exc:
    160     # Suppress StopIteration *unless* it's the same exception that
    161     # was passed to throw().  This prevents a StopIteration
    162     # raised inside the "with" statement from being suppressed.
    163     return exc is not value

File /srv/conda/envs/notebook/lib/python3.11/site-packages/httpx/_transports/default.py:86, in map_httpcore_exceptions()
     83     raise
     85 message = str(exc)
---> 86 raise mapped_exc(message) from exc

ConnectTimeout:

I think it has problems fetching the ghcr.io/dask/dask image, no?

I’m really not sure what the problem is here, cc @jacobtomlinson.

It looks like your Jupyter container doesn’t have good access to the Kubernetes API and requests are timing out. Does this failure happen every time or is it intermittent?

It happens every time…

I think it is a permissions problem.

Sounds like it. Your Jupyter Pod needs a service account and a role to grant it enough permissions.

https://kubernetes.dask.org/en/latest/operator_kubecluster.html#role-based-access-control-rbac
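
For anyone landing here later: the page above has the authoritative manifests. As a rough, illustrative sketch (the names and exact rules below are mine, not copied from the docs), the idea is to give the Jupyter pod's ServiceAccount rights over the kubernetes.dask.org custom resources, plus read access to the pods and services they create:

# Illustrative only; see the dask-kubernetes RBAC docs for the supported rules.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: dask-user            # hypothetical name
  namespace: jhub            # namespace where the Jupyter single-user pods run
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: dask-user
  namespace: jhub
rules:
  - apiGroups: ["kubernetes.dask.org"]
    resources: ["daskclusters", "daskworkergroups", "daskautoscalers", "daskjobs"]
    verbs: ["get", "list", "watch", "create", "patch", "delete"]
  - apiGroups: [""]
    resources: ["pods", "pods/log", "services"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: dask-user
  namespace: jhub
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: dask-user
subjects:
  - kind: ServiceAccount
    name: dask-user
    namespace: jhub

The Jupyter single-user pods then need to run under that ServiceAccount (for example via singleuser.serviceAccountName in the JupyterHub Helm chart).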