ClientResponseError: 401, message='Unauthorized'

Hello,

I have deployed DaskHub version 2023.1.0.

I am using:

dask-gateway:
  gateway:
    auth:
      jupyterhub:
        apiToken: <token> 
        apiUrl: http://proxy-public/hub/api
      type: jupyterhub

[...]

jupyterhub:
  hub:
    services:
      dask-gateway:
        apiToken: <token>

[...]
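Since a 401 from the gateway can come from a token mismatch, a first check is that the two `apiToken` entries above hold the same value. The following is a minimal sketch, assuming the values file has already been parsed into a Python dict; the helper names and the placeholder token are hypothetical, and it does not rule out other causes of the 401.

```python
# Sketch: confirm that both DaskHub token settings hold the same value.
# `values` stands in for a parsed values.yaml; the token is a placeholder.

GATEWAY_TOKEN_PATH = ["dask-gateway", "gateway", "auth", "jupyterhub", "apiToken"]
HUB_TOKEN_PATH = ["jupyterhub", "hub", "services", "dask-gateway", "apiToken"]


def get_path(mapping, path):
    """Follow a list of keys through nested dicts; return None if any key is missing."""
    current = mapping
    for key in path:
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


def tokens_match(values):
    """True only if both tokens are present and identical."""
    gateway_token = get_path(values, GATEWAY_TOKEN_PATH)
    hub_token = get_path(values, HUB_TOKEN_PATH)
    return gateway_token is not None and gateway_token == hub_token


values = {
    "dask-gateway": {
        "gateway": {
            "auth": {
                "type": "jupyterhub",
                "jupyterhub": {
                    "apiToken": "placeholder-token",
                    "apiUrl": "http://proxy-public/hub/api",
                },
            }
        }
    },
    "jupyterhub": {
        "hub": {"services": {"dask-gateway": {"apiToken": "placeholder-token"}}}
    },
}

print(tokens_match(values))
```

With two deployments, the same check would be run once per values file, since each DaskHub instance has its own pair of tokens.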

I consistently get:

ClientResponseError: 401, message='Unauthorized', url=URL('http://proxy-public/services/dask-gateway/api/v1/clusters/')

When doing:

from dask_gateway import Gateway
gateway = Gateway()
clusters = gateway.list_clusters()

The underlying k8s cluster was recently upgraded from 1.24 to 1.26, and this DaskHub configuration was working correctly on k8s v1.24. Why could it be failing after the upgrade to k8s v1.26?

Best regards,
Sebastian

We have actually deployed two DaskHub instances, in two separate namespaces. It looks like they interfere with each other:

kubectl logs service/traefik-daskhub-dask-gateway -n <namespace-1>
level=info msg="Configuration loaded from flags."
level=warning msg="Cross-namespace reference between IngressRoutes and resources is enabled, please ensure that this is expected (see AllowCrossNamespace option)" providerName=kubernetescrd

Deleting DaskHub in one of the namespaces solved the issue.

What would be the correct configuration of the AllowCrossNamespace option in the values.yaml for DaskHub?

I managed to get rid of the message:

level=warning msg="Cross-namespace reference between IngressRoutes and resources is enabled, please ensure that this is expected (see AllowCrossNamespace option)" providerName=kubernetescrd

by passing the following:

dask-gateway:
  traefik:
    additionalArguments:
      - "--providers.kubernetescrd.allowcrossnamespace=false"

However, when I configure two deployments this way, I get back the original error message

ClientResponseError: 401, message='Unauthorized', url=URL('http://proxy-public/services/dask-gateway/api/v1/clusters/')

When running the following again in either of the two DaskHub deployments:

from dask_gateway import Gateway
gateway = Gateway()
clusters = gateway.list_clusters()

The situation improved after passing the following:

dask-gateway:
  traefik:
    additionalArguments:
      - "--providers.kubernetescrd.allowcrossnamespace=false"
      - "--providers.kubernetescrd.namespaces=<namespace-1>"
      - "--providers.kubernetesgateway.namespaces=<namespace-1>"
      - "--providers.kubernetesingress.namespaces=<namespace-1>"
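With two deployments, each one needs the same four arguments scoped to its own namespace. As a rough sketch, the per-namespace values could be generated as below; the flags are exactly the ones listed above, while the function names and the namespace names `namespace-1` and `namespace-2` are placeholders chosen for illustration.

```python
import json


def traefik_namespace_args(namespace):
    """Return the Traefik flags from this post, scoped to a single namespace."""
    return [
        "--providers.kubernetescrd.allowcrossnamespace=false",
        f"--providers.kubernetescrd.namespaces={namespace}",
        f"--providers.kubernetesgateway.namespaces={namespace}",
        f"--providers.kubernetesingress.namespaces={namespace}",
    ]


def traefik_values(namespace):
    """Wrap the flags in the same nesting as the values.yaml snippet above."""
    return {
        "dask-gateway": {
            "traefik": {"additionalArguments": traefik_namespace_args(namespace)}
        }
    }


# Placeholder namespace names; substitute the real ones.
for ns in ("namespace-1", "namespace-2"):
    print(json.dumps(traefik_values(ns), indent=2))
```

The printed JSON is also valid YAML, so each block can be saved as an override file for the matching release.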

We can now successfully create a cluster:

from dask_gateway import Gateway
gateway = Gateway()
clusters = gateway.list_clusters()
cluster = gateway.new_cluster()
cluster.scale(4)

However, there are two pending issues:

  1. Dask workers do not stay in the running state. They are repeatedly created and destroyed. After some minutes, the number of workers stabilizes, but I don’t know why. I can’t find useful logs.
  2. The Dask dashboard returns a 404, and I don’t know why.

Any suggestions?

Best regards,
Sebastian

I'm not sure why, but adding the following to one of the deployments:

dask-gateway:
  enabled: false
  controller:
    enabled: false

solves issue 1 above (Dask workers are now stable).

Hi @slunav, welcome to the Dask Discourse forum!

I’ve personally never tried this kind of installation, and I’m no Kubernetes expert at all. It looks like, as you’ve already understood, there is a conflict between the two Dask Gateway installations that affects both the dask-gateway API server and the Dask clusters it creates.

Do you think you could try to change the generated service names on one of the two clusters?

In your last message, I think you just disabled one of the dask-gateway servers, so you’ve removed some conflicts, but the two namespaces are probably using the same Dask Gateway setup.

Thanks @guillaumeeb

When I change:

dask-gateway:
  gateway:
    prefix: /services/dask-gateway

with:


dask-gateway:
  gateway:
    prefix: /services/dask-gateway-other

I run:

from dask_gateway import Gateway
gateway = Gateway()
clusters = gateway.list_clusters()
print(clusters)

and I get:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Input In [2], in <cell line: 1>()
----> 1 clusters = gateway.list_clusters()
      2 print(clusters)

File /srv/conda/envs/notebook/lib/python3.9/site-packages/dask_gateway/client.py:456, in Gateway.list_clusters(self, status, **kwargs)
    442 def list_clusters(self, status=None, **kwargs):
    443     """List clusters for this user.
    444 
    445     Parameters
   (...)
    454     clusters : list of ClusterReport
    455     """
--> 456     return self.sync(self._clusters, status=status, **kwargs)

File /srv/conda/envs/notebook/lib/python3.9/site-packages/dask_gateway/client.py:344, in Gateway.sync(self, func, *args, **kwargs)
    340 future = asyncio.run_coroutine_threadsafe(
    341     func(*args, **kwargs), self.loop.asyncio_loop
    342 )
    343 try:
--> 344     return future.result()
    345 except BaseException:
    346     future.cancel()

File /srv/conda/envs/notebook/lib/python3.9/concurrent/futures/_base.py:446, in Future.result(self, timeout)
    444     raise CancelledError()
    445 elif self._state == FINISHED:
--> 446     return self.__get_result()
    447 else:
    448     raise TimeoutError()

File /srv/conda/envs/notebook/lib/python3.9/concurrent/futures/_base.py:391, in Future.__get_result(self)
    389 if self._exception:
    390     try:
--> 391         raise self._exception
    392     finally:
    393         # Break a reference cycle with the exception in self._exception
    394         self = None

File /srv/conda/envs/notebook/lib/python3.9/site-packages/dask_gateway/client.py:435, in Gateway._clusters(self, status)
    432     query = ""
    434 url = f"{self.address}/api/v1/clusters/{query}"
--> 435 resp = await self._request("GET", url)
    436 data = await resp.json()
    437 return [
    438     ClusterReport._from_json(self._public_address, self.proxy_address, r)
    439     for r in data.values()
    440 ]

File /srv/conda/envs/notebook/lib/python3.9/site-packages/dask_gateway/client.py:414, in Gateway._request(self, method, url, json)
    411     msg = await resp.text()
    413 if resp.status in {404, 422}:
--> 414     raise ValueError(msg)
    415 elif resp.status == 409:
    416     raise GatewayClusterError(msg)

ValueError: 404 page not found

Switching back to:

dask-gateway:
  gateway:
    prefix: /services/dask-gateway

Resolves the 404 error above.
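One possible reading of the 404 is that `Gateway()` with no arguments still requests the old path while the gateway now only answers under the new prefix. The sketch below illustrates that mismatch. The URL construction mirrors the f-string visible in the traceback, and the address is taken from the error URL earlier in this thread; the assumption that the client keeps using that address after the prefix change, and both function names, are mine.

```python
from urllib.parse import urlsplit


def clusters_url(address):
    """Build the list-clusters URL as in the traceback, with an empty query."""
    return f"{address}/api/v1/clusters/"


def served_under(url, prefix):
    """True if the URL's path falls under the configured gateway prefix."""
    normalized = "/" + prefix.strip("/") + "/"
    return urlsplit(url).path.startswith(normalized)


# Assumed client address: the one seen in the 401 error URL above.
client_address = "http://proxy-public/services/dask-gateway"
url = clusters_url(client_address)

print(url)
print(served_under(url, "/services/dask-gateway"))
print(served_under(url, "/services/dask-gateway-other"))
```

If this reading is right, changing `prefix` alone is not enough: whatever sets the client address, and whatever routes requests to the gateway, would have to change along with it.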

Hopefully that’s what you meant by:

Do you think you could try to change the generated service names on one of the two clusters?

Any thoughts?

Best regards,
Sebastian

Yep, that was what I meant, but I imagine there are other parts of the config to modify, especially if you are using DaskHub? All the JupyterHub configuration, for example (from here: https://github.com/dask/helm-chart/blob/main/daskhub/values.yaml#L11).

Anyway, I’m not sure this is the service name you want to change. You first tried to clearly separate the two namespaces, which is the most obvious solution, and we should be able to do that: two Dask Gateway configurations in two different namespaces should be able to live together.

I’m proposing to change the Kubernetes and other internal URLs so that there are no conflicts, which should work even in the same namespace. But I’m not sure how much needs to be changed.

cc @jacobtomlinson or @consideRatio do you have any advice here?