Dask Gateway server shutdown issue

Hi all,

I am setting up a multi-user deployment with DaskHub (JupyterHub plus Dask Gateway), but my connection to the dask-gateway server fails after a while:

Error log:

[I 2023-04-23 15:14:39.920 DaskGateway] Starting dask-gateway-server - version 2023.1.1
[I 2023-04-23 15:14:40.183 DaskGateway] Authenticator: 'dask_gateway_server.auth.JupyterHubAuthenticator'
[I 2023-04-23 15:14:40.183 DaskGateway] Backend: 'dask_gateway_server.backends.kubernetes.backend.KubeBackend'
[I 2023-04-23 15:14:40.210 DaskGateway] Dask-Gateway server started
[I 2023-04-23 15:14:40.210 DaskGateway] - Private API server listening at http://:8000
[W 2023-04-23 15:15:05.564 DaskGateway] 404 GET / 0.766ms
[W 2023-04-23 15:15:21.760 DaskGateway] 401 GET /api/v1/clusters/ 1.030ms
[I 2023-04-23 15:15:21.791 DaskGateway] 200 GET /api/v1/clusters/ 24.124ms
[I 2023-04-23 15:15:22.615 DaskGateway] 200 GET /api/v1/options 0.474ms
[I 2023-04-23 15:15:23.805 DaskGateway] 200 GET /api/v1/clusters/ 0.563ms
[W 2023-04-23 15:15:58.308 DaskGateway] 401 POST /api/v1/clusters/ 0.720ms
[I 2023-04-23 15:15:58.333 DaskGateway] Creating cluster jhub.82d783eb35a24908bc1f8ece9389104f for user admin
[I 2023-04-23 15:15:58.352 DaskGateway] 201 POST /api/v1/clusters/ 38.320ms
[I 2023-04-23 15:16:01.850 DaskGateway] 200 GET /api/v1/clusters/jhub.82d783eb35a24908bc1f8ece9389104f?wait 3493.248ms
[I 2023-04-23 15:16:04.091 DaskGateway] 200 GET /api/v1/clusters/ 0.455ms
[W 2023-04-23 15:16:05.562 DaskGateway] 404 GET / 0.628ms
[W 2023-04-23 15:17:05.563 DaskGateway] 404 GET / 0.778ms
[W 2023-04-23 15:18:05.563 DaskGateway] 404 GET / 0.613ms
[W 2023-04-23 15:19:05.561 DaskGateway] 404 GET / 0.543ms
[W 2023-04-23 15:20:05.562 DaskGateway] 404 GET / 0.669ms
[W 2023-04-23 15:21:05.562 DaskGateway] 404 GET / 0.684ms
[W 2023-04-23 15:22:05.562 DaskGateway] 404 GET / 0.551ms
[W 2023-04-23 15:23:05.561 DaskGateway] 404 GET / 0.598ms
[W 2023-04-23 15:24:05.562 DaskGateway] 404 GET / 0.892ms
[W 2023-04-23 15:24:32.038 DaskGateway] 401 GET /api/v1/clusters/ 0.742ms
[I 2023-04-23 15:24:32.061 DaskGateway] 200 GET /api/v1/clusters/ 18.504ms
[I 2023-04-23 15:24:34.357 DaskGateway] 200 GET /api/v1/options 0.561ms
[I 2023-04-23 15:24:37.436 DaskGateway] 200 GET /api/v1/clusters/ 0.445ms
[W 2023-04-23 15:24:48.814 DaskGateway] 401 POST /api/v1/clusters/ 0.584ms
[I 2023-04-23 15:24:48.832 DaskGateway] Creating cluster jhub.a57b0b39866040daac9b0faa092a63e0 for user test
[I 2023-04-23 15:24:48.849 DaskGateway] 201 POST /api/v1/clusters/ 32.371ms
[W 2023-04-23 15:25:05.562 DaskGateway] 404 GET / 0.471ms
[I 2023-04-23 15:25:08.856 DaskGateway] 200 GET /api/v1/clusters/jhub.a57b0b39866040daac9b0faa092a63e0?wait 20001.680ms
[I 2023-04-23 15:25:29.363 DaskGateway] 200 GET /api/v1/clusters/jhub.a57b0b39866040daac9b0faa092a63e0?wait 20001.268ms
[I 2023-04-23 15:25:49.871 DaskGateway] 200 GET /api/v1/clusters/jhub.a57b0b39866040daac9b0faa092a63e0?wait 20001.531ms
[W 2023-04-23 15:26:05.562 DaskGateway] 404 GET / 0.521ms
[I 2023-04-23 15:26:10.380 DaskGateway] 200 GET /api/v1/clusters/jhub.a57b0b39866040daac9b0faa092a63e0?wait 20002.017ms
[I 2023-04-23 15:26:24.197 DaskGateway] 200 GET /api/v1/clusters/jhub.a57b0b39866040daac9b0faa092a63e0?wait 13309.081ms
[W 2023-04-23 15:27:05.563 DaskGateway] 404 GET / 0.550ms

Helm chart config values:

jupyterhub:
  proxy:
    secretToken: "<token1>"
  hub:
    services:
      dask-gateway:
        apiToken: "<token2>"
    networkPolicy:
      enabled: false
  ingress:
    enabled: true
    ingressClassName: pomerium
    tls:
      - secretName: jupyter-hub-tls
        hosts:
          - jupyter-hub01.devops.h2.theagilehub.net
    hosts:
      - jupyter-hub01.devops.h2.theagilehub.net
    annotations:
      cert-manager.io/cluster-issuer: letsencrypt-prod
      ingress.pomerium.io/allow_websockets: "true"
      ingress.pomerium.io/pass_identity_headers: 'true'
      ingress.pomerium.io/preserve_host_header: 'true'
      ingress.pomerium.io/policy: |
        [{
          "allow": {
            "or": [
              {"claim/cognito:groups": "eu-west-1_ryHLiyRp9_shell-dev"}
            ]
          }
        }]
dask-gateway:
  gateway:
    auth:
      type: jupyterhub
      jupyterhub:
        apiToken: "<token2>"

Can someone please help me with this?

Hi @Jyoti492, welcome to the Dask community!

It’s a little hard to tell what might be the cause of your issue…

Maybe @jacobtomlinson has some thoughts?

Could you provide a bit more context? What kind of Kubernetes cluster are you running on (AWS, GCP, in-house, etc.)? Do you have any particular network setup (I see you are using a specific Ingress class), or anything else you can think of that might be relevant?
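
In the meantime, it might help to condense the gateway log before sharing more of it. Below is a minimal sketch in plain Python (standard library only) that tallies the request lines by status, method and path, and lists the long `?wait` polls. The names `summarize` and `REQUEST_LINE` and the 10-second threshold are my own choices for illustration, not part of dask-gateway, and the pattern only assumes the line layout visible in the log above.

```python
import re
from collections import Counter

# Matches request lines such as:
# [W 2023-04-23 15:15:21.760 DaskGateway] 401 GET /api/v1/clusters/ 1.030ms
# (layout assumed from the log in this thread)
REQUEST_LINE = re.compile(
    r"^\[(?P<level>[A-Z]) (?P<ts>\S+ \S+) DaskGateway\] "
    r"(?P<status>\d{3}) (?P<method>[A-Z]+) (?P<path>\S+) (?P<ms>[\d.]+)ms$"
)


def summarize(log_text, slow_ms=10_000):
    """Count (status, method, path) triples and collect slow ?wait polls.

    slow_ms is an arbitrary threshold chosen for this sketch.
    """
    counts = Counter()
    slow_waits = []
    for line in log_text.splitlines():
        match = REQUEST_LINE.match(line.strip())
        if match is None:
            continue  # startup and "Creating cluster" lines are skipped
        path = match["path"]
        # Collapse per-cluster paths so all clusters group under one key.
        key_path = re.sub(r"/clusters/[^/?]+", "/clusters/<name>", path)
        counts[(int(match["status"]), match["method"], key_path)] += 1
        duration = float(match["ms"])
        if path.endswith("?wait") and duration >= slow_ms:
            slow_waits.append((match["ts"], duration))
    return counts, slow_waits


# A few lines copied from the log above.
sample = """\
[W 2023-04-23 15:24:48.814 DaskGateway] 401 POST /api/v1/clusters/ 0.584ms
[I 2023-04-23 15:24:48.832 DaskGateway] Creating cluster jhub.a57b0b39866040daac9b0faa092a63e0 for user test
[I 2023-04-23 15:24:48.849 DaskGateway] 201 POST /api/v1/clusters/ 32.371ms
[W 2023-04-23 15:25:05.562 DaskGateway] 404 GET / 0.471ms
[I 2023-04-23 15:25:08.856 DaskGateway] 200 GET /api/v1/clusters/jhub.a57b0b39866040daac9b0faa092a63e0?wait 20001.680ms
[I 2023-04-23 15:26:24.197 DaskGateway] 200 GET /api/v1/clusters/jhub.a57b0b39866040daac9b0faa092a63e0?wait 13309.081ms
"""

counts, slow_waits = summarize(sample)
for key, n in sorted(counts.items()):
    print(n, *key)
for ts, ms in slow_waits:
    print("slow wait:", ts, ms)
```

Running this over the full log and posting the summary alongside the client-side error message would make it easier to see which requests fail around the time the connection drops.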