Hello, dear colleagues!
We have Dask Gateway 2023.9.0 installed on an IPv6 Kubernetes cluster (EKS). When I tried to create a cluster, all worker pods went into CrashLoopBackOff, and their logs showed the following:
/home/dask/.local/lib/python3.11/site-packages/distributed/cli/dask_worker.py:266: FutureWarning: dask-worker is deprecated and will be removed in a future release; use `dask worker` instead
warnings.warn(
/home/dask/.local/lib/python3.11/site-packages/distributed/utils.py:165: RuntimeWarning: Couldn't detect a suitable IP address for reaching 'dask-2884f65ecbc44103ac47e7c620232833.dask', defaulting to hostname: [Errno -5] No address associated with hostname
warnings.warn(
2023-09-26 08:07:32,383 - distributed.dask_worker - INFO - End worker
Traceback (most recent call last):
File "/home/dask/.local/lib/python3.11/site-packages/toolz/functoolz.py", line 457, in memof
return cache[k]
~~~~~^^^
KeyError: ('dask-2884f65ecbc44103ac47e7c620232833.dask', 80)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/dask/.local/lib/python3.11/site-packages/distributed/utils.py", line 161, in _get_ip
sock.connect((host, port))
socket.gaierror: [Errno -5] No address associated with hostname
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/dask/.local/bin/dask-worker", line 8, in <module>
sys.exit(main())
^^^^^^
File "/home/dask/.local/lib/python3.11/site-packages/click/core.py", line 1130, in __call__
return self.main(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/dask/.local/lib/python3.11/site-packages/click/core.py", line 1055, in main
rv = self.invoke(ctx)
^^^^^^^^^^^^^^^^
File "/home/dask/.local/lib/python3.11/site-packages/click/core.py", line 1404, in invoke
return ctx.invoke(self.callback, **ctx.params)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
return __callback(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/dask/.local/lib/python3.11/site-packages/distributed/cli/dask_worker.py", line 447, in main
asyncio.run(run())
File "/usr/local/lib/python3.11/asyncio/runners.py", line 190, in run
return runner.run(main)
^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/asyncio/runners.py", line 118, in run
return self._loop.run_until_complete(task)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/asyncio/base_events.py", line 653, in run_until_complete
return future.result()
^^^^^^^^^^^^^^^
File "/home/dask/.local/lib/python3.11/site-packages/distributed/cli/dask_worker.py", line 397, in run
nannies = [
^
File "/home/dask/.local/lib/python3.11/site-packages/distributed/cli/dask_worker.py", line 398, in <listcomp>
t(
File "/home/dask/.local/lib/python3.11/site-packages/distributed/nanny.py", line 281, in __init__
host = get_ip(get_address_host(self.scheduler.address))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/dask/.local/lib/python3.11/site-packages/distributed/utils.py", line 185, in get_ip
return _get_ip(host, port, family=socket.AF_INET)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/dask/.local/lib/python3.11/site-packages/toolz/functoolz.py", line 461, in memof
cache[k] = result = func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/home/dask/.local/lib/python3.11/site-packages/distributed/utils.py", line 170, in _get_ip
addr_info = socket.getaddrinfo(
^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/socket.py", line 962, in getaddrinfo
for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
socket.gaierror: [Errno -2] Name or service not known
The Dask scheduler logs:
2023-09-27 09:07:26,912 - distributed.scheduler - INFO - State start
2023-09-27 09:07:26,915 - distributed.scheduler - INFO - -----------------------------------------------
2023-09-27 09:07:26,917 - distributed.scheduler - INFO - Scheduler at: tls://169.254.175.125:8786
2023-09-27 09:07:26,917 - distributed.scheduler - INFO - dashboard at: http://169.254.175.125:8787/status
2023-09-27 09:07:26,917 - distributed.preloading - INFO - Run preload setup: dask_gateway.scheduler_preload
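I'm not sure, but it seems that the scheduler is not listening on IPv6, so the workers can't connect to it: the scheduler log shows it bound to an IPv4 address (169.254.175.125), and the worker traceback ends in get_ip calling _get_ip with family=socket.AF_INET. If I'm right, how can I configure the Dask Gateway Helm chart to fix this?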
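
To help narrow this down, here is a minimal sketch that can be run from inside a worker pod to see which address families the scheduler hostname resolves for. It is not Dask's own code: it only repeats the kind of `socket.getaddrinfo` lookup that appears in the traceback, once with `AF_INET` and once with `AF_INET6`. The helper name `resolvable_families` is hypothetical, and the default port 80 is an assumption taken from the `KeyError` line in the worker log.

```python
import socket


def resolvable_families(host, port=80):
    """Return the names of the address families for which `host` resolves.

    Hypothetical diagnostic helper, not part of Dask or Dask Gateway.
    """
    found = []
    for family in (socket.AF_INET, socket.AF_INET6):
        try:
            # Same kind of lookup as in the traceback, restricted to one family.
            socket.getaddrinfo(host, port, family, socket.SOCK_DGRAM)
        except socket.gaierror:
            # No address of this family is associated with the hostname.
            continue
        found.append(family.name)
    return found
```

Calling `resolvable_families("dask-2884f65ecbc44103ac47e7c620232833.dask")` in the pod and getting back only `AF_INET6` would be consistent with the theory that the IPv4-only lookup is what fails, though it would not by itself prove that the scheduler is unreachable over IPv6.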