Hello Folks,
I am seeking help with a problem I have run into while using dask-jobqueue on a Slurm cluster. My code uses dask-jobqueue to start workers and connect them to a scheduler that executes my tasks. The script itself runs without raising any errors, but I recently noticed that the workers never connect to the scheduler. The worker log shows an error saying that the interface I specified, "ib0", is not a valid network interface.
Could you suggest possible solutions, or explain how to identify the correct interface? Any help would be greatly appreciated. My cluster configuration and the worker log follow.
from dask_jobqueue import SLURMCluster

cluster = SLURMCluster(
    cores=3,
    processes=1,
    memory="15GB",
    shebang="#!/usr/bin/env bash",
    queue="****",
    walltime="01:00:00",
    death_timeout="30s",
    interface="ib0",  # the name the workers reject in the log
)
2023-03-31 13:04:34,234 - distributed.nanny - INFO - Closing Nanny at 'not-running'. Reason: nanny-close
2023-03-31 13:04:34,236 - distributed.dask_worker - INFO - End worker
Traceback (most recent call last):
  File "/scicore/home/roeoesli/valipo0000/training/anaconda3/envs/py38/lib/python3.8/site-packages/distributed/core.py", line 528, in start
    await asyncio.wait_for(self.start_unsafe(), timeout=timeout)
  File "/scicore/home/roeoesli/valipo0000/training/anaconda3/envs/py38/lib/python3.8/asyncio/tasks.py", line 494, in wait_for
    return fut.result()
  File "/scicore/home/roeoesli/valipo0000/training/anaconda3/envs/py38/lib/python3.8/site-packages/distributed/nanny.py", line 331, in start_unsafe
    start_address = address_from_user_args(
  File "/scicore/home/roeoesli/valipo0000/training/anaconda3/envs/py38/lib/python3.8/site-packages/distributed/comm/addressing.py", line 290, in address_from_user_args
    host = get_ip_interface(interface)
  File "/scicore/home/roeoesli/valipo0000/training/anaconda3/envs/py38/lib/python3.8/site-packages/distributed/utils.py", line 208, in get_ip_interface
    raise ValueError(
ValueError: 'ib0' is not a valid network interface. Valid network interfaces are: ['lo', 'eth2', 'eth0', 'eth1']
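The traceback comes from a worker, so the list it prints reflects the interfaces on the compute node, which may differ from those on the node where the script is launched. A minimal sketch for comparing the two is below. It uses only the standard library rather than the check distributed performs internally, so treat it as an approximation. The helper names `list_interfaces` and `pick_interface`, and the preference order passed to them, are hypothetical.

```python
import socket


def list_interfaces():
    """Return the network interface names visible on this machine."""
    # socket.if_nameindex() yields (index, name) pairs for each interface.
    return [name for _index, name in socket.if_nameindex()]


def pick_interface(preferred, available):
    """Return the first name in `preferred` that is also in `available`, else None."""
    for name in preferred:
        if name in available:
            return name
    return None


if __name__ == "__main__":
    available = list_interfaces()
    print("Interfaces on this node:", available)
    # Hypothetical preference order, for illustration only.
    print("Would use:", pick_interface(["ib0", "eth0"], available))
```

Running this once on the login node and once inside a Slurm job on a compute node should show whether "ib0" exists on only one of them. If it does, the dask-jobqueue documentation describes passing `scheduler_options={"interface": ...}` so that the scheduler and the workers can use different interfaces. Which interface is actually right for this cluster is something the site administrators would need to confirm.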