I am trying to run a Dask PBSCluster on an HPC system, and when I call cluster.scale(10) it fails with the following error:
Task exception was never retrieved
future: <Task finished name='Task-27' coro=<_wrap_awaitable() done, defined at /user_path/.conda/envs/my_proj/lib/python3.8/asyncio/tasks.py:688> exception=RuntimeError('Command exited with non-zero exit code.\nExit code: 32\nCommand:\nqsub /user_temp_dir/dask_temp/tmp1a30buic.sh\nstdout:\n\nstderr:\nqsub: Error: select statement must be lower case\n\n')>
Traceback (most recent call last):
File "/user_path/.conda/envs/my_proj/lib/python3.8/asyncio/tasks.py", line 695, in _wrap_awaitable
return (yield from awaitable.__await__())
File "/user_path/.conda/envs/my_proj/lib/python3.8/site-packages/distributed/deploy/spec.py", line 59, in _
await self.start()
File "/user_path/.local/lib/python3.8/site-packages/dask_jobqueue/core.py", line 325, in start
out = await self._submit_job(fn)
File "/user_path/.local/lib/python3.8/site-packages/dask_jobqueue/core.py", line 308, in _submit_job
return self._call(shlex.split(self.submit_command) + [script_filename])
File "/user_path/.local/lib/python3.8/site-packages/dask_jobqueue/core.py", line 403, in _call
raise RuntimeError(
RuntimeError: Command exited with non-zero exit code.
Exit code: 32
Command:
qsub /dask_temp/tmp1a30buic.sh
stdout:
stderr:
qsub: Error: select statement must be lower case
Here is one of the PBS script files (tmp1a30buic.sh) created by PBSCluster().scale():
#!/usr/bin/env bash
#PBS -N dask-worker
#PBS -q HIE
#PBS -A xxxxxxxxx
#PBS -l select=1:ncpus=44:mem=100GB
#PBS -l walltime=23:59:59
/user_path/.conda/envs/wwsoil/bin/python3.8 -m distributed.cli.dask_worker tcp://my_ip:43421 --nthreads 4 --nprocs 11 --memory-limit 10GiB --name dummy-name --nanny --death-timeout 60
When I run qsub /dask_temp/tmp1a30buic.sh directly, I get the same qsub error as above.
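One thing that may be worth ruling out: the qsub message complains about case, and the only uppercase characters in the select line of the generated script are the GB in mem=100GB. The sketch below is a small diagnostic under that assumption. The helper name uppercase_in_select is made up for illustration, and I have not confirmed that this PBS installation actually rejects uppercase units.

```python
import re


def uppercase_in_select(script_text):
    """Return (select_statement, uppercase_chars) for every
    '#PBS -l select=...' line that contains uppercase characters."""
    findings = []
    for line in script_text.splitlines():
        match = re.match(r"#PBS\s+-l\s+(select=\S+)", line.strip())
        if match:
            statement = match.group(1)
            upper = "".join(ch for ch in statement if ch.isupper())
            if upper:
                findings.append((statement, upper))
    return findings


# Header lines copied from the generated script in this post.
script = """#!/usr/bin/env bash
#PBS -N dask-worker
#PBS -q HIE
#PBS -l select=1:ncpus=44:mem=100GB
#PBS -l walltime=23:59:59
"""

findings = uppercase_in_select(script)
for statement, chars in findings:
    print(f"{statement!r} contains uppercase characters: {chars}")
```

If the uppercase unit does turn out to be the cause, PBSCluster accepts a resource_spec argument that sets the "#PBS -l" line, so something like resource_spec="select=1:ncpus=44:mem=100gb" might be a way to force lowercase. That is untested on this cluster.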
Any idea what is causing the error? The message does not give me much to go on for debugging. Any help is appreciated!