dask_jobqueue.PBSCluster Scale() PBS Script qsub error

I am trying to run a Dask PBSCluster on an HPC system, and when I call cluster.scale(10) it fails with the following error:

Task exception was never retrieved
future: <Task finished name='Task-27' coro=<_wrap_awaitable() done, defined at /user_path/.conda/envs/my_proj/lib/python3.8/asyncio/tasks.py:688> exception=RuntimeError('Command exited with non-zero exit code.\nExit code: 32\nCommand:\nqsub /user_temp_dir/dask_temp/tmp1a30buic.sh\nstdout:\n\nstderr:\nqsub: Error: select statement must be lower case\n\n')>
Traceback (most recent call last):
  File "/user_path/.conda/envs/my_proj/lib/python3.8/asyncio/tasks.py", line 695, in _wrap_awaitable
    return (yield from awaitable.__await__())
  File "/user_path/.conda/envs/my_proj/lib/python3.8/site-packages/distributed/deploy/spec.py", line 59, in _
    await self.start()
  File "/user_path/.local/lib/python3.8/site-packages/dask_jobqueue/core.py", line 325, in start
    out = await self._submit_job(fn)
  File "/user_path/.local/lib/python3.8/site-packages/dask_jobqueue/core.py", line 308, in _submit_job
    return self._call(shlex.split(self.submit_command) + [script_filename])
  File "/user_path/.local/lib/python3.8/site-packages/dask_jobqueue/core.py", line 403, in _call
    raise RuntimeError(
RuntimeError: Command exited with non-zero exit code.
Exit code: 32
qsub /dask_temp/tmp1a30buic.sh

qsub: Error: select statement must be lower case

Here’s one of the PBS script files (tmp1a30buic.sh) created by PBSCluster().scale():

#!/usr/bin/env bash

#PBS -N dask-worker
#PBS -A xxxxxxxxx
#PBS -l select=1:ncpus=44:mem=100GB
#PBS -l walltime=23:59:59

/user_path/.conda/envs/wwsoil/bin/python3.8 -m distributed.cli.dask_worker tcp://my_ip:43421 --nthreads 4 --nprocs 11 --memory-limit 10GiB --name dummy-name --nanny --death-timeout 60 

When I run qsub /dask_temp/tmp1a30buic.sh I get the error above.

Any idea what is causing the error? The message gives nothing helpful to debug it. Any help is appreciated!

Well, I found out that the HPC system I am using requires mpiprocs=44 in the select statement:

cluster = PBSCluster(queue='standard', cores=44, processes=10, memory='100GB', 
                     project='xxxxxxxxx', walltime='1:00:00', nanny=True, 

which creates the proper PBS script file:

#PBS -q standard
#PBS -l select=1:ncpus=44:mpiprocs=44:mem=100GB

But the error message wasn’t helpful for debugging at all. The problem had nothing to do with lower case.
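
For anyone hitting the same error, here is a minimal sketch of how a select statement with mpiprocs could be built and handed to dask_jobqueue. It assumes the select line is passed through PBSCluster's resource_spec argument (the call above is cut off, so this is an assumption about how it ended), and build_select and make_cluster are hypothetical helper names used only for illustration. The queue, cores, memory and project values are the ones from this thread; adjust them for your site.

```python
def build_select(ncpus, mem, mpiprocs=None, nodes=1):
    """Build a PBS select statement, e.g. select=1:ncpus=44:mpiprocs=44:mem=100GB."""
    parts = [f"select={nodes}", f"ncpus={ncpus}"]
    if mpiprocs is not None:
        # Some sites require mpiprocs even for non-MPI jobs.
        parts.append(f"mpiprocs={mpiprocs}")
    parts.append(f"mem={mem}")
    return ":".join(parts)


resource_spec = build_select(ncpus=44, mem="100GB", mpiprocs=44)
print(resource_spec)


def make_cluster():
    """Create the cluster with an explicit select line (requires dask_jobqueue)."""
    # Imported here so the helper above can be used without dask_jobqueue installed.
    from dask_jobqueue import PBSCluster

    return PBSCluster(
        queue="standard",
        cores=44,
        processes=10,
        memory="100GB",
        project="xxxxxxxxx",
        walltime="1:00:00",
        nanny=True,
        # Assumption: resource_spec replaces the default "#PBS -l select=..." line.
        resource_spec=resource_spec,
    )
```

Before calling cluster.scale(), it can help to run print(cluster.job_script()) on the cluster returned by make_cluster() and check the "#PBS -l select=..." line by eye, since the qsub error message itself may not point at the real problem.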


Interesting, and thanks for the solution.

Unfortunately, HPC systems often have small site-specific settings like this. I can tell you that requiring mpiprocs in the select statement for a non-MPI job is unusual!
