Hi,
I am using a distributed Dask cluster. The cluster is launched using PBS and dask_jobqueue. The cluster is created without any problems.
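For reference, the cluster is created along these lines (a sketch only; the kwargs mirror the resources in the generated job script shown further down, the exact call I use may differ):

```python
from dask.distributed import Client
from dask_jobqueue import PBSCluster

# Sketch of the cluster creation; values match the PBS job script below
cluster = PBSCluster(
    queue="normal",
    account="vp91",        # becomes "#PBS -P vp91" (older dask_jobqueue versions call this "project")
    walltime="00:50:00",
    cores=48,
    processes=8,           # 8 worker processes with 6 threads each
    memory="192GB",
    interface="ib0",
)
cluster.scale(jobs=1)      # request one PBS job's worth of workers
client = Client(cluster)
```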
I am testing a simple program
```python
def slow_increment(x):
    return x + 1

futures = client.submit(slow_increment, 5000)
```
but the future does not get computed.
**Future: slow_increment** status: pending, type: NoneType, key: slow_increment-eb33bf6843c09eda18953fcceb1f96d7
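A few illustrative ways of poking at the pending future (the timeout value is arbitrary):

```python
from dask.distributed import wait

print(futures.status)          # still "pending"
wait(futures, timeout=30)      # times out because the task never starts
# futures.result(timeout=30)   # would likewise raise a timeout error
```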
Any suggestion why this is happening?
Hi @josephjohnjj,
Do you have access to the Dask Dashboard to see if the cluster is started and the task queued?
Could you try to print the cluster object and information from it? Does it see the Workers?
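Something along these lines (illustrative, adapt to your setup):

```python
print(cluster)                              # should list the workers that joined
print(client.dashboard_link)                # URL of the Dashboard
print(client.scheduler_info()["workers"])   # workers as seen by the Scheduler
```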
Hi @guillaumeeb,
Yes, I printed the cluster object; it has started as expected and it can see the workers (screenshot attached).
But the task is not queued; in the Dashboard the task stream is empty.
Are you able to have a look at the Scheduler logs? Maybe just `client.get_logs()`?
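For instance, something like this (a sketch; the exact methods available depend a bit on your versions):

```python
# Pull logs without running code on the workers
print(client.get_scheduler_logs())   # recent Scheduler log records
print(cluster.get_logs())            # logs gathered through the cluster object
```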
Hi @guillaumeeb,
I am not getting anything from the client. For instance, I ran `client.run(lambda: get_worker().name)` and there is no information.
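(For reference, a fuller version of that check with the import spelled out; illustrative only:)

```python
from distributed import get_worker

# Ask every connected worker for its name and address
print(client.run(lambda: get_worker().name))
print(client.run(lambda: get_worker().address))
```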
Also, I am getting this warning:
```
/scratch/vp91/Training/python3.9-venv/lib/python3.9/site-packages/distributed/client.py:1401: VersionMismatchWarning: Mismatched versions found
+-------------+---------------+---------------+----------------+
| Package     | Client        | Scheduler     | Workers        |
+-------------+---------------+---------------+----------------+
| dask        | 2023.8.0      | 2023.8.0      | 2023.5.1       |
| distributed | 2023.8.0      | 2023.8.0      | 2023.5.1       |
| lz4         | None          | None          | 4.3.2          |
| numpy       | 1.24.4        | 1.24.4        | 1.25.0         |
| pandas      | 2.0.3         | 2.0.3         | 1.5.3          |
| python      | 3.9.2.final.0 | 3.9.2.final.0 | 3.11.4.final.0 |
+-------------+---------------+---------------+----------------+
warnings.warn(version_module.VersionMismatchWarning(msg[0]["warning"]))
```
Could this be the problem?
I am using a Python virtual environment, but I am activating that environment in the PBS script:
```bash
#!/usr/bin/env bash
#PBS -N dask-worker
#PBS -l walltime=00:50:00
#PBS -q normal
#PBS -P vp91
#PBS -l ncpus=48
#PBS -l mem=192GB
module load python3/3.9.2
source /scratch/vp91/Training/python3.9-venv/bin/activate
/scratch/vp91/Training/environment/bin/python -m distributed.cli.dask_worker tcp://10.6.79.29:43947 --nthreads 6 --nworkers 8 --memory-limit 22.35GiB --name dummy-name --nanny --death-timeout 60 --local-directory $TMPDIR --interface ib0
```
Are you able to see the Scheduler logs somewhere in the Dashboard, or using client methods (not to submit or run code on the workers, but just to query the Scheduler)?
It could. It's never good to have different environments between the Scheduler and the Workers.
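One way to make the mismatch explicit (sketch):

```python
# Compare package versions on the Client, Scheduler and Workers;
# with check=True a mismatch in required packages raises an error
client.get_versions(check=True)
```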
It looks, from the last line, like the loaded venv is not really used. Did you configure another Python command somewhere in your YAML files?
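If the workers end up on the wrong interpreter, you can point dask_jobqueue at the venv explicitly, e.g. (a sketch; the path is taken from your activation line, other options as in your current setup):

```python
from dask_jobqueue import PBSCluster

# Pin the interpreter used to launch the workers to the venv's Python
cluster = PBSCluster(
    cores=48,
    memory="192GB",
    interface="ib0",
    python="/scratch/vp91/Training/python3.9-venv/bin/python",
)
```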
It was definitely the version mismatch. The problem was resolved once I corrected it.