I am using dask_jobqueue with a SLURM cluster. My goal is to ensure that:
- Each Dask task runs in a brand-new SLURM job (the tasks are heavy neural-network training runs).
- At most 5 tasks/jobs are running concurrently at any given moment.
When I submit 10 tasks, Dask respects the concurrency limit of 5 tasks but reuses old SLURM jobs for new tasks. I want each task to be associated with its own fresh SLURM job.
```python
import numpy as np
from dask_jobqueue import SLURMCluster
from dask.distributed import Client, as_completed


def train_config(n_runs):
    # Stand-in for a heavy neural-network training run.
    rng = np.random.default_rng()
    return rng.standard_normal()


cluster = SLURMCluster(
    cores=2,
    account="xyz",
    memory="8000M",
    walltime="00:30:00",
    job_extra_directives=["--mem-per-cpu=2000M"],
    job_directives_skip=["--mem"],
    local_directory="/work/abc/tmp",
)
cluster.scale(jobs=5)  # keep at most 5 SLURM jobs (one worker each) running

client = Client(cluster)

# Submitting 10 tasks.
# pure=False so Dask does not deduplicate the identical calls into a single task.
futures = [client.submit(train_config, 1, pure=False) for _ in range(10)]
```
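A quick way to see which SLURM job each task lands on is to have the task report the `SLURM_JOB_ID` environment variable that SLURM sets inside every job (illustrative sketch; `report_job_id` is just a throwaway helper). Repeated IDs in the output are what I mean by job reuse.

```python
import os


def report_job_id(i):
    # SLURM exports SLURM_JOB_ID into each job's environment; the Dask worker
    # process inherits it, so the task can see which job it is running in.
    return os.environ.get("SLURM_JOB_ID")


# pure=False again, so each submission becomes its own task.
job_ids = client.gather([client.submit(report_job_id, i, pure=False) for i in range(10)])
print(job_ids)
```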
How can I configure Dask or the SLURMCluster to ensure each task runs on its own fresh SLURM job?
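One direction I have been considering (untested sketch; I am not sure whether the cluster actually launches a replacement SLURM job once a worker is retired) is to retire the worker that ran each task as soon as its result comes back, so that its job terminates instead of being reused:

```python
from dask.distributed import as_completed

futures = [client.submit(train_config, 1, pure=False) for _ in range(10)]

for future in as_completed(futures):
    result = future.result()                        # fetch the result first
    holders = client.who_has([future])[future.key]  # worker address(es) holding this result
    client.retire_workers(workers=list(holders))    # shut that worker down, ending its SLURM job
```

Is something like this reasonable, or is there a supported configuration option for one-task-per-job behaviour?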