Starting a Slurm-based Dask cluster using pre-allocated resources

In our use case, our application gets a set of pre-allocated resources (nodes, CPUs, RAM) which we request when starting the application with sbatch or salloc. Our requirement is that the application should run within those resource limits.

Dask-jobqueue starts a new Slurm job for every SLURMJob using the sbatch command. This means Slurm needs to allocate new resources, which will not work for our scenario.

The SLURMRunner API is built to use existing resources, but that interface expects the user to run multiple copies of the entry script with srun, which will not work for our use case.

The ideal scenario for us would be a “Cluster”-like interface, similar to “SLURMCluster”, but one that uses existing resources (this might break the scale() method).

Any suggestions on how we can achieve this?

As a workaround, we have defined a custom Cluster class, which starts workers in a subprocess using the srun command and the dask worker CLI.

So something like:

srun -n <num_nodes> --ntasks-per-node=1 dask worker ...

Using srun, we start a job step inside the existing sbatch allocation, rather than waiting for new resources to be allocated.
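
Roughly, the workaround looks like this (a simplified sketch: the class name, port, and defaults are made up, and error handling is omitted):

```python
import shlex
import socket
import subprocess

from dask.distributed import Client, LocalCluster


class PreAllocatedCluster:
    """Run dask workers as a job step inside the existing Slurm allocation."""

    def __init__(self, num_nodes: int, scheduler_port: int = 8786):
        # LocalCluster with zero workers gives us just a scheduler on this node,
        # listening on the node's hostname so workers on other nodes can reach it.
        self._scheduler = LocalCluster(
            n_workers=0, host=socket.gethostname(), scheduler_port=scheduler_port
        )
        self.scheduler_address = self._scheduler.scheduler_address

        # Start one dask worker per node as a job step inside the allocation,
        # instead of asking Slurm for new resources.
        cmd = (
            f"srun -n {num_nodes} --ntasks-per-node=1 "
            f"dask worker {self.scheduler_address}"
        )
        self._workers = subprocess.Popen(shlex.split(cmd))

    def close(self):
        self._workers.terminate()
        self._workers.wait()
        self._scheduler.close()


# Usage
cluster = PreAllocatedCluster(num_nodes=4)
client = Client(cluster.scheduler_address)
# ... compute ...
client.close()
cluster.close()
```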

Hi @maneesh29s, welcome to the Dask community!

Could you clarify why SLURMRunner won’t work for your use case? Why do you really need a Cluster class?

Our “main” application, which contains the Dask client code (it creates the Dask graph and calls compute), simply expects a Dask scheduler address to be passed as a CLI option. It doesn’t create a Dask cluster itself.
There are many such “main” applications, each with different resource requirements (e.g. some work better with more threads, others with more processes, etc.).
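
For illustration, every “main” application boils down to something like this (a minimal sketch; the option name is made up):

```python
import argparse

from dask.distributed import Client

parser = argparse.ArgumentParser()
parser.add_argument("--scheduler-address", required=True)
args = parser.parse_args()

# Connect to the externally started scheduler; no cluster is created here.
client = Client(args.scheduler_address)
# ... build the Dask graph and call compute() ...
```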

There’s a common wrapper application, also in Python, which is user-facing (the users are mostly developers) and is responsible for starting up a Dask cluster according to the user’s requirements (number of processes, threads, memory, etc.) inside an existing Slurm resource allocation.
This wrapper app is responsible for:

  1. starting a Dask cluster using the existing Slurm resources
  2. running the “main” application as a subprocess, passing the Dask scheduler address as a CLI option
  3. closing the cluster once the main application exits

Also, if the wrapper application finds that Slurm resources are not allocated (or the user is not running the app on an HPC setup), it should fall back to spinning up a LocalCluster with the user-specified resources.

So that’s why I think we need a cluster-like class inside the wrapper application.
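
To make the intent concrete, here is a rough sketch of the wrapper flow (the function and option names are placeholders, not our actual code):

```python
import os
import subprocess

from dask.distributed import LocalCluster


def start_preallocated_cluster(n_workers, threads_per_worker):
    # Placeholder for the srun-based workaround cluster from my first post.
    raise NotImplementedError


def run_wrapper(main_app_cmd, n_workers, threads_per_worker):
    if "SLURM_JOB_ID" in os.environ:
        # sbatch/salloc set SLURM_JOB_ID, so we are inside an allocation:
        # reuse those resources instead of asking Slurm for new ones.
        cluster = start_preallocated_cluster(n_workers, threads_per_worker)
    else:
        # No Slurm allocation (e.g. a developer laptop): fall back to LocalCluster.
        cluster = LocalCluster(
            n_workers=n_workers, threads_per_worker=threads_per_worker
        )

    try:
        # Run the "main" application, passing the scheduler address on its CLI.
        subprocess.run(
            main_app_cmd + ["--scheduler-address", cluster.scheduler_address],
            check=True,
        )
    finally:
        # Tear the cluster down once the main application exits.
        cluster.close()
```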

I’m still not sure why you cannot use the SLURMRunner logic in this wrapper application?

As far as I understand from this overview page, the with SlurmRunner(...): call has to be essentially the very first line of the Python script, to make sure that srun -n 5 app.py does not execute duplicate code.
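
That is, the pattern from the runners overview looks roughly like this (a sketch adapted from my reading of the dask-jobqueue docs, not our code):

```python
from dask.distributed import Client
from dask_jobqueue.slurm import SlurmRunner

# Every process launched by `srun -n 5 python app.py` executes this file.
# Inside SlurmRunner, one rank becomes the scheduler, one rank continues past
# the context manager as the client, and the remaining ranks become workers
# and block here.
with SlurmRunner(scheduler_file="/path/to/shared/fs/scheduler-{job_id}.json") as runner:
    with Client(runner) as client:
        client.wait_for_workers(runner.n_workers)
        # ... only the client rank reaches this point and submits work ...
```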

We do some amount of work, call it “X”, even before we start the cluster, and the cluster parameters depend on part of that “X” work.

So my concern is that this “X” work will be done multiple times, across all the Python processes spun up by the srun -n 5 app.py command.

You could do as SLURMRunner does and check the SLURM ranks. I also have other questions: which cluster parameters? Isn’t your allocation already made?
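
Something along these lines (a minimal sketch; SLURM_PROCID is set by srun for each task, and how you share the result between tasks is up to you):

```python
import os


def do_expensive_setup():
    # Placeholder for the "X" work that determines the cluster parameters.
    ...


# srun sets SLURM_PROCID to the rank of each task it launches.
rank = int(os.environ.get("SLURM_PROCID", "0"))

if rank == 0:
    # Only the first task does the preparatory "X" work...
    cluster_params = do_expensive_setup()
    # ...and then shares the result (scheduler file, shared filesystem, etc.)
else:
    # Other tasks skip the duplicated work and go straight to starting their
    # Dask component (scheduler or worker), as SLURMRunner does.
    pass
```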