New requirement of shared hardware environment between scheduler and compute nodes

Our use case:
https://docs.mlerp.cloud.edu.au/
We have been intending to use Dask as the interface between a notebook environment and a SLURM cluster to allow for interactive job submission. The goal is to create a middle ground that has the interactivity of a notebook and the power of an HPC environment, while sharing valuable resources with other users whenever code isn't being executed - leading to faster turnarounds and more efficient use of hardware.

Users would be provided with a CPU-only notebook environment which would be exposed to a SLURM cluster with access to A100 GPUs and other specialised hardware. Rather than having to write small test scripts to be submitted, they would be able to spin up a Dask cluster with whatever compute requirements they need and release those resources when they are done.
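To make this concrete, the workflow we have in mind looks roughly like this (a sketch using dask-jobqueue's SLURMCluster; the partition name, resource sizes and GPU directive are placeholders rather than our actual configuration):

from dask.distributed import Client
from dask_jobqueue import SLURMCluster

# Request a handful of GPU workers from SLURM (all values are placeholders)
cluster = SLURMCluster(
    queue="gpu",                            # placeholder partition name
    cores=8,
    memory="64GB",
    job_extra_directives=["--gres=gpu:1"],  # GPU request; the argument name differs in older dask-jobqueue releases
)
cluster.scale(4)          # ask SLURM for 4 such workers
client = Client(cluster)

# ... run work interactively from the notebook ...

cluster.close()           # release the GPUs back to the pool when finished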

Our interpretation:
In your documentation you say that “Dask doesn’t need to know” what the code is doing since it “just runs Python functions”. We hoped that this library would allow our researchers to start their work in a small notebook environment, then scale out to a HPC which will contain whatever acceleration is needed.

The blog post:
Your latest blog post, however, signals that you intend to move away from this model, as you are now requiring the following: "If you use value-add hardware on the client and workers such as GPUs you'll need to ensure your scheduler has one."

Doesn’t this break your aim to allow users to scale their code “no matter what infrastructure [they] use”?

New Limitations:
The new restrictions on similarity between the scheduler and the client/workers make what we are trying to accomplish a lot more difficult for scientific workloads. For example, some of our users may want many small CPU-only workers for highly parallelisable workloads such as preprocessing, while others will want to be allocated a small number of GPU workers for workloads such as training.

Requiring similarity in the hardware of the scheduler and the client/workers would mean that a GPU would have to be allocated to each notebook instance in order to work with GPU acceleration. Since each GPU can only be broken into 7 MIG slices, this significantly limits the number of users of the platform.

It would also make CPU-only work much more difficult: if the scheduler has a GPU, each CPU worker node would need GPU compute as well.

While we could provide multiple flavours of notebook, one per use case, this would require our users to switch environments whenever they want to run a different kind of test, which is a poor user experience. Previously, users could simply change the requested requirements of their Dask cluster within a notebook cell.
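For illustration, this is the kind of per-cell switch we mean (a sketch with dask-jobqueue; sizes and directives are placeholders):

from dask_jobqueue import SLURMCluster

# Cell 1: many small CPU-only workers for preprocessing
cpu_cluster = SLURMCluster(cores=1, memory="4GB")
cpu_cluster.scale(64)

# Cell 2, later in the same notebook: a few GPU workers for training
gpu_cluster = SLURMCluster(
    cores=8,
    memory="64GB",
    job_extra_directives=["--gres=gpu:1"],  # placeholder GPU request
)
gpu_cluster.scale(2)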

This would also limit our ability to offer whatever xPU support the field needs in the future.

The question:
Is what we are trying to accomplish still going to be supported by this library or is our use case being dropped? Is there a workaround that you would recommend?

Alternatively, is it possible for a workaround to be implemented in the library? For example, a flag that turns off this sanity check, similar to the 'no-nanny' feature that allows for daemonic processes, could go a long way.

Hi @mitchellshargreaves, welcome to the Dask forum!

First of all, I really like what you are doing with the MLeRP environment, I think this is a really nice architecture for users.

I have to admit I was not aware of the change you mentioned and the blog post:
http://blog.dask.org/2023/04/14/scheduler-environment-requirements

I'm a bit surprised by this change, and I think it can have more impact than intended, especially in your example case of Notebook + GPU jobs.

cc @jacobtomlinson @rjzamora @mrocklin @fjetter to discuss this change.

The code you submit from your client will be deserialized on the scheduler. If deserializing that code requires a GPU then your scheduler should have a GPU. If deserializing your code does not require a GPU then you're fine. It's worth noting that your code doesn't actually have to run on the scheduler, only be deserialized there, so something like the following works fine:

def train():
    # The GPU library is imported inside the function, so it is only
    # needed where the function actually runs (a GPU worker), not
    # where it is merely deserialized (the scheduler).
    import torch  # PyTorch's import name is `torch`, not `pytorch`
    ...  # do GPU things here

# `client` is a dask.distributed.Client connected to the cluster
client.submit(train).result()

This function deserializes just fine without a GPU, but doesn’t run without a GPU. It does not require the scheduler to have a GPU.
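For contrast, here is a sketch of a pattern that would not be fine (assuming PyTorch and a GPU on the client): if the GPU object is created on the client and captured in the function's closure, it travels inside the pickled function and has to be deserialized wherever that function is unpickled, which per the above includes the scheduler:

import torch

model = torch.nn.Linear(10, 1).cuda()   # GPU object created on the client

def train():
    # `model` is captured in the closure, so deserializing this
    # function requires CUDA to be available
    return model(torch.randn(1, 10).cuda()).item()

client.submit(train).result()   # this time the scheduler would need a GPU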

Does this help?

Thank you @guillaumeeb @mrocklin for your clarifications.

I’ve updated the dask packages in our environment and can confirm that at this stage we are not impacted by the update.


We are running an always-on Dask cluster, as presented at a Dask Summit. Therefore, it would be great to keep our scheduler as lightweight as it is.

We put PyTorch model futures onto queues in order to cache them. I think this could lead to problems, but I'm not sure after reading the article and some discussions on GitHub.
What is rather clear is that our scheduler will need Django and database dependencies, since clients will submit tasks with ORM objects. This is a smaller problem, but one I don't see how to avoid.
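For reference, the caching pattern in question looks roughly like this (a sketch; the scheduler address, model path and queue name are illustrative):

from dask.distributed import Client, Queue

client = Client("tcp://scheduler:8786")   # illustrative address of the always-on cluster

def load_model(path):
    import torch
    return torch.load(path)               # runs on a worker

# Cache the loaded model on the cluster as a future and share it via a Queue
model_future = client.submit(load_model, "model.pt")
Queue("cached-models").put(model_future)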

The article has been improved. However, I'd like to know exactly which changes make the scheduler peek into objects, and whether this behaviour can be avoided in newer versions.