I would like to streamline deploying dask on the following infrastructure. My users have access to a jupyterhub and an HPC cluster. The HPC cluster is only visible from the jupyterhub via an SSH proxy.
I would like the `Cluster` instances to be available on the JupyterHub, the `Scheduler` to run on the cluster head node, and the workers to be launched as `PBSJob`s. In other words, I am searching for a combination of `dask.distributed.deploy.ssh.SSHCluster` and `dask_jobqueue.pbs.PBSCluster`. Is this possible? I've browsed through the source, and it seems that both `SSHCluster` and `PBSCluster` are flexible enough to accommodate this, but I am not sure whether this is a false impression.
Additionally, when experimenting I noticed connection problems with `SSHCluster` when I tried to launch it on the head node of the HPC cluster. Could that be due to the need to connect via an SSH proxy, with the ports being closed otherwise? If that is correct, is there a way to configure tunneling Dask TCP traffic over SSH?
Hi @akhmerov,
Could you clarify? You can submit jobs from the Jupyter notebooks, but only the SSH port is open between the notebooks and the cluster nodes? So you cannot have the scheduler running in the notebook process?
I don't think this is possible through configuration or tweaks alone. You'll need to write something custom, either a custom `SSHCluster`, or a `SpecCluster` with custom Python functions, but that doesn't sound easy.
I guess this will be the hard part: your `Client` always needs to be in the Jupyter notebook process, and so it must be able to connect to the Scheduler running on the head node.
Indeed, I can only access the SSH port on the cluster, and even that via an SSH proxy.
> I guess this will be the hard part: your `Client` always needs to be in the Jupyter notebook process, and so it must be able to connect to the Scheduler running on the head node.
Right now we solve this by starting a dask-jobqueue cluster on the cluster together with forwarding the scheduler and dashboard ports via SSH (an asyncssh tunnel more specifically). This works, but we don’t have access to the cluster methods.
I'd like to check that I understand correctly: does `SSHCluster` assume that it is possible to establish direct TCP connections to the scheduler and the workers? I didn't seem to find anything related to port forwarding in the source code.
Well, at least to the Scheduler, yes, so that you can connect to it from wherever you launched the cluster.
In your case, you'll need a `Client` and `Cluster` object in the notebook, which should access a `Scheduler` object on the head node, I assume to be able to call `cluster.scale` methods or the like. I really think this will need some custom code.
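What does work today without custom code is attaching a plain `Client` to the scheduler through a forwarded port, which is essentially the workaround described earlier in the thread. A minimal sketch, assuming the scheduler port has already been forwarded to `localhost` and that the address is a placeholder; note that cluster methods such as `cluster.scale` are not available this way, which is exactly the limitation under discussion:

```python
# Sketch: connect a Dask Client to a scheduler reachable through an
# already-established SSH tunnel. The address is a placeholder.
from distributed import Client


def connect_to_forwarded_scheduler(address="tcp://localhost:8786"):
    # The Client only needs a plain TCP connection to the scheduler,
    # which the tunnel provides; no Cluster object is involved.
    return Client(address)
```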
cc @jacobtomlinson