I would like to streamline deploying dask on the following infrastructure. My users have access to a jupyterhub and an HPC cluster. The HPC cluster is only visible from the jupyterhub via an SSH proxy.
I would like the `Cluster` instances to be available on the JupyterHub, the `Scheduler` to run on the cluster head node, and the workers to be launched as `PBSJob`s. In other words, I am searching for a combination of `dask.distributed.deploy.ssh.SSHCluster` and `dask_jobqueue.pbs.PBSCluster`. Is this possible? I've browsed through the source, and it seems that both `SSHCluster` and `PBSCluster` are flexible enough to accommodate this, but I am not sure whether this is a false impression.
Additionally, when experimenting I noticed connection problems with `SSHCluster` when I tried to launch it on the head node of the HPC cluster. Could that be due to the need to connect via an SSH proxy, with the ports being closed otherwise? If that is correct, is there a way to configure tunneling Dask TCP traffic over SSH?
Hi @akhmerov,
Could you clarify? You can submit jobs from the Jupyter notebooks, but only the SSH port is open between the notebooks and the cluster nodes? So you cannot have the scheduler running in the notebook process?
I don't think this is possible through configuration or tweaks alone. You'll need to write something custom, either a custom `SSHCluster`, or a `SpecCluster` with custom Python functions, but that doesn't sound easy.
I guess this will be the hard part: your `Client` always needs to be in the Jupyter notebook process, and so it must be able to connect to the Scheduler running on the head node.
Indeed, I can only access the SSH port on the cluster, and even that via an SSH proxy.
> I guess this will be the hard part: your `Client` always needs to be in the Jupyter notebook process, and so it must be able to connect to the Scheduler running on the head node.
Right now we solve this by starting a dask-jobqueue cluster on the cluster together with forwarding the scheduler and dashboard ports via SSH (an asyncssh tunnel more specifically). This works, but we don’t have access to the cluster methods.
I'd like to check that I understand correctly: does `SSHCluster` assume that it is possible to establish direct TCP connections to the scheduler and the workers? I didn't seem to find anything related to port forwarding in the source code.
Well, at least to the Scheduler, yes, so that you can connect to it from wherever you launched the cluster.
In your case, you'll need a `Client` and `Cluster` object in the notebook, which should access a `Scheduler` object on the head node, I assume to be able to call `cluster.scale` methods or the like. I really think this will need some custom code.
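What does work today without custom code is attaching a plain `Client` to the scheduler through a forwarded port, which is essentially the workaround described earlier in the thread. A minimal sketch, assuming the scheduler port has already been forwarded to `localhost` and that the address is a placeholder; note that cluster methods such as `cluster.scale` are not available this way, which is exactly the limitation under discussion:

```python
# Sketch: connect a Dask Client to a scheduler reachable through an
# already-established SSH tunnel. The address is a placeholder.
from distributed import Client


def connect_to_forwarded_scheduler(address="tcp://localhost:8786"):
    # The Client only needs a plain TCP connection to the scheduler,
    # which the tunnel provides; no Cluster object is involved.
    return Client(address)
```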
cc @jacobtomlinson