Clarification sought on setting up a local scheduler and a remote worker through SSHCluster

I am trying to run a scheduler on my local machine and a worker on a remote cluster. The two machines are connected through a VPN, and only port 22 (SSH) is open.

Terminal 1: on the local computer, start the Dask scheduler listening on port 8786:
python -m distributed.cli.dask_scheduler --port 8786

The terminal shows:

Scheduler at:     tcp://10.2.0.103:8786
dashboard at:  http://10.2.0.103:8787/status
Registering Worker plugin shuffle
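The scheduler from Terminal 1 can also be started from a script as a background process, which saves one terminal. This is a minimal sketch, assuming the `distributed.cli.dask_scheduler` module path; `scheduler_command` and `start_scheduler` are hypothetical helper names:

```python
import subprocess
import sys


def scheduler_command(port: int = 8786) -> list[str]:
    """Build the argv that starts a Dask scheduler on the given port."""
    # Same command as Terminal 1, run with the current Python interpreter.
    return [sys.executable, "-m", "distributed.cli.dask_scheduler", "--port", str(port)]


def start_scheduler(port: int = 8786) -> subprocess.Popen:
    """Start the scheduler in the background and return the process handle."""
    return subprocess.Popen(scheduler_command(port))


if __name__ == "__main__":
    proc = start_scheduler()
    try:
        proc.wait()  # keep running until the scheduler exits or Ctrl+C
    except KeyboardInterrupt:
        proc.terminate()
        proc.wait()
```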

Terminal 2: open a new terminal, SSH into the remote machine, and start the worker:
ssh username@machine_address

python -m distributed.cli.dask_worker tcp://localhost:8786 \
    --name=cluster-worker \
    --listen-address tcp://0.0.0.0:45678 \
    --contact-address tcp://localhost:45678

--listen-address makes the worker listen on port 45678 on all network interfaces.
--contact-address makes the worker advertise its address to the scheduler as localhost:45678.

The terminal shows:

Start worker at:      tcp://localhost:45678
Listening to:      tcp://localhost:45678
Worker name:               cluster-worker
dashboard at:            localhost:34607
Waiting to connect to:       tcp://localhost:8786
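Terminal 2 could also be driven from the local machine, because ssh accepts the remote command as an argument. This is a minimal sketch; the helper name `remote_worker_command` is hypothetical, and it assumes that `python` on the remote PATH has `distributed` installed:

```python
import shlex


def remote_worker_command(
    ssh_target: str,
    scheduler: str = "tcp://localhost:8786",
    name: str = "cluster-worker",
    port: int = 45678,
) -> list[str]:
    """Build an ssh argv that starts the Dask worker on the remote machine."""
    worker = [
        "python", "-m", "distributed.cli.dask_worker", scheduler,
        f"--name={name}",
        # Listen on every interface of the remote machine...
        "--listen-address", f"tcp://0.0.0.0:{port}",
        # ...but advertise the tunnelled address to the scheduler.
        "--contact-address", f"tcp://localhost:{port}",
    ]
    # ssh hands the remote command to a shell, so pass it as one quoted string.
    return ["ssh", ssh_target, shlex.join(worker)]
```

Passing the result to `subprocess.Popen` would start the worker without an interactive login session.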

Terminal 3: open a third terminal to set up the SSH tunnel (this also logs into the cluster):
ssh -L 45678:localhost:45678 -R 8786:localhost:8786 -R 45678:localhost:45678 username@machine_address

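-L 45678:localhost:45678 (local port forwarding): forward any connection to port 45678 on the local machine to port 45678 on the remote side.
-R 8786:localhost:8786 (remote port forwarding): forward any connection to port 8786 on the remote machine to port 8786 (the scheduler) on the local machine.
-R 45678:localhost:45678 (remote port forwarding): forward any connection to port 45678 on the remote machine to port 45678 on the local machine.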
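The tunnel from Terminal 3 can be built as an argument list for a script. This is a minimal sketch with a hypothetical `tunnel_command` helper; it adds `-N` so ssh only forwards ports and does not open a remote shell. It also leaves out the `-R 45678:localhost:45678` forward, on the untested assumption that the scheduler reaches the worker through the `-L` forward alone. That remote forward makes sshd on the remote machine listen on port 45678 itself, which is likely what occupies the port when the tunnel is opened before the worker.

```python
def tunnel_command(
    ssh_target: str,
    scheduler_port: int = 8786,
    worker_port: int = 45678,
) -> list[str]:
    """Build the ssh argv for a port-forwarding-only tunnel (assumed setup)."""
    return [
        "ssh",
        "-N",  # forward ports only; do not run a remote command or shell
        # Local machine -> remote worker: lets the scheduler reach the worker.
        "-L", f"{worker_port}:localhost:{worker_port}",
        # Remote machine -> local scheduler: lets the worker reach the scheduler.
        "-R", f"{scheduler_port}:localhost:{scheduler_port}",
        ssh_target,
    ]
```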

Terminal 4: run the Python script:

futures = client.map(square, range(10))
results = client.gather(futures)
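A self-contained version of the Terminal 4 script might look like the following. The scheduler address `tcp://localhost:8786` and the body of `square` are assumptions, since the snippet above does not show them:

```python
def square(x):
    """Toy task that runs on the remote worker."""
    return x ** 2


def main(address: str = "tcp://localhost:8786") -> list:
    # Imported inside main() so square() can be imported without Dask installed.
    from dask.distributed import Client

    # Connect to the scheduler started in Terminal 1; the client closes on exit.
    with Client(address) as client:
        futures = client.map(square, range(10))
        return client.gather(futures)


if __name__ == "__main__":
    print(main())
```

With the scheduler, worker, and tunnel all up, running it should print the squares of 0 to 9.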

With this setup the run succeeds, but it requires opening many terminals.

One thing I noticed is that the order matters: if I set up the SSH tunnel first and then start the cluster worker, the worker fails because port 45678 is already occupied.
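The error itself is ordinary socket behaviour: once one process has bound a port, a second bind to the same address fails with "address already in use". This minimal sketch reproduces it locally with a hypothetical `port_in_use` helper; the "holder" socket plays the role of whichever process grabbed port 45678 first:

```python
import errno
import socket


def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if binding host:port fails because the address is in use."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        try:
            s.bind((host, port))
        except OSError as exc:
            if exc.errno == errno.EADDRINUSE:
                return True
            raise
    return False


if __name__ == "__main__":
    # Simulate the first process (e.g. a tunnel) grabbing the port.
    with socket.socket() as holder:
        holder.bind(("127.0.0.1", 0))  # let the OS pick a free port
        holder.listen()
        taken = holder.getsockname()[1]
        print(port_in_use(taken))  # True: a second process cannot bind it
```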

  1. Is it necessary to start the local scheduler and the remote cluster worker before setting up the SSH tunnel for the whole process to work?

  2. My goal is to set this up with SSHCluster. What is the proper way to achieve that?

Thank you for your help in advance!

Hi @liuzongyue6, welcome to Dask community!

I'm not an expert, but as you experienced, I think the server has to be running first, before the tunnel is set up.
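If the order has to be respected, a script can at least wait for each process to start listening instead of relying on someone watching the terminal output. This is a minimal sketch with a hypothetical `wait_for_port` helper that polls until a TCP connection to host:port succeeds:

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0) -> bool:
    """Poll until something accepts TCP connections on host:port."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=1.0):
                return True
        except OSError:
            time.sleep(0.2)  # nothing listening yet; retry shortly
    return False
```

For example, `wait_for_port("localhost", 8786)` after starting the scheduler returns True once it accepts connections, and False if the timeout expires first.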

You can probably use localhost as the first value in the hosts kwarg (SSHCluster starts the scheduler on the first host and the workers on the remaining ones). You should also be able to use connect_options and worker_options to specify the ports. Port forwarding might also be possible through the SSH options, but if point 1 above holds, that will probably not work.
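A sketch of that suggestion for a single remote worker host follows. `sshcluster_kwargs` is a hypothetical helper; hosts, connect_options, scheduler_options, and worker_options are SSHCluster parameters, but the values are assumptions and untested in this VPN setup. In particular, `worker_port` is assumed to be forwarded to the default Nanny worker class. SSHCluster opens an SSH connection to every host, including localhost, so the local machine would need a running SSH server; the sketch does not create any tunnels.

```python
def sshcluster_kwargs(
    remote_host: str,
    username: str,
    scheduler_port: int = 8786,
    worker_port: int = 45678,
) -> dict:
    """Collect assumed SSHCluster arguments for one local scheduler + one remote worker."""
    return {
        # First host runs the scheduler; the remaining hosts run workers.
        "hosts": ["localhost", remote_host],
        # Passed to asyncssh.connect; known_hosts=None skips host-key checks
        # (convenient for testing, not recommended in production).
        "connect_options": {"username": username, "known_hosts": None},
        # Passed to the scheduler process.
        "scheduler_options": {"port": scheduler_port, "dashboard_address": ":8787"},
        # Passed to each worker process; fixes the worker port (assumption).
        "worker_options": {"worker_port": worker_port},
    }


def main() -> None:
    from dask.distributed import Client, SSHCluster

    cluster = SSHCluster(**sshcluster_kwargs("machine_address", "username"))
    client = Client(cluster)
    print(client)


if __name__ == "__main__":
    main()
```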

You'll probably have to set up the tunneling after the SSHCluster has been created.
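Following that suggestion, the tunnel could be tied to the lifetime of the Python script instead of a separate terminal. This is a minimal sketch with a hypothetical `background_process` context manager that starts a command (such as an `ssh -N ...` tunnel) and terminates it when the block exits:

```python
import contextlib
import subprocess


@contextlib.contextmanager
def background_process(argv: list[str]):
    """Run argv in the background for the duration of a with-block."""
    proc = subprocess.Popen(argv)
    try:
        yield proc
    finally:
        proc.terminate()  # e.g. close the SSH tunnel when the block exits
        proc.wait()
```

For example, after `cluster = SSHCluster(...)` returns, `with background_process(tunnel_argv):` could wrap the code that submits work, where `tunnel_argv` is an ssh argument list like the one in Terminal 3, adapted to the hosts and ports actually used.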