I am trying to run a local scheduler with a remote cluster worker. The two machines are connected through a VPN, and only port 22 is open.
Terminal 1: On the local computer, for the scheduler
This starts the Dask scheduler on the local machine, listening on port 8786:
python -m dask.distributed.cli.dask_scheduler --port 8786
The terminal shows:
Scheduler at: tcp://10.2.0.103:8786
dashboard at: http://10.2.0.103:8787/status
Registering Worker plugin shuffle
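(As an optional sanity check, I can connect a throwaway client to this address from a local Python session; a minimal sketch, using the address from the log above:)

from dask.distributed import Client

# Connect to the freshly started scheduler (no workers yet, which is fine).
client = Client("tcp://10.2.0.103:8786", timeout=10)
print(client.scheduler_info()["address"])   # should print tcp://10.2.0.103:8786
client.close()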
Terminal 2: Open a new terminal and SSH into the cluster machine:
ssh username@machine_address
python -m dask.distributed.cli.dask_worker tcp://localhost:8786 \
--name=cluster-worker \
--listen-address tcp://0.0.0.0:45678 \
--contact-address tcp://localhost:45678
--listen-address tells the worker to listen on port 45678 on all network interfaces.
--contact-address tells the worker to advertise its address to the scheduler as localhost:45678, so the scheduler reaches it through the forwarded local port.
The terminal shows:
Start worker at: tcp://localhost:45678
Listening to: tcp://localhost:45678
Worker name: cluster-worker
dashboard at: localhost:34607
Waiting to connect to: tcp://localhost:8786
Terminal 3: Open a third terminal to set up the SSH tunnels; this also logs into the cluster machine.
ssh -L 45678:localhost:45678 -R 8786:localhost:8786 -R 45678:localhost:45678 username@machine_address
-L 45678:localhost:45678 (local port forwarding):
Forward any connection to local port 45678 to port 45678 on the remote side, so the scheduler can reach the worker's contact address.
-R 8786:localhost:8786 (remote port forwarding):
Forward any connection to port 8786 on the remote machine to port 8786 on the local host, so the worker can reach the scheduler.
-R 45678:localhost:45678 (remote port forwarding):
Forward any connection to port 45678 on the remote machine to port 45678 on the local host.
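(To cut down on terminals, the same tunnels could also be opened from Python with subprocess; this is only a sketch, the flags simply mirror the ssh command above, and username@machine_address is still a placeholder:)

import subprocess

# Open the tunnels in the background; -N means no remote shell, forwarding only.
tunnel = subprocess.Popen([
    "ssh", "-N",
    "-L", "45678:localhost:45678",   # local 45678 -> remote 45678 (reach the worker)
    "-R", "8786:localhost:8786",     # remote 8786 -> local 8786 (reach the scheduler)
    "-R", "45678:localhost:45678",   # remote 45678 -> local 45678, as in the command above
    "username@machine_address",
])
# ... run the Dask client work, then:
# tunnel.terminate()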
Terminal 4: Run the Python client script:
futures = client.map(square, range(10))
results = client.gather(futures)
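For reference, the full script is roughly the following; square is just an example workload, and the client is assumed to connect to the scheduler at localhost:8786 (which resolves to the Terminal 1 scheduler on either machine thanks to the tunnels):

from dask.distributed import Client

def square(x):
    return x * x

client = Client("tcp://localhost:8786")   # scheduler started in Terminal 1
futures = client.map(square, range(10))   # one task per input value
results = client.gather(futures)          # [0, 1, 4, 9, ..., 81]
print(results)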
With the current setup, this runs successfully, but it requires keeping a lot of terminals open.
One thing I noticed about the order of operations: if I set up the SSH tunnel first and then start the cluster worker, the worker fails with an error because port 45678 is already occupied (presumably the -R 45678 forward has already bound that port on the remote machine).
-
Is it required to start the local scheduler and the remote cluster worker first, before setting up the SSH tunnel, for the whole process to work?
-
My goal is to set this up through SSHCluster. What is the proper way to achieve that?
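What I have in mind is roughly the following sketch (the hosts, username, and options are just placeholders based on the setup above, and I am not sure this is the right approach):

from dask.distributed import Client, SSHCluster

def square(x):
    return x * x

cluster = SSHCluster(
    ["localhost", "machine_address"],           # first host runs the scheduler, the rest run workers
    connect_options={"username": "username"},   # passed to asyncssh.connect
    scheduler_options={"port": 8786, "dashboard_address": ":8787"},
    worker_options={"nthreads": 2},             # example worker settings
)
client = Client(cluster)
print(client.gather(client.map(square, range(10))))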
Thank you in advance for your help!