Dask SSH cluster - task running and SSH keys update

Hello Team,

Background:

  1. I am using Python to create a Dask SSH cluster with workers and run tasks on it.
  2. I have wrapped this in a try/except/finally block: try creates the SSH cluster, and finally closes the client and the cluster.
  3. I am currently using passwordless SSH.

Questions:

  1. Does the client update the SSH authorized keys or the known_hosts file at any point? My SSH authorized keys were updated, and I want to confirm that Dask did not make that change.

  2. Is there a chance that the Dask cluster keeps running in the background? Is there a way to find out whether the cluster is still running after the code has finished or failed?

  3. If I do not have an authorized key but the host is in my known_hosts file, will I need to provide a password in the parameter here? connect_options={"known_hosts": None, "username":<<username>>, "password":<<password>>}

  4. Can I write code to find out which delayed task is executed on which worker node?

Current code:

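from dask.distributed import Client, SSHCluster

if __name__ == '__main__':
	# Initialise both names so the finally block is safe if startup fails.
	client = None
	cluster = None
	try:
		# CLIENT MODE (requires: from dask.distributed import LocalCluster)
		# cluster = LocalCluster(dashboard_address=':8787', n_workers=3, threads_per_worker=1, memory_limit='16GB')
		# client = Client(cluster)

		# CLUSTER MODE
		# The first host runs the scheduler; the remaining three run workers.
		cluster = SSHCluster(
			["localhost", "localhost", "localhost", "localhost"],
			connect_options={"known_hosts": None, "username":<<username>>},
			worker_options={"nthreads": 2},
			scheduler_options={"port": 0, "dashboard_address": ":8797"},
			# When a list is given, it needs one interpreter path per host.
			remote_python=["virtual_env_path/bin/python"] * 4,
		)
		client = Client(cluster)
		client.upload_file("test_2.py")
		# delayed_func is defined elsewhere in my code.
		delayed_func()
	except Exception as e:
		print(f"Failed - Exception {e}")
	finally:
		if client is not None:
			client.close()
		if cluster is not None:
			cluster.close()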

Dask has no reason to update the SSH authorized keys, but the known_hosts file can be updated by SSH itself.

It shouldn’t if the client and cluster are closed properly. The best way to know is to check for leftover Python processes on your nodes.
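A minimal sketch of that process check, assuming a POSIX node where `ps` is available. It also assumes that SSHCluster launches its remote scheduler and workers through the `distributed.cli.dask_spec` module, so that string shows up in the command line. The function names here are hypothetical helpers, not part of Dask.

```python
import subprocess

# Assumption: SSHCluster starts its remote processes via this module,
# so it appears in the command line of any scheduler or worker left behind.
DASK_SPEC_MARKER = "distributed.cli.dask_spec"


def find_dask_processes(ps_output, marker=DASK_SPEC_MARKER):
    """Return (pid, command) pairs from `ps -eo pid,args` text that mention marker."""
    matches = []
    for line in ps_output.splitlines():
        parts = line.strip().split(None, 1)
        # Skip the header row and any malformed lines.
        if len(parts) != 2 or not parts[0].isdigit():
            continue
        pid, command = int(parts[0]), parts[1]
        if marker in command:
            matches.append((pid, command))
    return matches


def list_local_dask_processes():
    """Run ps on this machine and keep only the Dask-looking processes."""
    result = subprocess.run(
        ["ps", "-eo", "pid,args"], capture_output=True, text=True, check=True
    )
    return find_dask_processes(result.stdout)
```

Running `list_local_dask_processes()` on each node after the script exits should return an empty list if everything was shut down. Any entries left over are candidates for a manual kill.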

This might work, but it is not really recommended. It is much better to use authorized keys.

You can probably do that with a SchedulerPlugin or a WorkerPlugin, but what are you trying to understand?
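If a plugin feels heavy, a lighter option may be to ask the scheduler where each result ended up. The sketch below uses `Client.compute` and `Client.who_has`. `group_keys_by_worker` and `report_placement` are hypothetical helper names. Note the assumption: `who_has` reports which worker holds a result, which is normally the worker that ran the task, but data can be moved or replicated afterwards.

```python
from collections import defaultdict


def group_keys_by_worker(who_has):
    """Invert {task_key: [worker_address, ...]} into {worker_address: [task_key, ...]}."""
    by_worker = defaultdict(list)
    for key, workers in who_has.items():
        for address in workers:
            by_worker[address].append(key)
    # Sort by string form so the output is stable even for non-string keys.
    return {address: sorted(keys, key=str) for address, keys in by_worker.items()}


def report_placement(client, delayed_tasks):
    """Compute a list of delayed tasks and report which worker holds each result."""
    # Imported here so the helper above can be used without a cluster.
    from dask.distributed import wait

    futures = client.compute(delayed_tasks)  # list of delayed objects -> list of futures
    wait(futures)  # block until every task has finished
    # Query while the futures are still referenced, so the results are still held.
    return group_keys_by_worker(client.who_has(futures))
```

With the cluster from the question, `report_placement(client, tasks)` would return a dictionary keyed by worker address, each with the task keys stored there.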


@guillaumeeb Suppose my code creates 10 delayed tasks for 10 files, and I have 5 workers with 1 thread each.
Can I find out which Dask delayed task is executed on which worker?

I think it’s a Scheduler decision based on the available workers. You should end up with two tasks per Worker, but I don’t think it is deterministic. You can force some behaviour if you want, but I would leave that to the Scheduler.
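For completeness, here is a rough sketch of what forcing the placement could look like, using the `workers=` argument of `Client.submit`. `round_robin` and `submit_pinned` are hypothetical helpers, and the sketch assumes a plain function applied to each file rather than a delayed object (for delayed objects, `Client.compute` accepts a `workers=` argument as well).

```python
def round_robin(items, worker_addresses):
    """Pair each item with a worker address, cycling through the addresses in order."""
    if not worker_addresses:
        raise ValueError("need at least one worker address")
    return [
        (item, worker_addresses[i % len(worker_addresses)])
        for i, item in enumerate(items)
    ]


def submit_pinned(client, func, files):
    """Submit func(path) for each file, restricted to one chosen worker per task."""
    # client.nthreads() maps each worker address to its thread count.
    addresses = sorted(client.nthreads())
    return [
        client.submit(func, path, workers=[address])
        for path, address in round_robin(files, addresses)
    ]
```

With 10 files and 5 workers this gives exactly two tasks per worker, at the cost of the Scheduler no longer being free to balance the load.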
