SSHCluster - Start a different number of workers on each host, programmatically

The worker_options parameter of SSHCluster includes an "n_workers" property; however, when set, it starts the same number of workers on each host. This assumes that all the nodes within the cluster share the same resources, which is not always the case.
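To illustrate, here is a minimal sketch of that uniform behavior (the hostnames are made up for illustration): whatever is in worker_options applies to every worker host, with no per-host override.

```python
# The same worker_options are applied to every worker host.
worker_options = {"n_workers": 2, "nthreads": 1}

hosts = [
    "10.0.0.1",  # first entry is the scheduler
    "10.0.0.2",  # worker host: gets 2 workers
    "10.0.0.3",  # worker host: also gets 2 workers, even if it has fewer cores
]

# Requires passwordless SSH access to the hosts, so not run here:
# from dask.distributed import SSHCluster
# cluster = SSHCluster(hosts, worker_options=worker_options)
```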

I envisage using a custom config file that instructs Dask to start a different number of workers on each node (and potentially other settings that would also be different for different hosts). I looked at the documentation, but could not find a way. Is there a programmatic way to do this?

I don’t think this is possible on the current SSHCluster implementation.

The best workaround I can think of, if your nodes' resources are not too different, is to just put the same host several times in the hosts list according to its resources.
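That workaround can be sketched with a small helper (the function name and hostnames are hypothetical, just for illustration): each worker host is repeated once per worker process it should run, and n_workers is left at 1.

```python
# Hypothetical helper: repeat each worker host according to how many
# worker processes it should run, producing an SSHCluster hosts list.
def expand_hosts(scheduler, workers_per_host):
    """workers_per_host: dict mapping worker hostname -> number of workers."""
    hosts = [scheduler]
    for host, count in workers_per_host.items():
        hosts.extend([host] * count)
    return hosts

hosts = expand_hosts("10.0.0.1", {"10.0.0.2": 4, "10.0.0.3": 1})
# "10.0.0.2" now appears 4 times, "10.0.0.3" once.

# Requires passwordless SSH access to the hosts, so not run here:
# from dask.distributed import SSHCluster
# cluster = SSHCluster(hosts, worker_options={"n_workers": 1})
```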

But if you look at the SSHCluster source code, I think it wouldn't be hard to craft a Pull Request adding the possibility of giving worker_options as a list, specifying a different set of options for each host.

@guillaumeeb is absolutely right, we support passing a list for many things in SSHCluster, but if it’s missing for worker_options then please open an issue on GitHub. Contributions are very welcome too!


hi guys, thank you both for clarifying. @guillaumeeb's approach of including the host several times in the hosts list seems to work. What difference does it make if the resources in the nodes are different? Am I right to say that as long as we don't need different per-worker specifications, such as "nthreads" or "memory_limit", setting "n_workers" to 1 and including the node's IP address in the hosts list as many times as we need worker processes would be enough? Dask seems to take care of the rest.

Precisely! If your nodes have the same ratio of memory per core, then in most cases this should be sufficient!