Consider the following in main.py:
from dask.distributed import Client, SSHCluster

cluster = SSHCluster(
    ["localhost", "localhost"],
    connect_options={"known_hosts": None},
    worker_options={"n_workers": 6},
    scheduler_options={"port": 0, "dashboard_address": ":8797"},
)
client = Client(cluster)
and another module, test_dask.py, which prepares a params array with different parameters for each worker and passes them to dask.delayed as shown below (worker_method is defined in the same module):
import dask

delayed_results = [dask.delayed(worker_method)(dask_params[i]) for i in range(6)]
computed_results = dask.compute(*delayed_results)
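For context, test_dask.py is shaped roughly like the sketch below (the bodies are placeholders I wrote for illustration; the real worker_method and dask_params carry the actual simulation logic and settings). Since worker_method is a module-level function, the workers presumably try to import test_dask when deserialising it, which is where the error seems to come from:
import dask

def worker_method(params):
    # placeholder: the real method runs one simulation with its own parameter set
    return params["worker_id"]

def run_simulations():
    # hypothetical driver: one parameter dict per worker
    dask_params = [{"worker_id": i} for i in range(6)]
    delayed_results = [dask.delayed(worker_method)(dask_params[i]) for i in range(6)]
    return dask.compute(*delayed_results)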
The code above fails with the following error:
No module named test_dask.py
This error seems to be widely reported. However, the only thing that worked for me was calling the following for every module required by test_dask.py, as well as for test_dask.py itself, right after instantiating the “client” object:
client.upload_file('simulator/test_dask.py')
Uploading every required module to every worker started on a remote node seems like overkill. I would rather make sure that the modules already exist on the remote nodes and somehow instruct the workers to point to them (and to the same environment in general).
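What I have in mind is something along the lines of the sketch below, using the remote_python argument that SSHCluster accepts. The absolute path is made up, and I am assuming the same virtual environment exists at that path on every node with the simulator modules importable from it (e.g. installed into it with pip); whether this alone is enough is essentially my question:
from dask.distributed import Client, SSHCluster

# hypothetical absolute path; assumes the venv exists at the same location on every node
VENV_PYTHON = "/home/user/mtdcabm/bin/python3.11"

cluster = SSHCluster(
    ["localhost", "localhost"],
    connect_options={"known_hosts": None},
    worker_options={"n_workers": 6},
    scheduler_options={"port": 0, "dashboard_address": ":8797"},
    remote_python=VENV_PYTHON,  # launch the remote scheduler and workers with this interpreter
)
client = Client(cluster)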
Furthermore, as can be seen above, I am not even using a remote node yet (both the scheduler and the workers run on localhost). I have created a virtual environment in the root of my project, with the following structure:
mtdcabm
- bin
- lib
- lib64
- simulator
  - main.py
  - test_dask.py
  - other_modules.py
The interpreter used by the client and scheduler is bin/python3.11 from this virtual environment.
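To check what the workers actually end up using, I have been running the throwaway helper below via Client.run, which executes a function on every worker and returns the results keyed by worker address:
def report_worker_python():
    # runs inside each worker process
    import sys
    return sys.executable, sys.path[:3]

print(client.run(report_worker_python))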
How can I dynamically point the workers at the specific Python interpreter and virtual environment that I want, instead of relying on the upload_file method to upload individual modules to them?
I am using Python 3.11.2 on Ubuntu 22.04.3.