from dask.distributed import SSHCluster

# worker_space_path is a path variable defined earlier
cluster = SSHCluster(
    ["192.168.0.1", "192.168.0.17", "192.168.0.13", "192.168.0.15", "192.168.0.9",
     "192.168.0.11", "192.168.0.5", "192.168.0.7", "192.168.0.19"],
    worker_options={"local_directory": worker_space_path},
    scheduler_options={"dashboard_address": ":8798"})
By default, the local worker directory in a distributed setup is created in the home directory, but I want a different location for each worker. How can I do this using Python?
Setting worker_options applies a single path to every worker, and that also throws an error because all the workers try to acquire the same lock, giving OSError: [Errno 16] Device or resource busy.
Alternatively, is there a way to change the local directory after initialization?
When you specify a directory, Dask creates a dask-worker-space directory in that space, and then each worker makes its own directory inside it.
When I run the following:
dask-worker localhost:8786 --local-directory tmp/foo
dask-worker localhost:8786 --local-directory tmp/foo
dask-worker localhost:8786 --local-directory tmp/foo
and then look at the contents, I see the following:
$ ls tmp/foo/dask-worker-space/
global.lock worker-fdjkmmjb.dirlock worker-zkqhtoet
purge.lock worker-yad8d99n worker-zkqhtoet.dirlock
worker-fdjkmmjb worker-yad8d99n.dirlock
So I think that you should be ok here. Apparently this isn’t working well for you though?
If you want to change the local directory after startup, you might consider using a worker plugin. See Customize initialization — Dask documentation.
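For example, here is a minimal sketch of such a plugin. The base path /scratch and the class name are my own placeholders, and note that reassigning worker.local_directory after startup will not move data that was already spilled:

import os
from distributed import Client, WorkerPlugin

class PerWorkerDirectory(WorkerPlugin):
    """Give each worker its own scratch directory (hypothetical example)."""
    def __init__(self, base_path):
        self.base_path = base_path

    def setup(self, worker):
        # setup() runs once on every worker when the plugin is registered.
        path = os.path.join(self.base_path, f"worker-{worker.name}")
        os.makedirs(path, exist_ok=True)
        # Point the worker at its own directory; anything spilled
        # before registration stays in the old location.
        worker.local_directory = path

client = Client(cluster)
client.register_worker_plugin(PerWorkerDirectory("/scratch"))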
Yes, using the terminal the directories are initialized successfully, but I wanted a way to initialize them using Python. I will check out the custom worker plugin.
I also found this: https://docs.dask.org/en/latest/how-to/deploy-dask/python-advanced.html
But with this approach, when I initialize my Dask worker by giving the host IP, it throws an error saying the host IP is already in use (even though I created the worker from scratch and no other worker was running on the host).
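For reference, a minimal sketch of the approach from that page, adapted with a separate local_directory per worker. The /tmp paths are placeholders, and I leave the worker ports unset so each worker binds to a free port (binding to an address that is already taken is one way to hit an "already in use" error):

import asyncio
from dask.distributed import Scheduler, Worker, Client

async def main():
    async with Scheduler(port=8786) as scheduler:
        # Each worker gets its own local_directory; no explicit port,
        # so each one binds to a free port on its host.
        async with Worker(scheduler.address, local_directory="/tmp/w0") as w0, \
                   Worker(scheduler.address, local_directory="/tmp/w1") as w1:
            async with Client(scheduler.address, asynchronous=True) as client:
                result = await client.submit(lambda x: x + 1, 10)
                print(result)  # 11

asyncio.run(main())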