Dear all,
I’ve been reading through the documentation and have tried many things, but I cannot make it work.
An oversimplified view of my problem can be seen here:
Generally, the workers have only a private IP with WAN access, and they are on a separate network from the scheduler. They are spawned by a separate process; let’s ignore that for now.
The scheduler lives on a node with public IP and one open port to the outside world.
Looking at the Journey of a task, everything works well until Step 7: Gather. Since the scheduler has no way to contact the workers directly, this step fails.
I’ve been looking at dask-gateway and at running multiple schedulers (one per cluster), but AFAIK this also requires connectivity from the gateway to the scheduler, doesn’t it?
Finally, I had a look at plugins and came across the RabbitMQ example (distributed.dask.org/en/stable/plugins.html#rabbitmq-example).
Having a message queue on the open port could solve the issue if, for the gather step:
- the scheduler leaves a message for the workers
- the worker that has the result picks the message up and pushes the result
- the scheduler picks up the result from the message queue
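To make the idea concrete, here is a minimal sketch of those three steps. It uses in-process `queue.Queue` objects as a stand-in for the broker, so it only illustrates the message flow and not any networking. The names `Broker`, `worker_loop` and `gather_via_queue` are made up for this illustration and are not part of Dask or RabbitMQ; a real version would swap the queues for a broker client and would have to hook into the scheduler somehow, which I have not worked out.

```python
import queue
import threading


class Broker:
    """Hypothetical stand-in for a message broker reachable on the open port."""

    def __init__(self, worker_names):
        # One request inbox per worker, plus one shared queue for results.
        self.inbox = {name: queue.Queue() for name in worker_names}
        self.results = queue.Queue()


def worker_loop(name, data, broker):
    """Worker side: read requests from its own inbox, push results it holds."""
    while True:
        key = broker.inbox[name].get()
        if key is None:  # shutdown sentinel
            break
        if key in data:  # only the worker holding the result replies
            broker.results.put((key, name, data[key]))


def gather_via_queue(key, broker, timeout=5.0):
    """Scheduler side: leave a request for every worker, then wait for the result."""
    for inbox in broker.inbox.values():
        inbox.put(key)
    while True:
        # Raises queue.Empty if no worker answers within the timeout.
        got_key, worker_name, value = broker.results.get(timeout=timeout)
        if got_key == key:
            return worker_name, value


def demo():
    # Assumed toy state: each worker holds one finished result in memory.
    stores = {"worker-a": {"x": 1}, "worker-b": {"y": 42}}
    broker = Broker(stores)
    threads = [
        threading.Thread(target=worker_loop, args=(name, data, broker))
        for name, data in stores.items()
    ]
    for t in threads:
        t.start()
    answer = gather_via_queue("y", broker)
    for inbox in broker.inbox.values():
        inbox.put(None)  # tell every worker to stop
    for t in threads:
        t.join()
    return answer
```

Calling `demo()` asks for the key `"y"` and gets the value back from the worker that holds it. The point of the design is that the workers only ever make outbound connections to the broker, so the scheduler never needs to reach them directly.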
Before I go down that Rabbit hole, I wanted to make sure this is not a solved problem.
I might simply be missing the right keywords to look for, or someone may have solved the problem in a different way.
I know of github.com/comp-dev-cms-ita/dask-remote-jobqueue, but would prefer not getting SSH involved.
Apologies for the improperly formatted links, but new users are only allowed 2 links.