Gather via persistent worker connection from scheduler?

This is related to Workers on private network, scheduler on a different network - how to make “gather” step work?, but I’d like to explore a potential solution rather than necro-bumping an old thread.

I have a similar setup: a centralized scheduler running in a cloud provider, and a user-distributed script that starts workers at will on arbitrary private computers. I’ve solved the cloud ingress/egress, so workers can connect to the scheduler and identify themselves, and we can see their heartbeat calls in the scheduler.
When a client connects and submits a job, the workers receive and perform the tasks successfully, and I see debug logs in the scheduler indicating task completion. The problem is again at gather()… The scheduler always seems to attempt to initiate a new TCP connection to the worker address, which gets blocked by the private workers’ various and untouchable-for-our-purposes firewalls.

My question is this - is there any way to persist a TCP or WS connection initiated by the worker, and leverage that communication channel to gather the results? This is potentially a deal-breaker for us if we can’t solve the gather from the centralized scheduler.

Hi @RaiinmakerWes, welcome to the Dask Discourse forum!

As mentioned by @crusaderky in the post you linked, this does not seem to be how Dask communications work. But maybe there is more to it?

Yeah, it seems like that sort of persistent connection is an anti-pattern, so I stopped chasing it down…

I’m now working with ngrok TCP Endpoints to create TCP tunnels. I’m running into mismatches between the worker contact-address I provide and the address the scheduler uses to create the new TCP connection in the gather step. But I may start a new topic for that, since it isn’t directly related to this thread.

Got it working :slight_smile: I wasn’t hooking up contact-address and listen-address properly with the ngrok TCP tunnel.
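
For anyone landing here later, this is a rough sketch of the idea, not the actual script from this thread. It assumes an ngrok TCP tunnel is already forwarding a public endpoint to a local port on the worker machine, and it only builds the `dask worker` command line with `--listen-address` (where the worker binds locally) and `--contact-address` (what the worker advertises to the scheduler). The helper name `build_worker_command` and every hostname and port below are made-up placeholders.

```python
import shlex


def build_worker_command(scheduler_address, local_port, public_host, public_port):
    """Build a `dask worker` command for a worker behind a TCP tunnel.

    Hypothetical helper for illustration only.
    - listen address: the local interface/port the worker binds to
      (the tunnel is assumed to forward to this port)
    - contact address: the public tunnel endpoint the worker advertises,
      which the scheduler dials during gather
    """
    listen_address = f"tcp://0.0.0.0:{local_port}"
    contact_address = f"tcp://{public_host}:{public_port}"
    return [
        "dask", "worker", scheduler_address,
        "--listen-address", listen_address,
        "--contact-address", contact_address,
    ]


if __name__ == "__main__":
    # Placeholder values: assumes something like `ngrok tcp 9000` is running
    # and reported 0.tcp.ngrok.io:12345 as the public endpoint.
    cmd = build_worker_command(
        "tcp://scheduler.example.com:8786", 9000, "0.tcp.ngrok.io", 12345
    )
    print(shlex.join(cmd))
```

The key point is that the two addresses differ: the worker listens on its local port, while the scheduler is told to reach it through the tunnel's public host and port.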

Nice, it would be good if you could share your solution!

For sure!
I am working on productizing the final script; then I will absolutely make a follow-up post to show a minimal reproducible solution :slight_smile: