This is related to "Workers on private network, scheduler on a different network - how to make “gather” step work?", but I’d like to explore a potential solution rather than necro-bumping an old thread.
I have a similar setup: a centralized scheduler running in a cloud provider, and a user-distributed script for running workers at will on arbitrary private computers. I’ve solved the cloud ingress/egress, so workers can connect to the scheduler and identify themselves, and we can see the heartbeat calls in the scheduler.
When a client connects and submits a job, the workers receive and perform the tasks successfully, and I see debug logs in the scheduler indicating task completion. The problem is again at gather()
… The scheduler seems to always attempt to initiate a new TCP connection to the worker address, which gets blocked by the private workers’ various and untouchable-for-our-purposes firewalls.
My question is this: is there any way to persist a TCP or WS connection initiated by the worker, and use that channel to gather the results? If we can’t make gather work from the centralized scheduler, this is potentially a deal-breaker for us.
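To make the question concrete, here is a minimal sketch of the pattern I have in mind, using only the Python standard library. It is not Dask's API or how Dask works internally. The function names (`run_worker`, `gather_over_worker_connection`) and the JSON-lines message format are hypothetical and exist only for illustration. The worker dials out once, and the scheduler side then sends its gather requests back over that same accepted connection instead of opening a new one toward the worker.

```python
import json
import socket
import threading


def run_worker(scheduler_addr, results):
    """Hypothetical worker: dials OUT to the scheduler once, then answers
    gather requests that arrive on that same connection."""
    with socket.create_connection(scheduler_addr) as sock:
        with sock.makefile("r", encoding="utf-8") as reader, \
                sock.makefile("w", encoding="utf-8") as writer:
            for line in reader:
                request = json.loads(line)
                if request["op"] == "close":
                    break
                if request["op"] == "gather":
                    key = request["key"]
                    reply = {"key": key, "value": results.get(key)}
                    writer.write(json.dumps(reply) + "\n")
                    writer.flush()


def gather_over_worker_connection(keys, results_on_worker):
    """Hypothetical scheduler side: accept the worker's inbound connection
    and reuse it for gathering, never connecting toward the worker."""
    with socket.create_server(("127.0.0.1", 0)) as server:
        addr = server.getsockname()
        # In this sketch the "remote" worker is just a local thread.
        worker = threading.Thread(target=run_worker, args=(addr, results_on_worker))
        worker.start()
        conn, _ = server.accept()
        gathered = {}
        with conn:
            with conn.makefile("r", encoding="utf-8") as reader, \
                    conn.makefile("w", encoding="utf-8") as writer:
                for key in keys:
                    writer.write(json.dumps({"op": "gather", "key": key}) + "\n")
                    writer.flush()
                    reply = json.loads(reader.readline())
                    gathered[reply["key"]] = reply["value"]
                writer.write(json.dumps({"op": "close"}) + "\n")
                writer.flush()
        worker.join()
        return gathered


if __name__ == "__main__":
    print(gather_over_worker_connection(["a", "b"], {"a": 1, "b": [2, 3]}))
```

The only connection in this sketch is the outbound one from the worker, which is the direction the private firewalls already allow. What I am asking is whether the scheduler's gather can be routed over a channel like that.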