How to pass private IP of Scheduler to Dask Worker running on EC2?

I want to set up the following:

  1. Dask Scheduler and Workers all running in EC2.
  2. Scheduler and Worker instances are in the same VPC and with a security group allowing traffic between them (on private IPs).
  3. Scheduler Dashboard listening on the EC2 public IP so that I may access it from e.g. my company network.

Unfortunately, my scheduler and worker instances cannot communicate, because they try to reach each other via their public IPs instead of their private IPs.

When I run the Dask Cloudprovider (via EC2Cluster), I see a scheduler instance being created:

Creating scheduler instance
Created instance i-09fe7442a8026db71 as dask-723b83ae-scheduler
Waiting for scheduler to run at <public scheduler IP>:8786

This public IP is then passed on to the Workers, but it is not reachable, since my security group settings only allow communication via the instances’ private IPs. How do I get EC2Cluster to pass the private IP instead?

Please note that I do not use fixed IPs for any of these instances.

For completeness’ sake: I’ve posted this question on StackOverflow as well, and I’ll update both if/when I learn more…

I don’t have a lot of experience with dask-cloudprovider, but I see there is a use_private_ip configuration option. Have you tried that?
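Something along these lines, I think (untested sketch — it should also work from the YAML config; all other kwargs omitted):

```python
from dask_cloudprovider.aws import EC2Cluster

# Untested sketch: use_private_ip can be passed directly to the constructor
# (region, instance type, etc. omitted here).
cluster = EC2Cluster(use_private_ip=True)
```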

Thanks for the reply, ian. Unfortunately I can’t edit my question anymore since I’m a new user. I’ve updated the question on SO with some more details, if you’d care to take a look.

W.r.t. use_private_ip: this only works when deploying from within the VPC (e.g. from another EC2 instance), which would require yet another VM that I don’t necessarily want: I would then need to manage another component and get my code onto it (or rebuild an image every time), instead of deploying from my local workstation. If this is considered best practice, I’ve yet to understand exactly why, and I’d appreciate some further background on the approach…

(clarified info from SO follows):
I have the following workflow where I start a Dask Cloudprovider EC2Cluster with the following config:

dask_cloudprovider:
  region: eu-central-1
  bootstrap: True
  instance_type: t3.small
  n_workers: 1
  docker_image: daskdev/dask:2022.10.0-py3.10
  security: False
  debug: True
  use_private_ip: False
  # other properties: region, instance_type, security_group, vpc, etc. omitted
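I then launch the cluster from my local workstation roughly like this (simplified sketch; in practice the values come from the dask_cloudprovider config above rather than being passed as kwargs):

```python
from dask.distributed import Client
from dask_cloudprovider.aws import EC2Cluster

# Simplified sketch of how I start the cluster; the real call picks up the
# dask_cloudprovider YAML config shown above.
cluster = EC2Cluster(
    region="eu-central-1",
    instance_type="t3.small",
    n_workers=1,
    docker_image="daskdev/dask:2022.10.0-py3.10",
    security=False,
    debug=True,
)
client = Client(cluster)
```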

This is supposed to start both my scheduler and my workers in Docker on EC2 instances in the same VPC. The AWS security group has the following inbound rules:

| Port range  | Source      |
|-------------|-------------|
| 8786 - 8787 | [own group] |
| 8787        | [my IP]     |

So, the workers and scheduler should be able to talk among themselves, and I should also be able to access the Scheduler Bokeh Dashboard from my IP only.
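For reference, those inbound rules correspond to something like this (boto3 sketch; the group ID and the workstation CIDR below are placeholders):

```python
import boto3

ec2 = boto3.client("ec2", region_name="eu-central-1")

# Placeholder IDs: "sg-0123456789abcdef0" stands for the cluster's security
# group, "203.0.113.10/32" for my workstation's IP.
ec2.authorize_security_group_ingress(
    GroupId="sg-0123456789abcdef0",
    IpPermissions=[
        {
            # Scheduler <-> worker traffic, allowed only from members of the same group
            "IpProtocol": "tcp",
            "FromPort": 8786,
            "ToPort": 8787,
            "UserIdGroupPairs": [{"GroupId": "sg-0123456789abcdef0"}],
        },
        {
            # Bokeh dashboard, allowed only from my workstation
            "IpProtocol": "tcp",
            "FromPort": 8787,
            "ToPort": 8787,
            "IpRanges": [{"CidrIp": "203.0.113.10/32"}],
        },
    ],
)
```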

The important part is that the above security group rules only allow communication between the private IPs of the instances in the same security group, and do not allow traffic between the public IPs of the instances.

This is intentional, since traffic between public IPs must be routed through an internet gateway and therefore incurs bandwidth costs.

The problem is that these network rules do not work with EC2Cluster as-is. My scheduler starts as follows:

Creating scheduler instance
Created instance i-09fe7442a8026db71 as dask-723b83ae-scheduler
Waiting for scheduler to run at <public scheduler IP>:8786

The IP that dask-cloudprovider advertises to the workers as the scheduler address is the public scheduler IP. Worker<->Scheduler traffic over public IPs is not allowed by my security group (and I wish to keep it that way).

The only way I’ve found to let the workers and scheduler communicate in this setup is to set use_private_ip: True, but then the EC2Cluster startup hangs while trying to reach the scheduler at its private EC2 IP, which I obviously can’t access from my own workstation.

I’ve seen that the ‘recommended’ approach is to deploy the EC2Cluster from yet another VM (e.g. an EC2 instance in the same VPC), but I don’t understand the upside of this when I could simply develop and launch the cluster locally. It also adds another layer: I would have to get my code onto this separate VM and install all requirements (or rebuild an image every time).

More importantly:
I was able to achieve exactly what I want (worker-scheduler traffic over the private VPC only, scheduler dashboard accessible from my IP) by monkey-patching dask_cloudprovider.aws.ec2#configure_vm to return instance['PublicDnsName'] instead of instance['PublicIpAddress'], since an instance’s public DNS name resolves to its public IP when queried from outside the VPC and to its private IP from within it.
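A quick way to see that split-horizon DNS behaviour (the hostname below is a made-up placeholder for the scheduler’s public DNS name):

```python
import socket

# Placeholder for the scheduler's public DNS name.
scheduler_host = "ec2-3-120-0-1.eu-central-1.compute.amazonaws.com"

# From my workstation this prints the public IP; from a worker inside the
# VPC the same name resolves to the scheduler's private IP.
print(socket.gethostbyname(scheduler_host))
```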

Unfortunately this approach also means my workers must have public IPs (even though no traffic can be routed to them), and it is a super-hacky workaround rather than a robust solution.

Sorry you’re having a frustrating time with this.

As you say, when using private networking you need to run your code from another VM. This is the workaround I would suggest.

Perhaps we also need to tweak things so you can make the scheduler public but the workers private. I think we already do this in the ECS implementation. If this sounds like a good option do you mind raising an issue about it?

Thanks for chiming in.

Sorry you’re having a frustrating time with this.

Not at all your fault, I’m new to Dask so I’d be surprised if there wasn’t a dash of user error in my problem. :slight_smile:

Perhaps we also need to tweak things so you can make the scheduler public but the workers private. I think we already do this in the ECS implementation.

That sounds like a good solution for my case, especially if the ECS implementation already supports it.

If this sounds like a good option do you mind raising an issue about it?

It does, so I’ve gone ahead and created Add the ability to configure EC2Cluster to start a publicly accessible Scheduler while keeping Workers private · Issue #388 · dask/dask-cloudprovider · GitHub.
I’ll mark this as solved for now. Thanks!
