Thanks for the reply, Ian. Unfortunately I can’t edit my question anymore since I’m a new user. I’ve updated the question on SO with some more details, if you’d care to take a look.
W.r.t. use_private_ip: this only works when deploying from within e.g. EC2, which would require yet another VM that I don’t necessarily want, since I would then need to manage another component and get my code onto it (or rebuild an image every time) instead of deploying from my local workstation or any other local machine. If this is best practice, I’ve yet to understand exactly why, and would appreciate some further background on this approach…
(clarified info from SO follows):
I have the following workflow, where I start a Dask Cloudprovider EC2Cluster with the following config:
dask_cloudprovider:
  region: eu-central-1
  bootstrap: True
  instance_type: t3.small
  n_workers: 1
  docker_image: daskdev/dask:2022.10.0-py3.10
  security: False
  debug: True
  use_private_ip: False
  # other properties (security_group, vpc, etc.) omitted
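For completeness, this is roughly how I start the cluster from my local workstation; the keyword arguments just mirror the config above, so treat this as a sketch rather than my exact script:

    from dask.distributed import Client
    from dask_cloudprovider.aws import EC2Cluster

    # Start scheduler + workers on EC2; values mirror the config above,
    # security_group / vpc / subnet etc. omitted just like in the config.
    cluster = EC2Cluster(
        region="eu-central-1",
        bootstrap=True,
        instance_type="t3.small",
        n_workers=1,
        docker_image="daskdev/dask:2022.10.0-py3.10",
        security=False,
        debug=True,
        use_private_ip=False,
    )
    client = Client(cluster)  # connect from the workstation and submit work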
This is supposed to start both my scheduler and my workers in Docker containers on EC2 instances in the same VPC. The AWS security group has the following inbound rules:
Port range  | Source
8786 - 8787 | [own group]
8787        | [my IP]
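For reference, those two rules are roughly what the following boto3 calls would create; sg-xxxxxxxx and 203.0.113.10/32 are placeholders for my security group and my own IP:

    import boto3

    ec2 = boto3.client("ec2", region_name="eu-central-1")

    # sg-xxxxxxxx / 203.0.113.10/32 are placeholders for my group and my IP.
    ec2.authorize_security_group_ingress(
        GroupId="sg-xxxxxxxx",
        IpPermissions=[
            {
                # scheduler <-> worker traffic, only from members of the same group
                "IpProtocol": "tcp",
                "FromPort": 8786,
                "ToPort": 8787,
                "UserIdGroupPairs": [{"GroupId": "sg-xxxxxxxx"}],
            },
            {
                # Bokeh dashboard, only from my workstation
                "IpProtocol": "tcp",
                "FromPort": 8787,
                "ToPort": 8787,
                "IpRanges": [{"CidrIp": "203.0.113.10/32"}],
            },
        ],
    )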
So, the workers and scheduler should be able to talk among themselves, and I should also be able to access the Scheduler Bokeh Dashboard from my IP only.
The important part is that the above security group rules only allow communication between the private IPs of the instances in the same security group, and do not allow traffic between the public IPs of the instances.
This is OK, since traffic between public IPs must be routed through an internet gateway and hence incurs bandwidth costs.
The problem is these network rules do not work as-is. My scheduler starts as follows:
Creating scheduler instance
Created instance i-09fe7442a8026db71 as dask-723b83ae-scheduler
Waiting for scheduler to run at <public scheduler IP>:8786
The IP that dask-cloudprovider advertises as the address where the workers should connect is the public scheduler IP. Worker <-> scheduler traffic through the public IPs is not allowed per my security group (and I wish to keep it that way).
The only way I’ve found that allows the workers and scheduler to communicate in this setup is to set use_private_ip: True, but in that case my EC2Cluster startup hangs while trying to reach the scheduler at its private EC2 IP, which I obviously can’t access from my own workstation.
I’ve seen that the ‘recommended’ approach is to deploy the EC2Cluster from yet another VM (e.g. on EC2) in the same VPC, but I don’t understand the upside of this when I could simply develop and start it locally. It also adds another layer where I would have to get my code onto that separate VM (and install all requirements or rebuild an image every time).
More importantly: I was able to achieve exactly what I want (worker-scheduler traffic through the private VPC only, scheduler dashboard accessible from my IP) by monkey-patching dask_cloudprovider.aws.ec2#configure_vm to return instance['PublicDnsName'] instead of instance['PublicIpAddress'], since an instance’s public AWS DNS name resolves to its public IP when queried from outside the VPC and to its private IP when queried from within.
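A quick way to see this resolution behaviour; the hostname below is a made-up placeholder for one of my instances’ public DNS names:

    import socket

    # Placeholder for an instance's public DNS name, not a real host.
    name = "ec2-203-0-113-10.eu-central-1.compute.amazonaws.com"

    # From my workstation this resolves to the public IP; run on another
    # instance inside the same VPC it resolves to the private IP.
    print(socket.gethostbyname(name))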
Unfortunately this also means my workers must have public IPs (even though no traffic can be routed to them), and it is a super-hacky workaround rather than a robust solution.