Hello everyone,
I am currently using AWS CDK to deploy a dask cluster on ECS, I have three services:
- Scheduler Service - EC2 backend
- WorkerEC2 Service - EC2 backend
- WorkerFargate Service - Fargate backend
The Scheduler and WorkerEC2 services run constantly with a single task each, to ensure some availability. The WorkerFargate service has 0 running tasks; its purpose is to scale out workers as needed.
I would like to use the ECSCluster class in my application to take advantage of the ECSCluster.adapt() function. I see that there is a way to specify the worker task definition ARN, but is there a way to specify that I want the workers to be spawned into the WorkerFargate service, so that scaling happens only there? If not, I may just extract the scaling/adaptive logic from the source code and implement a solution based on it.
Thanks for any advice and insight!
Hi @cbritogonzalez, welcome to Dask community!
First, did you build your three services by hand? Why not also use EC2Cluster for the first two services?
In any case, ECSCluster and FargateCluster have a scheduler_address kwarg. If you connect a cluster instance to your existing scheduler and use adapt, it should scale only the resources managed by that cluster type.
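In code, that connection might look something like the sketch below. This is untested and the address is a hypothetical placeholder; the point is only that the cluster object is pointed at a scheduler it did not create:

```python
def connect_to_existing_scheduler():
    # Sketch (untested): an ECSCluster pointed at an already-running
    # scheduler via the scheduler_address kwarg, so it only manages
    # workers. The address below is a hypothetical placeholder.
    from dask_cloudprovider.aws import ECSCluster

    return ECSCluster(
        fargate_workers=True,                         # spawn workers as Fargate tasks
        scheduler_address="tcp://my-scheduler:8786",  # existing scheduler (placeholder)
    )
```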
cc @jacobtomlinson for confirmation.
Generally the dask-cloudprovider package is designed to manage the whole cluster. The kind of setup you are trying, where you manage some of the resources yourself, means you will probably run into a bunch of edge cases we haven’t accounted for.
If you want to manage things yourself instead of relying on dask-cloudprovider, then I would recommend taking full control and handling everything, including adaptive scaling, yourself.
Hey! Thanks for the replies. I actually got the cluster working by using the ECSCluster from dask-cloudprovider, like @jacobtomlinson mentioned. I did manage to keep my deployment on CDK and just reference the scheduler address. However, I had a bit of trouble since some of the documentation seemed conflicting IMO (or I just misunderstood it). This is what I ended up needing to specify to get things up and running (using dask version 2025.01.0):
```python
ECSCluster(
    region_name="ecs-cluster-region",
    scheduler_address="scheduler-address",
    scheduler_task_definition_arn="scheduler_task_definition_arn",
    worker_task_definition_arn="worker_task_definition_arn",
    fargate_workers=True,
    fargate_use_private_ip=True,
    worker_nthreads=1,
    worker_mem=1,
    cluster_arn="cluster-arn",
    execution_role_arn=execution_role_arn,
    task_role_arn=task_role_arn,
    cloudwatch_logs_group="your-log-group",
    security_groups=security_group_ids,
    vpc=vpc_id,
    subnets=subnet_ids,
    skip_cleanup=True,
    worker_extra_args=[
        "--worker-port",
        "9000",
        "--nanny-port",
        "9001",
        "--nworkers",
        "1",
        "--no-dashboard",
    ],
)
```
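Since the original goal of the thread was adaptive scaling of the Fargate workers, that can then be enabled on the returned cluster object. A minimal sketch, assuming the helper name and the min/max bounds, which are illustrative and not from the post:

```python
def enable_adaptive(cluster, min_workers=0, max_workers=10):
    # Sketch: adapt() asks the scheduler to grow or shrink the worker
    # count between the given bounds based on current task load.
    # The bounds here are illustrative placeholders.
    cluster.adapt(minimum=min_workers, maximum=max_workers)
    return cluster
```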
Now the tasks just spawn into the cluster rather than into the service, but that is fine; the implementation works well. Thanks for the comments and feedback @guillaumeeb and @jacobtomlinson!