Basic question about Dask AWS cloudprovider and scheduled/routine processing

Hi,

Assuming I want to perform routine processing on Fargate each month using the Dask Cloudprovider AWS API, specifically the FargateCluster, what is the best way to schedule the cluster deployment and the tasks? Currently the cluster and client are created by a Python script on my local machine, and the tasks ultimately write the processed data to an S3 bucket.

For example, do I need to create a Lambda function that launches an EC2 instance to run the code that creates the FargateCluster and client, an operation currently done on my local machine? That idea seems to run counter to the benefits of Fargate, which lie in the serverless approach.

A very basic example:

from distributed import Client
from dask_cloudprovider.aws import FargateCluster

cluster = FargateCluster(
    region_name="us-east-1",
    aws_access_key_id="xxx",
    aws_secret_access_key="xxx",
    image="daskdev/dask:2024.3.1-py3.12",
    n_workers=1,
)

cluster.adapt(minimum=0, maximum=1)

# Could also be specified in the Docker container
def do_work(n):
    return n + 1

client = Client(cluster)

rs = client.gather(client.map(do_work, [1]))

print(rs)

client.close()
cluster.close()

Hi @beder101, welcome to the Dask Discourse forum,

I’m under the impression that your question is more about AWS than Dask. In short, you want a cron-based approach that launches your script. If Lambda alone is not enough (does your code run for more than 15 minutes?), you can probably find other approaches, but it would be better to ask on an AWS forum.
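
Very roughly, the Lambda route could look something like the sketch below, with an EventBridge (CloudWatch Events) cron rule invoking the function each month, so no EC2 instance is involved. This is only a sketch, assuming the whole run fits within Lambda's 15-minute limit and the execution role has the ECS/IAM permissions that dask-cloudprovider needs; the handler and the result handling are placeholders.

# Rough sketch of the Lambda approach: an EventBridge cron rule triggers this
# handler monthly. Assumes the run fits Lambda's 15-minute limit and the
# execution role carries the permissions dask-cloudprovider needs.
from distributed import Client
from dask_cloudprovider.aws import FargateCluster

def do_work(n):
    return n + 1

def handler(event, context):
    # Credentials come from the Lambda execution role, so no keys are hard-coded.
    cluster = FargateCluster(
        region_name="us-east-1",
        image="daskdev/dask:2024.3.1-py3.12",
        n_workers=1,
    )
    try:
        with Client(cluster) as client:
            results = client.gather(client.map(do_work, [1]))
            # ... write `results` to your S3 bucket here ...
            return {"results": results}
    finally:
        cluster.close()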

I’ve googled a bit and found a few resources:

I’m not sure how relevant they are. Maybe others here like @jacobtomlinson have more experience than me on this subject.

You might also be interested in some kind of workflow manager like Prefect.
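
For example, here is a minimal sketch of a scheduled Prefect flow wrapping your existing script, assuming Prefect 2.10+ where flow.serve accepts a cron expression; the flow name and schedule are placeholders.

# Minimal Prefect sketch: the flow body is essentially the original script, and
# Prefect's serve API triggers it on a monthly cron schedule.
from prefect import flow
from distributed import Client
from dask_cloudprovider.aws import FargateCluster

def do_work(n):
    return n + 1

@flow(log_prints=True)
def monthly_processing():
    # Same cluster/client logic as the script in the original post.
    with FargateCluster(
        region_name="us-east-1",
        image="daskdev/dask:2024.3.1-py3.12",
        n_workers=1,
    ) as cluster, Client(cluster) as client:
        results = client.gather(client.map(do_work, [1]))
        print(results)  # ... or write the results to S3 ...

if __name__ == "__main__":
    # Keeps a long-running process alive that triggers the flow at 06:00 UTC on
    # the 1st of each month; Prefect Cloud/server deployments are another option.
    monthly_processing.serve(name="monthly-fargate-processing", cron="0 6 1 * *")

A workflow manager like this also gives you retries, logging, and a UI around the monthly run, which is harder to get from a bare cron rule.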