Hi,
Assuming I want to run routine processing on Fargate each month using the Dask Cloud Provider AWS API, specifically FargateCluster, what is the best way to schedule the cluster deployment and the tasks? Currently the cluster and client are created by a Python script on my local machine, and the tasks write their processed output to an S3 bucket.
For example, do I need to create a Lambda function that launches an EC2 instance to run the code that creates the FargateCluster and/or Client, an operation currently done on my local machine? That idea seems to run counter to the benefits of Fargate, which rest in the serverless approach.
A very basic example:
from distributed import Client
from dask_cloudprovider.aws import FargateCluster
cluster = FargateCluster(
    region_name="us-east-1",
    aws_access_key_id="xxx",
    aws_secret_access_key="xxx",
    image="daskdev/dask:2024.3.1-py3.12",
    n_workers=1,
)
cluster.adapt(minimum=0, maximum=1)
# do_work could also be baked into the Docker image instead of defined here
def do_work(n):
    return n + 1
client = Client(cluster)
rs = client.gather(client.map(do_work, [1]))
print(rs)
client.close()
cluster.close()
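
For concreteness, one variant I can imagine (purely a sketch, not something I have working) is dropping the EC2 step entirely and letting a scheduled Lambda handler run the cluster code directly. The handler name, the monthly EventBridge trigger, and relying on the Lambda execution role for credentials are all assumptions on my part, and Lambda's 15-minute timeout might rule this out for longer jobs:

from distributed import Client
from dask_cloudprovider.aws import FargateCluster

def do_work(n):
    return n + 1

# Assumed to be invoked monthly by an EventBridge rule; credentials are
# assumed to come from the Lambda execution role rather than being passed in.
def handler(event, context):
    # Spin the Fargate cluster up on demand, run the work, then tear it down.
    with FargateCluster(
        region_name="us-east-1",
        image="daskdev/dask:2024.3.1-py3.12",
        n_workers=1,
    ) as cluster:
        with Client(cluster) as client:
            rs = client.gather(client.map(do_work, [1]))
    # In the real job, this is where the processed data would be written to S3.
    return {"results": rs}

Is something like this the intended pattern, or is there a better-supported way to schedule FargateCluster deployments?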