AWS Lambda Image to run dask-cloudprovider[aws-fargate]

ivojuroro · April 7, 2022, 3:12pm

Hello !

I want to run an AWS Fargate cluster using dask-cloudprovider from a Docker image deployed on AWS Lambda. I have installed all packages using conda and it works when I run it locally. However, I’d like to run it from AWS Lambda (it will be executed once a day, so there is no need to keep an EC2 instance running all the time). I created different Dockerfiles, based on different images.
This last version is one extracted from blog-samples/2021-06-Amazonian-Conda at main · BaysC/blog-samples · GitHub .

FROM public.ecr.aws/lambda/python:3.8 => I install miniconda with all the dependencies. Same problem, I can’t find my function.
ERROR: “[Errno 30] Read-only file system: ‘/var/task/dask-worker-space’”

FROM public.ecr.aws/lambda/python:3.8

RUN yum update -y && yum install -y wget && yum clean all
RUN wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O miniconda.sh && sh miniconda.sh -b -p /opt/miniconda
COPY environment.yml /tmp/environment.yml
RUN sed -i -r '/m2w64|vs2015|msys2|win|vc/d' /tmp/environment.yml
RUN /opt/miniconda/bin/conda env create --file /tmp/environment.yml --prefix /opt/conda-env
RUN /opt/conda-env/bin/pip install awslambdaric
RUN mv /var/lang/bin/python3.8 /var/lang/bin/python3.8-orig && ln -sf /opt/conda-env/bin/python /var/lang/bin/python3.8
COPY my_code.py /opt/my-code/my_code.py

ENV AWS_KEY=<KEY> \
    AWS_SECRET=<SECRET>

ENV PYTHONPATH "/var/lang/lib/python3.8/site-packages:/opt/my-code"
ENTRYPOINT ["/lambda-entrypoint.sh"]
CMD ["my_code.lambda_handler"]
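The read-only file system error above comes from Dask trying to create its dask-worker-space scratch directory under /var/task, which Lambda mounts read-only. A minimal sketch of one possible workaround, assuming Dask's `temporary-directory` configuration key (settable through the `DASK_TEMPORARY_DIRECTORY` environment variable) is honoured by the installed version; the helper name `writable_scratch_dir` is hypothetical:

```python
import os
import tempfile


def writable_scratch_dir(preferred="/tmp"):
    """Return a directory this process can write to, preferring `preferred`."""
    if os.path.isdir(preferred) and os.access(preferred, os.W_OK):
        return preferred
    # Fall back to whatever the platform considers its temp directory.
    return tempfile.gettempdir()


# Assumption: this runs before dask is imported, so that Dask picks the
# value up when it loads its configuration from the environment.
os.environ["DASK_TEMPORARY_DIRECTORY"] = writable_scratch_dir()
```

This would go at the very top of my_code.py, ahead of any dask import. It only redirects scratch files written at runtime; it does not change where the image itself is installed.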

So, I changed it to install/copy everything in /tmp, since AWS Lambda suggests it. Now the error is: “RequestId: 18… Error: Runtime exited with error: exit status 127” => /var/runtime/bootstrap: line 7: /var/lang/bin/python3.8: No such file or directory

And I have this

FROM public.ecr.aws/lambda/python:3.8

RUN yum update -y && yum install -y wget && yum clean all
RUN wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O miniconda.sh && sh miniconda.sh -b -p /opt/miniconda
COPY environment.yml /tmp/environment.yml
RUN sed -i -r '/m2w64|vs2015|msys2|win|vc/d' /tmp/environment.yml
RUN /opt/miniconda/bin/conda env create --file /tmp/environment.yml --prefix /tmp/opt/conda-env
RUN /tmp/opt/conda-env/bin/pip install awslambdaric
RUN mv /var/lang/bin/python3.8 /var/lang/bin/python3.8-orig && ln -sf /tmp/opt/conda-env/bin/python /var/lang/bin/python3.8
COPY my_code.py /tmp/opt/my-code/my_code.py

ENV AWS_KEY=<KEY> \
    AWS_SECRET=<SECRET>

ENV PYTHONPATH "/var/lang/lib/python3.8/site-packages:/tmp/opt/my-code"
ENTRYPOINT ["/lambda-entrypoint.sh"]
CMD ["my_code.lambda_handler"]

If I run the image with a different entrypoint just to check the error, and run /var/lang/bin/python3.8, the file is actually there, so I don’t understand what’s going on.
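One possible explanation, offered as an assumption rather than something confirmed in this thread: /var/lang/bin/python3.8 is now a symlink into /tmp, and Lambda mounts its own empty /tmp at runtime, so the link exists but its target does not. A small sketch for checking that from inside the running container (the helper name `inspect_link` is hypothetical):

```python
import os


def inspect_link(path):
    """Report whether `path` is a symlink and whether its target exists."""
    is_link = os.path.islink(path)
    return {
        "is_symlink": is_link,
        "target": os.readlink(path) if is_link else None,
        "entry_present": os.path.lexists(path),  # True even for a dangling link
        "target_exists": os.path.exists(path),   # False for a dangling link
    }
```

Calling `inspect_link("/var/lang/bin/python3.8")` in both environments would show whether the entry is present while its target is missing, which produces the same "No such file or directory" message.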

PS: The my_code.py file creates an AWS Fargate cluster like:

from dask_cloudprovider.aws import FargateCluster

cluster = FargateCluster(
    scheduler_mem=8192, n_workers=12, worker_cpu=256, worker_mem=1024,
    aws_access_key_id=AWS_KEY,
    aws_secret_access_key=AWS_SECRET,
    image="<dependencies_image>",
    cloudwatch_logs_group="<cloudwatch_group>",
)

Please let me know if anyone has an idea of what may be going on ! Thanks in advance !

ivojuroro · April 19, 2022, 8:50am

UPDATE

The Dockerfile is finally working and it creates the cluster, but while waiting for the response I get the following error, because the AWS Lambda execution environment does not support /dev/shm (shared memory for processes):
sl = self._semlock = _multiprocessing.SemLock( future: <Task finished name='Task-2814' coro=<_wrap_awaitable() done, defined at /opt/conda-env/lib/python3.8/asyncio/tasks.py:688> exception=OSError(38, 'Function not implemented')>
I know Dask uses pools for multiprocessing, but is there any way to use multiprocessing.Pipe as explained in Parallel Processing in Python with AWS Lambda | AWS Compute Blog? Thanks in advance !
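For reference, the Process-plus-Pipe pattern mentioned above can be sketched with the standard library alone. This is an illustrative sketch, not code from the blog post or from Dask: the function names are hypothetical, and the explicit "fork" start method is an assumption that holds on Linux environments such as Lambda.

```python
import multiprocessing as mp

# Assumption: the "fork" start method is available (Linux, as on Lambda).
_ctx = mp.get_context("fork")


def _square(value, conn):
    """Child process: compute one result and send it back through the pipe."""
    conn.send(value * value)
    conn.close()


def parallel_squares(values):
    """Run one process per input and collect results over Pipes (no Pool, no Queue)."""
    jobs = []
    for value in values:
        parent_conn, child_conn = _ctx.Pipe()
        proc = _ctx.Process(target=_square, args=(value, child_conn))
        proc.start()
        child_conn.close()  # the parent only needs its own end of the pipe
        jobs.append((proc, parent_conn))

    results = []
    for proc, parent_conn in jobs:
        results.append(parent_conn.recv())  # read before join so the child never blocks on send
        proc.join()
    return results
```

Pool and Queue rely on semaphores, which is where the SemLock error comes from; Process and Pipe do not, which is why this pattern runs where /dev/shm is missing.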

0ren · April 19, 2022, 5:21pm

Hey! Nice to virtually meet you.

Glad you finally got that Dockerfile working. I’ve seen that it’s tricky to run Dask on AWS Lambda in general, see the heading “Reality” in this blog post by @jacobtomlinson : Exploring Dask and Distributed on AWS Lambda | by Jacob Tomlinson | Met Office Informatics Lab | Medium

About using multiprocessing.Pipe, you can write functions that use it, but I don’t think Dask can directly use it. I’ll have to look into it more. But, that said, to get rid of the error, you can try passing processes=False to the Client. Ref: dask worker: daemonic processes are not allowed to have children · Issue #2142 · dask/distributed · GitHub
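A minimal sketch of that suggestion, assuming dask.distributed is installed in the image. The wrapper name `make_threaded_client` is hypothetical, and the import is kept inside the function only so the module loads even where Dask is absent:

```python
def make_threaded_client():
    """Start an in-process Dask client whose workers are threads, not processes."""
    # Imported lazily; requires the `distributed` package at call time.
    from dask.distributed import Client

    # processes=False keeps scheduler and workers inside this process,
    # so no worker subprocesses are spawned.
    return Client(processes=False)
```

Whether this removes every multiprocessing primitive Dask touches on Lambda is not something this sketch guarantees; it only shows the form of the call being suggested.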

Are you free sometime this week to go over your workflow a bit more? Sometimes it’s easier to have a quick convo to work on diagnosing why you’re getting the “No such file or directory” error and all that jazz.

If you’re interested, here’s a link to my calendar: Meetings. What time works best for you?

ivojuroro · April 20, 2022, 8:42am

Hello ! Nice to virtually meet you too. Thanks for your answer, it definitely helped me get rid of that error. However, it seems some other errors come up, like TypeError: ‘Serialize’ object is not subscriptable when reading a parquet dataset with Client(processes=False). Do you have any idea about this one?

Sure ! I’ll schedule it, thanks !

0ren · April 20, 2022, 2:42pm

Investigating now! And looking forward to talking then

cyhsu · July 28, 2022, 10:21pm

Hi @ivojuroro @0ren
Just curious whether you guys have any updates about the Dask mission on Lambda?

ivojuroro · August 6, 2022, 7:36pm

Hi cyhsu ! In the end I stopped working on this exact use case. However, you can run it on Coiled. Also, on AWS, the best way I found to deploy it was AWS Batch.

rdettai · January 6, 2023, 8:24am

Hi to you all!

We are working on a project that aims at running multiple query engines on AWS Lambda: GitHub - cloudfuse-io/lambdatization: Run query engines in Cloud Functions.

Among others, we have come up with a Dask Docker image that works fine on AWS Lambda: lambdatization/docker/dask at main · cloudfuse-io/lambdatization · GitHub
