Dask on AWS Lambda

We are working on a project that aims at making query engines serverless by running them on AWS Lambda. We have already integrated 7 of them, among which Dask. We published a benchmark that summarizes the execution times on a very simple query: cloudfuse - Standalone engines. The focus here is more on the setup times (cold and warm) than on the query execution itself.

As you can see, Dask ranks last among the non-JVM engines. I expected it to be much faster to start up. I would be interested to know whether there is something wrong in my setup or whether that is just how Dask is expected to perform. The full infra for the benchmark is open source: GitHub - cloudfuse-io/lambdatization: Run query engines in Cloud Functions (docker/dask for the Lambda image sources).

In your case it looks like you are using dask-sql rather than plain Dask to perform the groupby. Let me ping someone who works on dask-sql to see if they have any insights.
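
For anyone following along, here is a rough sketch of the difference between the two, using dask-sql's Context API on a toy DataFrame (the column names and data are made up, not the benchmark query):

import pandas as pd
import dask.dataframe as dd
from dask_sql import Context

# toy data standing in for the benchmark dataset
ddf = dd.from_pandas(pd.DataFrame({"a": [1, 1, 2], "b": [10, 20, 30]}), npartitions=1)

# plain Dask groupby
plain = ddf.groupby("a").b.sum().compute()

# the same groupby expressed through dask-sql
c = Context()
c.create_table("t", ddf)
via_sql = c.sql("SELECT a, SUM(b) AS b FROM t GROUP BY a").compute()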


Is it expected to have such a significant performance gap? The biggest surprise is the large setup time, not so much the query performance itself :blush:

Yeah, at least at first glance I would assume the majority of the gap between the cold/warm runs is coming from the client/cluster startup (though it is possible that some dask-sql initialization on the first run is eating up some time).

My first observation here is that the cluster startup code seems a little roundabout for creating a cluster with no nannies, and I would be interested in whether doing something like

CLIENT = Client(processes=False)

instead would narrow the performance gap. I myself haven’t done too much Dask deployment on the cloud, so it might be worth pinging someone with more experience in that area :slightly_smiling_face:

The reason for using --no-nanny is that Lambda doesn’t support the multiprocessing library. But I do want to spin up a full cluster: the next step is to run these engines in distributed mode, so the KPI for us is the cluster setup time.

But I do want to spin up a full cluster

For clarity, Client(processes=False) is shorthand for:

from distributed import LocalCluster, Client

# spin up a cluster without processes, which disables nannies
cluster = LocalCluster(processes=False)

# connect a client to the created cluster
client = Client(cluster)

This should achieve the same cluster setup that you’re currently working with but in a more straightforward way that I would hope reduces the cold run times; happy to iterate from there, though.
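
If it helps when comparing setups, a quick sketch of how the startup cost could be measured in isolation (this is only illustrative, not how the benchmark times things):

import time
from distributed import Client

start = time.perf_counter()
client = Client(processes=False)  # threads-only local cluster, no nannies
print(f"cluster + client startup took {time.perf_counter() - start:.2f}s")
client.close()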

the next step is to run these engines in distributed mode

Not sure where Lambda & Dask stand today relative to 2018, but here is an article outlining some of the limitations of running a distributed Dask cluster on AWS Lambda - I’d be interested to see how your experiences scaling up compare!


here is an article outlining some of the limitations of running a distributed Dask cluster on AWS Lambda

Thanks for sharing this! The main limitation pointed out there is networking, and that is precisely what we set out to solve. Natively, Lambda functions cannot listen on a publicly addressable socket. As explained in this article, there is a workaround, and we are currently exploring it.

Client(processes=False) is shorthand for …

This LocalCluster setup could not scale if I wanted to add another worker node, could it?
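
For reference, a minimal sketch of what a multi-node setup would look like instead: a standalone scheduler that workers join over the network, without nannies. It assumes the workers can reach the scheduler's address, which is exactly the Lambda networking constraint being discussed:

import asyncio
from distributed import Scheduler, Worker, Client

async def main():
    # standalone scheduler; remote workers would connect to scheduler.address
    async with Scheduler(port=8786) as scheduler:
        # a worker connecting over the network (no nanny involved)
        async with Worker(scheduler.address, nthreads=2):
            async with Client(scheduler.address, asynchronous=True) as client:
                result = await client.submit(sum, [1, 2, 3])
                print(result)

asyncio.run(main())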

As Charles pointed out, I explored running Dask on Lambda 5 years ago and ran into the same networking constraints with multi-node setups.

The paper you linked has some interesting ideas around proxying network connections, but I think that’ll become a bottleneck pretty quickly.

Lambda currently supports 6 CPU cores per invocation, so using LocalCluster would allow you to utilize all of them.


Thanks for sharing your experience!

The paper you linked has some interesting ideas around proxying network connections

Actually, the authors are not proposing to proxy the connections but to establish peer-to-peer connections. The external node only acts as a seeding peer; it isn’t relaying the actual data.

Lambda currently supports 6 CPU cores per invokation so using LocalCluster would allow you to utilize all of them

The problem with that setup is that Dask uses multiprocessing to take advantage of all the cores, and that is not supported on Lambda because /dev/shm is disabled. But even if that worked, our goal is to create transient clusters of hundreds of CPUs, so we would need a way to have remote workers.
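
For completeness, the threads-only variant that avoids multiprocessing entirely would look something like this (a sketch only; whether 6 threads in a single process actually saturate the vCPUs depends on how much the workload releases the GIL):

from distributed import LocalCluster, Client

# single worker process with 6 threads: no multiprocessing, no /dev/shm needed
cluster = LocalCluster(processes=False, n_workers=1, threads_per_worker=6)
client = Client(cluster)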


Neat! I’m excited to see what happens here. Establishing peer-to-peer connections between Lambda invocations would be huge!

As we have limited resources to dedicate to this, it should take us at least a few weeks to have a first working prototype. It would sure help us a lot if the code for Boxer, the tool advertised in the paper, were open sourced as promised by the authors :blush: