GCPCluster: scaling the number of VMs

Hello,

I have read through the Dask Cloud Provider docs and tried to understand how to influence or control the number of VMs the cluster spins up or down.

GCPCluster.scale(n, memory, cores)

seems to be the right place.

My intended usage pattern is like this:

with GCPCluster(…) as cluster:
    cluster.scale(…)
    with Client(cluster) as client:
        futures = client.map(…)
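
For concreteness, here is roughly what I have in mind (the project, zone, machine type and the mapped function are placeholders, not my real setup):

from dask.distributed import Client
from dask_cloudprovider.gcp import GCPCluster

with GCPCluster(
    projectid="my-project",        # placeholder
    zone="europe-west1-b",         # placeholder
    machine_type="n1-standard-2",  # 2 vCPUs per VM instance
) as cluster:
    cluster.scale(4)               # how many VMs / workers does this give me?
    with Client(cluster) as client:
        futures = client.map(process_item, items)  # process_item / items are placeholders
        results = client.gather(futures)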

But how does Dask translate the desired number of workers or cores into the number of VMs it should spin up (or down)?

When I create the cluster I specify a GCP machine type, which translates to a certain number of vCPU cores per VM instance.
But Dask can’t know the CPU core capacity of a VM until it has started at least one, because it has to query that from the running instance, right?

Let’s assume I use a machine that provides 2 vCPUs.

When I now call cluster.scale(n=4, cores=4), do I then get two VMs with 2 workers each?

And does calling cluster.scale(n=8, cores=4) give me two VMs with 4 workers each?
The latter might be useful if my worker processes are blocked by network I/O.

Thanks in advance for giving me some background!

Best regards,

Bernd

By default we assume one worker == one node. Dask workers can have many threads and therefore process many tasks at the same time.

You can also tweak your configuration to launch many processes per node. For example, if you set 8 processes per node and n_workers to 5, you will see 40 workers in your Dask dashboard.
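
As a rough, untested sketch it could look something like this. The worker_options values are passed through to the worker on each VM, so treat the exact keys ("nprocs" vs "nworkers", "nthreads") as an assumption that depends on your distributed version; the project, zone and machine type are placeholders:

from dask_cloudprovider.gcp import GCPCluster

cluster = GCPCluster(
    projectid="my-project",        # placeholder
    zone="us-east1-c",             # placeholder
    machine_type="n1-standard-8",  # 8 vCPUs per VM
    n_workers=5,                   # 5 VMs / nodes
    worker_options={"nprocs": 8, "nthreads": 1},  # 8 worker processes per node (assumed keys)
)
# 5 nodes x 8 processes per node = 40 workers in the dashboard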

Hello Jacob,

I understand that multithreading in Python generally doesn’t work so well because of the GIL.

Multithreading might help if my tasks are I/O heavy and often wait for responses.
But if the tasks are compute-heavy, I’ll probably only get the most out of my CPUs with multiprocessing. It really depends on my workload, I guess.

For the sake of simplicity, let’s assume that my task doesn’t release the GIL, so that one worker process per CPU would make the best use of the available CPU resources.

But, back to the core of my question:
What’s going to happen if I call cluster.scale(n=8, cores=4)?
Will it always spin up 4 VMs, regardless of how many vCPUs each VM has?

If you call cluster.scale(8) it will add 8 VMs to your cluster. The number of cores depends on which VM type you chose when creating the cluster.
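
To spell out the arithmetic with the 2 vCPU example from above (assuming the default of one worker process per VM, with the worker using all of the VM's vCPUs as threads):

# machine_type="n1-standard-2" -> 2 vCPUs per VM
cluster.scale(8)
# -> 8 VMs are launched, one Dask worker per VM
# -> by default each worker uses the VM's vCPUs as threads (2 here)
# -> 8 workers x 2 threads = 16 cores in total on the dashboard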
