Multiple workers per node using EC2Cluster

Hi there,

I’m using dask_cloudprovider.aws.EC2Cluster to create and connect to a remote AWS cluster.
I need each cluster node to run as many workers as it has vCPUs, with each worker using only one thread. I get this behavior in LocalCluster with the “threads_per_worker” option, but I haven’t found a way to do it in EC2Cluster.

I also tried setting “nthreads” in “worker_options”, but that way I only get one worker per node:

from dask_cloudprovider.aws import EC2Cluster

# Values marked *** are redacted.
cluster = EC2Cluster(
    region=***,
    availability_zone=***,
    ami=***,
    instance_type=***,
    vpc=***,
    subnet_id=***,
    filesystem_size=***,
    key_name=***,
    n_workers=***,
    docker_image=***,
    debug=***,
    security=***,
    worker_options={
        "nthreads": 1,
    },
)

Could you help me with this?

I think you should be able to set "nthreads": 0, which will autodetect the number of vCPUs.

@jacobtomlinson thanks for your reply.

No, this way I will only have one worker per node.

Let me explain with an example:
I have a cluster with 8 vCPUs per node. I want to have 8 workers per node, each with only one thread.
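To make the target layout concrete, here is a minimal sketch in plain Python. It makes no Dask calls, and the helper name worker_layout is hypothetical; it only does the arithmetic for "one single-threaded worker process per vCPU".

```python
def worker_layout(vcpus_per_node, threads_per_worker=1):
    """Return (workers_per_node, threads_per_worker) so that every vCPU is used.

    Hypothetical helper for illustration only; not part of Dask.
    """
    if vcpus_per_node < 1 or threads_per_worker < 1:
        raise ValueError("both arguments must be positive")
    if vcpus_per_node % threads_per_worker != 0:
        raise ValueError("threads_per_worker must divide vcpus_per_node evenly")
    return vcpus_per_node // threads_per_worker, threads_per_worker


# Wanted on an 8-vCPU node: 8 worker processes with 1 thread each.
workers, threads = worker_layout(8)
print(f"{workers} workers x {threads} thread(s) per node")
```

What I currently get with "nthreads": 1 is a single worker with a single thread on each node, so 7 of the 8 vCPUs stay idle.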

Ah I see, you want to set n processes instead of n threads. I guess because you are using some non-threadsafe code?

In that case can you set {"nthreads": 1, "nworkers": "auto"} instead?

@jacobtomlinson

Unfortunately, this doesn’t work for me. The scheduler node initialized successfully, but the workers did not. I also don’t see any logs to share with you. Could you please check it?

Please set debug=True to leave the VMs running after they fail to start and then SSH to a worker and check the /var/log/cloud-init-output.log file.

@jacobtomlinson

Yes, I set debug=True in EC2Cluster, but unfortunately the worker process still closes automatically.

However, I monitored the log with the command “less +F /var/log/cloud-init-output.log”.
The last thing I saw before the instance crashed was the following line:

exec env DASK_INTERNAL_INHERIT_CONFIG="..."  python -m distributed.cli.dask_spec tcp://IP_ADDRESS:8786 --spec '{"cls": "dask.distributed.Nanny", "opts": {"nthreads": 1, "nworkers": "auto", "name": "dask-cd86e71b-worker-38500527"}}'

*The DASK_INTERNAL_INHERIT_CONFIG value is too long, so I have omitted it here.
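
For reference, the --spec argument from that line can be decoded with the standard library to see exactly what was handed to the worker. This is only a sketch for reading the log: parse_spec is a hypothetical helper name, not part of Dask, and the command string is copied from the line above with the IP address still as a placeholder.

```python
import json
import shlex

# Command copied from the cloud-init log; the env prefix is left out and the
# IP address is the same placeholder used in the log excerpt.
COMMAND = (
    'python -m distributed.cli.dask_spec tcp://IP_ADDRESS:8786 '
    '--spec \'{"cls": "dask.distributed.Nanny", "opts": {"nthreads": 1, '
    '"nworkers": "auto", "name": "dask-cd86e71b-worker-38500527"}}\''
)


def parse_spec(command):
    """Return the JSON object passed to --spec in a dask_spec command line."""
    args = shlex.split(command)  # honors the shell-style single quotes
    return json.loads(args[args.index("--spec") + 1])


spec = parse_spec(COMMAND)
print(spec["cls"])   # the class that receives the options
print(spec["opts"])  # the options themselves, including "nworkers"
```

So everything in worker_options, including "nworkers": "auto", is passed as "opts" to the single class named in "cls" (dask.distributed.Nanny). I have not confirmed whether that class accepts an "nworkers" option, so this may or may not be related to the crash.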

@jacobtomlinson hi!

This is just a polite reminder. Have you had an opportunity to reproduce this problem?