Hi there,
I’m using dask_cloudprovider.aws.EC2Cluster to create and connect to a remote AWS cluster.
I need each cluster node to run as many workers as it has vCPUs, with each worker using only one thread. I get this behavior in LocalCluster with the “threads_per_worker” option, but I haven’t found a way to do it with EC2Cluster.
I also tried setting “nthreads” in “worker_options”, but that gives me only one worker per node:
from dask_cloudprovider.aws import EC2Cluster

cluster = EC2Cluster(
    region=***,
    availability_zone=***,
    ami=***,
    instance_type=***,
    vpc=***,
    subnet_id=***,
    filesystem_size=***,
    key_name=***,
    n_workers=***,
    docker_image=***,
    debug=***,
    security=***,
    worker_options={
        "nthreads": 1,
    },
)
Could you help me with this?
I think you should be able to set "nthreads": 0, which will autodetect the number of vCPUs.
@jacobtomlinson, thanks for your reply.
Unfortunately, that still gives me only one worker per node.
Let me explain with an example:
I have a cluster with 8 vCPUs per node. I want to have 8 workers per node, each with only one thread.
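To make the distinction concrete, here is a small sketch that tabulates the per-node layouts discussed in this thread for an 8-vCPU node. It assumes, as the replies suggest, that EC2Cluster starts one Nanny (one worker process) per VM and that "nthreads": 0 autodetects the vCPU count. NodeLayout and layouts are hypothetical helpers for illustration, not part of dask_cloudprovider.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class NodeLayout:
    workers: int             # worker processes per VM
    threads_per_worker: int  # threads inside each worker

    @property
    def total_threads(self) -> int:
        return self.workers * self.threads_per_worker


def layouts(vcpus: int) -> dict[str, NodeLayout]:
    """Per-VM layouts from this thread (assumes one Nanny per VM)."""
    return {
        # worker_options={"nthreads": 1}: one worker with a single thread
        "nthreads=1": NodeLayout(workers=1, threads_per_worker=1),
        # worker_options={"nthreads": 0}: one worker, threads autodetected
        "nthreads=0 (auto)": NodeLayout(workers=1, threads_per_worker=vcpus),
        # what is wanted: one single-threaded worker per vCPU
        "desired": NodeLayout(workers=vcpus, threads_per_worker=1),
    }


if __name__ == "__main__":
    for label, layout in layouts(8).items():
        print(f"{label:>18}: {layout.workers} worker(s) x "
              f"{layout.threads_per_worker} thread(s)")
```

Both "nthreads" settings keep a single worker process per VM and only change its thread count; the desired layout needs more processes per VM, so with n_workers VMs it would yield n_workers times vCPUs single-threaded workers in total.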
Ah, I see: you want to set the number of processes rather than the number of threads. I'm guessing that's because you are running some code that isn't thread-safe?
In that case, can you set {"nthreads": 1, "nworkers": "auto"} instead?
@jacobtomlinson
Unfortunately, this doesn’t work for me. The scheduler node initialized successfully, but the workers did not. I also don’t see any logs I could share with you. Could you please take a look?
Please set debug=True to leave the VMs running after they fail to start, then SSH to a worker and check the /var/log/cloud-init-output.log file.
@jacobtomlinson
Yes, I pass debug=True to EC2Cluster, but unfortunately the worker still shuts down automatically.
However, I was able to follow the log with “less +F /var/log/cloud-init-output.log”.
The last thing I saw before the instance crashed was the following line:
exec env DASK_INTERNAL_INHERIT_CONFIG="..." python -m distributed.cli.dask_spec tcp://IP_ADDRESS:8786 --spec '{"cls": "dask.distributed.Nanny", "opts": {"nthreads": 1, "nworkers": "auto", "name": "dask-cd86e71b-worker-38500527"}}'
*The DASK_INTERNAL_INHERIT_CONFIG value is too long, so it is omitted here.
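The --spec argument in that log line can be decoded to check what EC2Cluster handed to the VM. The stdlib-only sketch below uses the logged command with the env prefix dropped (LOG_LINE and decode_spec are illustrative names). It shows a single spec whose cls is dask.distributed.Nanny and whose opts are the worker_options dict plus a generated name. Assuming dask_spec builds one instance of cls with opts as keyword arguments, "nworkers" ends up as a Nanny constructor argument rather than as the --nworkers option of the dask worker CLI, which may be why the workers never come up; this is a hypothesis, not verified here.

```python
import json
import shlex

# Command from cloud-init-output.log, without the env/DASK_INTERNAL_INHERIT_CONFIG prefix.
LOG_LINE = (
    'python -m distributed.cli.dask_spec tcp://IP_ADDRESS:8786 --spec '
    '\'{"cls": "dask.distributed.Nanny", "opts": {"nthreads": 1, '
    '"nworkers": "auto", "name": "dask-cd86e71b-worker-38500527"}}\''
)


def decode_spec(command: str) -> dict:
    """Return the JSON object passed via --spec in a dask_spec command line."""
    args = shlex.split(command)  # strips the shell single quotes around the JSON
    return json.loads(args[args.index("--spec") + 1])


if __name__ == "__main__":
    spec = decode_spec(LOG_LINE)
    print(spec["cls"])   # dask.distributed.Nanny
    print(spec["opts"])  # worker_options plus the generated worker name
```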
@jacobtomlinson hi!
This is just a polite reminder. Have you had a chance to reproduce this problem?