Dask EC2Cluster Private Docker Failing

I’m trying to create an EC2Cluster using a private AWS Docker image, but the docker pull fails with “no basic auth credentials” when I look in journalctl. As a result the Docker container never runs and the EC2 instance is shut down.

In journalctl I don’t see a docker login and I don’t see anything that does this in cloud-init.yaml.j2. How is the private docker image pulled without first logging into docker?

When I made my private docker container public, it all worked.

I’m running dask==2022.2.0 and dask-cloudprovider==2022.1.0

My code is below:

```python
from dask.distributed import Client
from dask_cloudprovider.aws import EC2Cluster

if __name__ == '__main__':
    cluster = EC2Cluster(n_workers=1,
                         vpc='<my_vpc>',
                         subnet_id='<my_subnet>',
                         security_groups=['<my_security_group>'],
                         security=False,
                         bootstrap=True,
                         docker_image='<my_private_docker>',
                         iam_instance_profile=IamInstanceProfile,
                         debug=True,
                         auto_shutdown=False)

    client = Client(cluster)

    future = client.submit(complex)
    result = future.result()
```

Sorry you’re having trouble here! The VM shuts down on failure as a cost-saving measure; I see you’ve set debug=True, which should leave the VM running for you to explore.

This definitely sounds like an oversight. There are two paths forward to work around things today, which we should document, and we should also look at a better fix going forwards.

1. Configure the Docker login

Today dask-cloudprovider doesn’t perform the docker login for you; you need to specify that yourself via the extra_bootstrap option.

Looking at the docs here, I expect it will be something along the lines of:

```python
...
    extra_bootstrap = [
        "pip install awscli",
        "aws ecr get-login-password --region region | docker login --username AWS --password-stdin aws_account_id.dkr.ecr.region.amazonaws.com"
    ]
    cluster = EC2Cluster(n_workers=1,
                         vpc='<my_vpc>',
                         subnet_id='<my_subnet>',
                         security_groups=['<my_security_group>'],
                         security=False,
                         bootstrap=True,
                         docker_image='<my_private_docker>',
                         iam_instance_profile=IamInstanceProfile,
                         debug=True,
                         auto_shutdown=False,
                         extra_bootstrap=extra_bootstrap)
```

2. Bake the image into a VM with Packer

Given the time it takes to pull container images, it is popular to create a custom AMI with the container image already pulled; that way things can start up right away.

https://cloudprovider.dask.org/en/latest/packer.html
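For reference, a minimal Packer template along those lines might look something like the sketch below. This is only an illustration: the base AMI, region, instance type, and image name are placeholders, and the ECR login line assumes the build instance has the AWS CLI available and credentials that can pull from ECR.

```json
{
  "builders": [
    {
      "type": "amazon-ebs",
      "region": "region",
      "source_ami": "<base_ubuntu_ami>",
      "instance_type": "t3.medium",
      "ssh_username": "ubuntu",
      "ami_name": "dask-prepulled-{{timestamp}}"
    }
  ],
  "provisioners": [
    {
      "type": "shell",
      "inline": [
        "curl -fsSL https://get.docker.com | sudo sh",
        "aws ecr get-login-password --region region | sudo docker login --username AWS --password-stdin aws_account_id.dkr.ecr.region.amazonaws.com",
        "sudo docker pull <my_private_docker>"
      ]
    }
  ]
}
```

The resulting AMI already contains the image, so the cluster instances skip both the login and the pull at boot time.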

3. (Future) Automate option 1

We should consider automatically running docker login as part of the startup script instead of leaving users to add it themselves. This may be a little challenging given the variety of places folks might want to log into. However for EC2Cluster it’s a safe bet to at least attempt a login to ECR.
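For anyone curious what that automation might look like, here’s a rough sketch (names are illustrative, not actual dask-cloudprovider code) of turning an ECR authorization token into a docker login command. In a real startup-script generator the token would come from boto3’s `ecr.get_authorization_token()` call.

```python
import base64


def ecr_login_command(authorization_token: str, proxy_endpoint: str) -> str:
    # ECR's GetAuthorizationToken API returns a base64-encoded
    # "user:password" pair (the user is always "AWS") together with
    # the registry endpoint to log in to.
    user, password = base64.b64decode(authorization_token).decode().split(":", 1)
    # Pipe the password via stdin rather than putting it on the command line.
    return f"echo '{password}' | docker login --username {user} --password-stdin {proxy_endpoint}"


# In practice the inputs would come from boto3, roughly:
#   data = boto3.client("ecr").get_authorization_token()["authorizationData"][0]
#   cmd = ecr_login_command(data["authorizationToken"], data["proxyEndpoint"])
```

The generated command could then be prepended to the cloud-init script before the docker pull step.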


Ahh man, thanks for that, I hadn’t realised extra_bootstrap was an arg. Creating an image was going to be my next attempt.

Once again, many thanks!


I’m sorry to reopen this topic, but I’m having the exact same problem and I feel kind of lost about how to solve it with a custom AMI. I have tried both ways:

1: I ran into a problem where, if I don’t specify iam_instance_profile, I can’t perform the “aws ecr get-login-password --region region | docker login --username AWS --password-stdin aws_account_id.dkr.ecr.region.amazonaws.com” step because there are no credentials on the instance. For security reasons I would rather not put my credentials in the cloud-init file, but if possible I would also like to avoid the iam_instance_profile. Is there any other way, or is this a dead end?
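(For context, my understanding is that the instance-profile route means attaching a policy roughly like the minimal ECR read-only one below, which is exactly the kind of IAM setup I’d prefer to avoid managing. I haven’t verified this is the smallest possible policy.)

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "ecr:GetAuthorizationToken",
        "ecr:BatchGetImage",
        "ecr:GetDownloadUrlForLayer",
        "ecr:BatchCheckLayerAvailability"
      ],
      "Resource": "*"
    }
  ]
}
```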

2: I have baked my custom AMI with the Docker image already pulled and pointed to it from the config.json file, but Dask will not use it if I don’t specify it in the cloud-init. If I do specify it in the cloud-init, I run into the same problem as point 1, because cloud-init runs before config.json is read.

I was wondering if you could enlighten me on this subject.