Dask EC2Cluster Private Docker Failing

I’m trying to create an EC2Cluster with a private AWS Docker image, but according to journalctl the docker pull fails with “no basic auth credentials”. As a result the Docker container never runs and the EC2 instance is shut down.

In journalctl I don’t see a docker login and I don’t see anything that does this in cloud-init.yaml.j2. How is the private docker image pulled without first logging into docker?

When I made my private docker container public, it all worked.

I’m running dask==2022.2.0 and dask-cloudprovider==2022.1.0

My code is below:

from dask.distributed import Client
from dask_cloudprovider.aws import EC2Cluster

if __name__ == '__main__':
    cluster = EC2Cluster(n_workers=1,
                         vpc='<my_vpc>',
                         subnet_id='<my_subnet>',
                         security_groups=['<my_security_group>'],
                         security=False,
                         bootstrap=True,
                         docker_image='<my_private_docker>',
                         iam_instance_profile=IamInstanceProfile,
                         debug=True,
                         auto_shutdown=False)

    client = Client(cluster)

    future = client.submit(complex)
    result = future.result()

Sorry you’re having trouble here! The VM shuts down on failure as a cost-saving measure. I see you’ve set debug=True, which should leave the VM running for you to explore.

This definitely sounds like an oversight. There are two ways to work around it today, both of which we should document, and we should look at a better fix going forward.

1. Configure the Docker login

Today dask-cloudprovider doesn’t perform the docker login for you; you need to specify it yourself via the extra_bootstrap option.

Looking at the docs here, I expect it will be something along the lines of:

...
    extra_bootstrap = [
        "pip install awscli",
        # Replace <region> and <aws_account_id> with the values for your ECR registry.
        "aws ecr get-login-password --region <region> | docker login --username AWS --password-stdin <aws_account_id>.dkr.ecr.<region>.amazonaws.com"
    ]
    cluster = EC2Cluster(n_workers=1,
                         vpc='<my_vpc>',
                         subnet_id='<my_subnet>',
                         security_groups=['<my_security_group>'],
                         security=False,
                         bootstrap=True,
                         docker_image='<my_private_docker>',
                         iam_instance_profile=IamInstanceProfile,
                         debug=True,
                         auto_shutdown=False,
                         extra_bootstrap=extra_bootstrap)
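
The region and account ID in the login command have to match the registry that hosts the image. Here is a minimal sketch of building the two bootstrap commands from those values. The helper name ecr_login_bootstrap and the example region and account ID are made up for illustration, and it assumes the instance profile attached to the VM is allowed to pull from ECR.

```python
def ecr_login_bootstrap(region, account_id):
    """Return shell commands that log Docker in to a private ECR registry.

    Hypothetical helper, not part of dask-cloudprovider.
    """
    registry = f"{account_id}.dkr.ecr.{region}.amazonaws.com"
    return [
        # Install the AWS CLI on the VM so the login password can be fetched.
        "pip install awscli",
        # Pipe the temporary ECR password straight into docker login.
        f"aws ecr get-login-password --region {region} "
        f"| docker login --username AWS --password-stdin {registry}",
    ]


# Example values only; substitute your own region and account ID.
extra_bootstrap = ecr_login_bootstrap("us-east-1", "123456789012")
```

The resulting list can be passed as extra_bootstrap=extra_bootstrap exactly as in the snippet above.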

2. Bake the image into a VM with Packer

Given the time it takes to pull container images, it is popular to create a custom AMI with the container image already pulled so that things can start up right away.

https://cloudprovider.dask.org/en/latest/packer.html
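
Once such an AMI exists, the cluster can be pointed at it. This is a rough sketch, assuming the AMI was built with Docker installed and the private image already pulled. The helper prebaked_cluster_kwargs and the AMI ID are placeholders for illustration; ami, bootstrap and docker_image are existing EC2Cluster arguments.

```python
def prebaked_cluster_kwargs(ami_id, docker_image):
    """Keyword arguments for EC2Cluster when using a pre-baked AMI.

    Hypothetical helper, not part of dask-cloudprovider.
    """
    return {
        "ami": ami_id,
        # Skip the Docker install step; the AMI is assumed to have it already.
        "bootstrap": False,
        # Assumed to match the image name and tag that was pulled into the AMI.
        "docker_image": docker_image,
    }


# Placeholder values; substitute your own AMI ID and image name.
kwargs = prebaked_cluster_kwargs("ami-xxxxxxxx", "<my_private_docker>")
# cluster = EC2Cluster(n_workers=1, **kwargs)
```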

3. (Future) Automate option 1

We should consider automatically running docker login as part of the startup script instead of leaving users to add it themselves. This may be a little challenging given the variety of places folks might want to log into. However, for EC2Cluster it’s a safe bet to at least attempt a login to ECR.
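
To give a feel for what such an automatic attempt could look like, here is a rough sketch that derives the login command from the image name when it looks like a private ECR URI, and does nothing otherwise. This is not dask-cloudprovider code: the function name and the regular expression are assumptions for illustration, and registries outside the standard amazonaws.com domain are not handled.

```python
import re

# Assumed shape of a private ECR image URI:
#   <12-digit account>.dkr.ecr.<region>.amazonaws.com/<repository>[:tag]
ECR_IMAGE = re.compile(
    r"^(?P<account>\d{12})\.dkr\.ecr\.(?P<region>[a-z0-9-]+)\.amazonaws\.com/"
)


def ecr_login_command(docker_image):
    """Return a docker login command for ECR images, or None for other images."""
    match = ECR_IMAGE.match(docker_image)
    if match is None:
        # Not an ECR image (for example one hosted on Docker Hub): skip the login.
        return None
    # The registry host is everything before the first slash.
    registry = docker_image.split("/", 1)[0]
    return (
        f"aws ecr get-login-password --region {match['region']} "
        f"| docker login --username AWS --password-stdin {registry}"
    )


# Example image names only.
ecr_cmd = ecr_login_command("123456789012.dkr.ecr.eu-west-1.amazonaws.com/my-image:latest")
hub_cmd = ecr_login_command("daskdev/dask:latest")
```

Returning None for non-ECR images keeps the behaviour for public images unchanged, which is the case that already works today.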

ahh man thanks for that, I hadn’t realised extra_bootstrap was an arg
yeah creating an image was going to be my next attempt

once again, many thanks !
