We have a case where we are starting EC2 Instances using Docker Images sourced from an internal docker registry.
This tunnel is unfortunately rather limited in throughput at times, which (as far as I can tell) causes
pip installations to fail with timeouts when the network is saturated.
From what I can see in prepare.sh, Dask basically runs pip install $EXTRA_PIP_PACKAGES. Is it possible to simply add something like --timeout=<value> at the end of $EXTRA_PIP_PACKAGES and have the pip installation during bootstrap honor those options? Or would this not work for some reason?
Kind of silly for me to answer my own issue for the second day in a row, but:
Simply appending the desired options to the end of $EXTRA_PIP_PACKAGES seems to correctly propagate them to the pip install command in prepare.sh (note the whitespace between the last package to be installed and the options). This seems to work because the pip syntax is:
pip <command> [options]
so options can come after the command and packages. It would likely not work if the options were expected before the command.
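For anyone who wants to see why the whitespace matters, here is a minimal Python sketch of what I believe happens. It assumes prepare.sh expands $EXTRA_PIP_PACKAGES unquoted, so the shell splits the value on whitespace; the helper names and the package names (s3fs, pyarrow) are only illustrative, and the split() call is an approximation of shell word splitting, not the actual bootstrap code.

```python
def build_extra_pip_packages(packages, pip_options=()):
    """Join package specs and trailing pip options into one space-separated string."""
    return " ".join([*packages, *pip_options])


def simulate_pip_command(extra_pip_packages):
    """Approximate the argv of an unquoted `pip install $EXTRA_PIP_PACKAGES`.

    Assumption: the variable is expanded unquoted, so the shell splits it on
    whitespace. str.split() mimics that (globbing is ignored here).
    """
    return ["pip", "install", *extra_pip_packages.split()]


# Hypothetical packages, followed by the pip option appended at the end.
value = build_extra_pip_packages(["s3fs", "pyarrow"], ["--timeout=120"])
argv = simulate_pip_command(value)

print(value)
print(argv)
```

Without the space before --timeout=120, the option would be glued onto the last package name and pip would receive a single malformed argument.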
That said, this isn’t a very clean solution. I assume there was no intention to allow this explicitly because $EXTRA_PIP_PACKAGES isn’t meant as the primary package source for the Dask containers.
Still, I wouldn’t mind opening a PR to add this to the documentation, if desired…
It’s not that bad, but I agree this is not the cleanest I’ve seen.
I guess the preferred way would be for you to extend the Docker image and build your own.
That’s a really nice proposal. I’m not sure we want to add this to the docs, though; maybe @jacobtomlinson has something to say?