Uploading data files to the EC2Cluster

Hi there!

I’m trying to use the distributed library on an AWS EC2 cluster. My task needs local files that must be present on each worker. I spent a long time looking through the configuration parameters but did not find a suitable argument.

Could you please help me find the easiest way to configure my cluster to download the required files from an HTTPS URL using dask_cloudprovider.aws.EC2Cluster?

Hi @KSuvorov, welcome to the Dask community!

There are probably several ways to do this. One I can think of is to use Client.run.
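Here is a minimal sketch of the Client.run approach. The URL, the target path and the helper names (`download_file`, `fetch_on_all_workers`) are placeholders I made up for illustration. Note that Client.run only reaches the workers connected at the time of the call, so workers that join later will not have the file.

```python
import os
import urllib.request

# Hypothetical values: replace with your own URL and target path.
DATA_URL = "https://example.com/data/input.csv"
TARGET_PATH = "/tmp/input.csv"


def download_file(url=DATA_URL, target=TARGET_PATH):
    """Download url to target on the local machine and return the file size."""
    os.makedirs(os.path.dirname(target) or ".", exist_ok=True)
    urllib.request.urlretrieve(url, target)
    return os.path.getsize(target)


def fetch_on_all_workers(client, url=DATA_URL, target=TARGET_PATH):
    """Run download_file once on every worker currently connected to client."""
    # Client.run returns a dict mapping each worker address to the
    # function's return value (here, the downloaded file size in bytes).
    return client.run(download_file, url, target)
```

With a `Client` connected to your EC2Cluster, calling `fetch_on_all_workers(client)` should return one size per worker, which is a quick way to check that every download succeeded.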

You could also use a WorkerPlugin to do this.
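A rough sketch of such a plugin follows. The class name `DownloadFile` and its `url` and `target` arguments are my own invention, not part of Dask. Unlike Client.run, a registered plugin is also set up on workers that join the cluster later.

```python
import os
import urllib.request

from distributed import WorkerPlugin


class DownloadFile(WorkerPlugin):
    """Hypothetical plugin: download one file to local disk when a worker starts."""

    def __init__(self, url, target):
        self.url = url
        self.target = target

    def setup(self, worker):
        # Dask calls setup() once on each worker the plugin is attached to.
        os.makedirs(os.path.dirname(self.target) or ".", exist_ok=True)
        urllib.request.urlretrieve(self.url, self.target)
```

You would register it from the client with something like `client.register_plugin(DownloadFile(url, target))`. On older versions of distributed the method is `client.register_worker_plugin` instead, so check which one your version provides.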

Edit:
There is also the register_worker_callbacks function.
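As a sketch, assuming the `setup` keyword of `Client.register_worker_callbacks` as I remember it from the distributed API, the callback can be a plain function bound to its arguments with `functools.partial`. The names `download_file` and `register_download` are placeholders.

```python
import functools
import os
import urllib.request


def download_file(url, target):
    """Download url to target on the machine where this function runs."""
    os.makedirs(os.path.dirname(target) or ".", exist_ok=True)
    urllib.request.urlretrieve(url, target)


def register_download(client, url, target):
    """Ask the cluster to run download_file on current and future workers."""
    # The callback takes no arguments of its own, so bind them up front.
    callback = functools.partial(download_file, url, target)
    client.register_worker_callbacks(setup=callback)
```

This is the lighter option when a full plugin class feels like too much for a one-off setup step.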

Hi @guillaumeeb

Thanks for your reply. This solves my problem, but in future versions I would like to be able to use native commands to set up the environment on all cluster nodes after Docker initialization. For example, the Ray CLI provides a YAML configuration file where you can set setup_commands to do this.

Maybe the --preload command line argument is what you are looking for:
https://docs.dask.org/en/latest/customize-initialization.html
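As an illustration, a preload script is a Python module that defines a `dask_setup` function, which the worker calls at startup. The file name and the two environment variables below (`DATA_URL`, `DATA_TARGET`) are assumptions of mine, not Dask settings.

```python
# my_preload.py (hypothetical file name)
import os
import urllib.request


def dask_setup(worker):
    """Called by Dask once when the worker process starts."""
    # DATA_URL and DATA_TARGET are made-up environment variable names;
    # the defaults are placeholders as well.
    url = os.environ.get("DATA_URL", "https://example.com/data/input.csv")
    target = os.environ.get("DATA_TARGET", "/tmp/input.csv")
    os.makedirs(os.path.dirname(target) or ".", exist_ok=True)
    urllib.request.urlretrieve(url, target)
```

On a manually started worker this would be passed as `--preload my_preload.py`. I have not checked how best to forward it through EC2Cluster, so treat that part as something to verify in the dask-cloudprovider docs.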