I wanted to know whether the workflow below is possible using Dask.
I have a dataset with image paths and one other numeric feature (say, for simplicity, 4 data points), loaded as a Dask DataFrame.
Now, I divide this DataFrame into 2 partitions (and I have 2 Dask workers).
I want to send one partition to each worker. The images themselves are stored in a public S3 bucket (or any repository). I want to download each partition's images onto its worker (i1, i2 to worker 0; i3, i4 to worker 1) and then run map_partitions/apply on each row of the partition, so that each worker's data lives on the machine that worker is running on.
Is such a workflow possible using Dask?