I wanted to know whether the workflow below is possible using Dask.
I have a dataset with image paths and a numeric feature (let's say, for simplicity, 4 data points), loaded as a Dask dataframe:
s3://mydata/image1.jpg, 1
s3://mydata/image2.jpg, 2
s3://mydata/image3.jpg, 3
s3://mydata/image4.jpg, 4
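For concreteness, this is roughly how I load it (a sketch; the file path and column names are just placeholders):

import dask.dataframe as dd

# Read the (image_path, feature) rows shown above into a Dask dataframe.
# "s3://mydata/table.csv" and the column names are placeholders.
ddf = dd.read_csv(
    "s3://mydata/table.csv",
    names=["image_path", "feature"],
)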
Now I divide this dataframe into 2 partitions (and I have 2 Dask workers).
I want to send one partition to each worker. The images themselves live in a public S3 bucket (or any other repository). I want each worker to download the images for its own partition (image1 and image2 to worker 0; image3 and image4 to worker 1) and then run map_partitions/apply over each row of that partition, so that each worker's data is on the machine the worker is running on.
Is such a workflow possible using Dask?
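To make the intent concrete, here is a rough sketch of the per-partition step I have in mind (the computation inside the loop is just a placeholder, and I'm assuming s3fs is installed on the workers and that the bucket is publicly readable):

import pandas as pd
import s3fs

def process_partition(df: pd.DataFrame) -> pd.DataFrame:
    # This runs on whichever worker holds the partition, so the image
    # downloads land on that worker's machine.
    fs = s3fs.S3FileSystem(anon=True)  # public bucket, no credentials
    results = []
    for path, feature in zip(df["image_path"], df["feature"]):
        with fs.open(path, "rb") as f:
            image_bytes = f.read()  # image is now local to this worker
        results.append(len(image_bytes) + feature)  # placeholder computation
    return df.assign(result=results)

ddf = ddf.repartition(npartitions=2)  # ideally one partition per worker
out = ddf.map_partitions(
    process_partition,
    meta={"image_path": "object", "feature": "int64", "result": "int64"},
).compute()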