Right, so for my use case, after sharding and shuffling I wanted to do a data-parallel operation (described in Dividing data among workers and downloading data local to a worker), i.e. I want to preprocess every row of the dataset.
I read more of the Dask documentation and was thinking I could use the `map_partitions` function to let Dask handle distributing the rows instead of dividing them up manually.
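Something like this is what I had in mind (a rough sketch; the input/output paths, the `text` column, and the `preprocess` logic are placeholders for my actual pipeline):

```python
import dask.dataframe as dd
import pandas as pd

def preprocess(partition: pd.DataFrame) -> pd.DataFrame:
    """Placeholder row-wise preprocessing applied to one pandas partition."""
    partition = partition.copy()
    partition["text"] = partition["text"].str.lower().str.strip()
    return partition

ddf = dd.read_parquet("shuffled/")    # placeholder path to the shuffled shards
out = ddf.map_partitions(preprocess)  # Dask calls preprocess on each partition in parallel
out.to_parquet("preprocessed/")       # placeholder output path
```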