This is the part of the dask docs I’m looking at:
https://docs.dask.org/en/stable/bag-creation.html?highlight=npartitions#db-from-sequence
I am aware of the other “best practices” pages, but I think the issue is that I don’t have a good enough mental model of what dask is doing to even start troubleshooting. Based on the “drawbacks” section of the bag docs, I guess I’m hitting some of the pain points of multiprocessing.