Memory calculation: each worker works on each partition?

Hi, thanks for great works. I have a quick question.

When determining the number of partition (npartition) for creating bag, the memory of the partition should fit on, or smaller than, the available memory of one worker(1 process; 1 thread) because the unit of task assigned to worker is “partition”, is my understanding correct?

Thanks in advance.

Hi @henry and welcome to discourse!

Dask can (and will) try to work on as many partitions in parallel on a worker as there are cores available. Therefore, it can actually be better to partition your bag so that many partitions can fit in memory on a worker at a time. There are more details in the Dask Best Practices.

1 Like