Best practices for distributing the work

This depends on your computing context. Are you on your personal laptop, on a single server, or on a compute cluster? How many resources (CPU, RAM) do you have? How much memory does each chunk need? Can this be lowered by reducing the chunk size?

In any case, you should configure your workers or your multiprocessing cluster accordingly.

I see you’re just using

from dask.distributed import Client
client = Client()

which defaults to using all of the machine's capacity, both for the number of processes/threads and for memory. For example, say you have a laptop with 8 cores and 16 GB of memory.

With this setup, Dask will be able to launch 8 tasks at the same time (one per thread, so one per core), so in your case one column per core, with 2 GB of memory available for each computation.
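
If you want to double-check what those defaults actually are on your machine, you can inspect the client created above (a minimal sketch; the exact numbers depend on your hardware):

# Print the thread count and memory limit of each worker started by Client()
info = client.scheduler_info()
for address, worker in info["workers"].items():
    print(address, worker["nthreads"], worker["memory_limit"])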

So the easiest way is just to configure your Dask cluster (so your workers) according to the problem you're trying to solve. If you need 4 GB of memory for the operation on each chunk, then you should use something like:

from dask.distributed import Client, LocalCluster

# 4 workers with 1 thread each: on a 16 GB machine the default memory
# limit is split evenly, so each worker gets roughly 4 GB.
cluster = LocalCluster(n_workers=4, threads_per_worker=1)
client = Client(cluster)

You can also do more complex things using the Dask Resources mechanism, but I would advise against it and suggest keeping things simple for now.
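
For reference only, the Resources mechanism roughly works like this (a minimal sketch, not taken from your code: process_chunk and chunk are hypothetical placeholders, and the worker must have been started with a matching declaration such as dask worker <scheduler-address> --resources "MEMORY=4e9"):

# Ask the scheduler for 4e9 "MEMORY" units for this task, so it only runs
# on a worker that declared at least that much.
future = client.submit(process_chunk, chunk, resources={"MEMORY": 4e9})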