Processing all rows of a large database table: a good use case?


I have a use case where I want to take all rows in a large Postgres database table, do a bit of computation for each row separately, and write something back to the database. I am new to Dask and to me it looks like it would be a good use case for it as this is something that will require a lot of memory and that is massively parallel.

However, I wanted to confirm: is this indeed something Dask is good for? And if so, should I do everything in Dask DataFrames, or should I first load the data “manually” and use Dask only for the map operation?


Hi @rodriguealcazar, welcome to Dask community!

Well yes, especially if processing each row is somewhat compute intensive and requires memory, Dask sounds like a great fit.

I would advise using Dask DataFrames if possible, loading the data with read_sql_table. This is usually the most straightforward approach. But be careful about how the table gets split into partitions if each row’s processing needs a lot of memory: smaller partitions mean more parallelism but also more overhead.