Loading a large dataset from Postgres using the minimum amount of memory

Hello! We are trying to load a large dataset from a PostgreSQL database using psycopg2 and pandas. The problem is the high memory usage. Is it possible to use Dask to reduce the memory usage of the request?

Hi @luis.casas, welcome to Dask community!

It’s a little hard to tell without knowing your workflow.
Dask can help if you don’t need the complete dataset in memory at a given time. It can read the data chunk by chunk, freeing memory once a chunk has been processed, but it depends on what you want to do with this dataset.

Say you just want to do some simple ETL: read the data, process it chunk by chunk, and write it back in Parquet file format. Then Dask will let you do this by streaming the input data, keeping memory usage at a lower level.
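To illustrate the streaming pattern, here is a minimal sketch using pandas’ `chunksize` with an in-memory SQLite table standing in for your Postgres table (so the example is self-contained; against Postgres you would pass a psycopg2 or SQLAlchemy connection instead, and `dask.dataframe.read_sql_table` works along the same lines). The table and column names are made up for the example:

```python
import sqlite3

import pandas as pd

# A small in-memory SQLite table stands in for the large Postgres table;
# with Postgres you would pass a psycopg2 / SQLAlchemy connection instead.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE measurements (id INTEGER PRIMARY KEY, value REAL)")
conn.executemany(
    "INSERT INTO measurements (value) VALUES (?)",
    [(float(i),) for i in range(10_000)],
)

# chunksize turns read_sql into an iterator of DataFrames: only one chunk
# lives in memory at a time, so peak usage is bounded by the chunk size.
processed_rows = 0
for chunk in pd.read_sql("SELECT id, value FROM measurements", conn, chunksize=1_000):
    chunk["value"] = chunk["value"] * 2  # per-chunk transform step
    # the per-chunk write (e.g. appending to Parquet/CSV) would go here
    processed_rows += len(chunk)

print(processed_rows)  # 10000
```

Each chunk is transformed and can be written out before the next one is fetched, so the full result set never sits in memory at once.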

Thanks for the quick reply!
My question is more about whether it is possible to perform operations on a large dataset without loading it into memory entirely. We can’t use chunks because the operation needs to be done in one go.
Yes, it would be something like the ETL example, but the step of reading the data completely exhausts the container’s memory. Any ideas?

Well, it depends on your workflow. Could you provide a reproducible example, or a code snippet?

Dask is first and foremost about performing operations on a dataset without loading it into memory entirely, but it assumes the operation can be done chunk by chunk, in a map/reduce fashion.
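To make the map/reduce point concrete: many operations that appear to need the whole dataset at once, such as a global mean, decompose into per-chunk partial results plus a final combine. A minimal sketch in plain Python of this idea (the helper names here are made up for illustration; Dask applies the same decomposition across its partitions):

```python
from typing import Iterable, Iterator, List


def chunks(values: Iterable[float], size: int) -> Iterator[List[float]]:
    """Yield fixed-size chunks, so at most `size` values are held at once."""
    chunk: List[float] = []
    for v in values:
        chunk.append(v)
        if len(chunk) == size:
            yield chunk
            chunk = []
    if chunk:
        yield chunk


def chunked_mean(values: Iterable[float], size: int = 1_000) -> float:
    # Map: reduce each chunk to a partial (sum, count).
    # Reduce: combine the partials; the full dataset is never materialized.
    total, count = 0.0, 0
    for chunk in chunks(values, size):
        total += sum(chunk)
        count += len(chunk)
    return total / count


print(chunked_mean(float(i) for i in range(1_000_000)))  # 499999.5
```

If your operation genuinely cannot be decomposed this way (some algorithms need random access to all rows at once), then chunked or Dask-based streaming won’t help, and that is the key question to answer about your workload.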