Thank you so much for this deep investigation!
I’ll repost parts of the main question in the Distributed channel, since it seems to belong there.
Regarding the configuration changes:
- As far as I know, the `distributed.worker.memory.target` configuration relates to the spilling process, which I’ve intentionally disabled (`da.config.set({'distributed.worker.memory.spill': False})`). My goal for the entire experiment is to see how Dask manages the in-memory stream of data.
- The other parameter, `da.config.set({'distributed.worker.memory.pause': 0.95})`, is a nice catch, although I suspect it only postpones the stall rather than removing the problem. (See the sketch below for how I’d combine these settings.)
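For reference, here is a minimal sketch of how those thresholds fit together when set before the cluster starts. Assumptions: `da` above refers to the `dask` module, and the `target: False` value and the `LocalCluster` sizes are illustrative rather than my exact setup.

```python
import dask
from dask.distributed import Client, LocalCluster

# Set the thresholds before any workers exist, since workers pick them up at startup.
dask.config.set({
    "distributed.worker.memory.spill": False,   # spilling to disk disabled, as in my experiment
    "distributed.worker.memory.target": False,  # assumption: the target threshold is disabled as well
    "distributed.worker.memory.pause": 0.95,    # pause worker threads at 95% instead of the 0.80 default
})

# Illustrative local cluster; the sizes are placeholders, not my real configuration.
cluster = LocalCluster(n_workers=2, threads_per_worker=1, memory_limit="4GB")
client = Client(cluster)
```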
I was also wondering whether the problem could be due to unfreed memory; that’s why I suspected some memory-management global lock.
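If unfreed memory is the culprit, one rough way I could check (assuming `psutil` is installed on the workers and `client` is the client from the sketch above) is to compare each worker’s process RSS against the managed memory reported on the dashboard:

```python
import os
import psutil

def rss_mb():
    # Resident set size of the current worker process, in MB.
    return psutil.Process(os.getpid()).memory_info().rss / 1e6

# client.run executes the function on every worker and returns {worker_address: result};
# a large gap between this and the dashboard's "managed" bytes would point to unfreed memory.
print(client.run(rss_mb))
```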
Thanks again; I’ll post an update here if I find a resolution to the issue.
(I’ve posted the question to the Distributed channel: Worker blocking on memory limit, despite the streaming-friendly pipeline process)