Hi @guillaumeeb,
Thanks for the info again. I didn't specifically create any result group. I read the Parquet files with a fixed block size, e.g. 32 MiB or 64 MiB, and the Parquet files themselves also have a fixed row group size of 64 MiB. Since I use dd.read_parquet to load them into a Dask DataFrame, everything is in Dask collections.
Most of my operations on the Dask collections are read_parquet/to_parquet, map_partitions, apply, and delayed, roughly like the sketch below.
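Just to make it concrete, the workflow looks roughly like this. The paths, column names, and the transform body are only placeholders (not my real code), and I'm assuming a Dask version where read_parquet accepts a blocksize argument:

```python
import dask.dataframe as dd

# Read Parquet with a fixed block size (32 or 64 MiB); the files themselves
# have ~64 MiB row groups. Path and blocksize handling are placeholders.
df = dd.read_parquet("data/*.parquet", blocksize="64MiB")

def transform(part):
    # Placeholder per-partition work on a pandas DataFrame
    part["value"] = part["value"] * 2
    return part

# Typical per-partition operations: map_partitions / apply
result = df.map_partitions(transform)

# Write the result back out
result.to_parquet("output/", write_index=False)
```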
That is one reason I wonder why the unmanaged memory is so high.
One thing I read on some pages is that Python objects (structured data types or strings) count as part of unmanaged memory. However, it bothers me because it makes the spill mechanism almost not work.
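For reference, this is the spilling I mean. The fractions below are the worker memory defaults as I understand them; since the thresholds are checked against the whole process memory but only managed data can be spilled, a lot of unmanaged memory pushes the worker toward pause/terminate without much left to spill:

```python
import dask

# Worker memory thresholds (fractions of the memory limit); these should be
# the documented defaults. Spilling can only move *managed* data to disk, so
# unmanaged Python-object memory still counts toward pause/terminate.
dask.config.set({
    "distributed.worker.memory.target": 0.60,     # start spilling managed data
    "distributed.worker.memory.spill": 0.70,      # spill based on process memory
    "distributed.worker.memory.pause": 0.80,      # pause accepting new tasks
    "distributed.worker.memory.terminate": 0.95,  # kill the worker
})
```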
But sometimes a worker looks like it has the right amount, see the third worker from the bottom.
My speculation is that the structured data makes the partition sizes unbalanced, and I am still investigating, with something like the check sketched below.
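The idea is to measure the in-memory size of each partition with deep=True so the Python object/string overhead is counted; this should show both how much the object columns inflate memory and whether the partitions are unbalanced. The path and meta details are just placeholders:

```python
import dask.dataframe as dd

# Same DataFrame as above; path is a placeholder
df = dd.read_parquet("data/*.parquet")

# Per-partition in-memory size in bytes, counting Python object overhead
sizes = df.map_partitions(
    lambda part: part.memory_usage(deep=True).sum(),
    meta=("size", "int64"),
).compute()

print(sizes.describe())  # spread of partition sizes (min/max/mean)
```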
