Switched everything back to two-level futures with better-optimized chunk sizes, which removed all the overhead related to bag/delayed. Still no luck with reading inside sub-futures, though.
But I don't think this can get any better: