Dilemma: Schedule IO-Bound / CPU-Bound tasks in cascaded clients

I switched everything back to two-level futures with better-tuned chunk sizes, which removed all the overhead related to bag/delayed. Still no luck with reading inside sub-futures, though.

But I don't think it can get any better than this: