Hi Guillaume,
thanks for replying.
It looks like we are neighbours: apparently my wife works at the same company as you, and I work at 'a big space company' on the other side of the 'rocade'.
I'm unpacking your comment about queuing, because I am facing some unexpected results.
Side question: since I am launching the scheduler with the `dask scheduler` CLI, I don't see a CLI option to pass the worker-saturation parameter (https://distributed.dask.org/en/stable/scheduling-policies.html#queuing says it needs to be set on the scheduler before it starts; does that mean at scheduler instantiation, or before any call to compute/submit?). From my tests, using `dask.config.set` as a context manager in Python does not change the value when the scheduler is run from the CLI.
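From the Dask configuration docs, I guess (not verified on my side) that the environment-variable form of the config key might be the way to do it for a CLI-launched scheduler, i.e. exporting it in the shell before starting the scheduler:

```shell
# assumption: Dask's standard config/env-var mapping turns this into
# distributed.scheduler.worker-saturation inside the scheduler process
export DASK_DISTRIBUTED__SCHEDULER__WORKER_SATURATION=1.1
dask scheduler
```

Is that the expected way, or is there a cleaner one?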
My cluster is as follows: worker-saturation not set (default value); 40 regular CPU workers with resources CPU=1 (single-threaded, i.e. nthreads=1) and 1 CUDAWorker (from dask-cuda) with resources GPU=1 (single-threaded as well). Both workers and scheduler are started from the CLI.
I have N 'compute_metrics' tasks. Each of them needs one 'propagate_TLE' task (resources CPU=1) and one 'propagate_keplerians' task (resources GPU=1).
Actually, all 'compute_metrics' tasks use the same 'propagate_keplerians' result, but each one has its own 'propagate_TLE' task.
All tasks are submitted with client.submit using the resources keyword.
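To make the pattern concrete, here is a minimal sketch of the submission code (function bodies are placeholders just to illustrate the graph shape, not my real code):

```python
# placeholder implementations, only to show the fan-out structure
def propagate_keplerians(n_points):
    return list(range(n_points))   # real version runs on the GPU (dask-cuda worker)

def propagate_tle(sat_id):
    return sat_id * 2              # real version propagates one TLE on a CPU worker

def compute_metrics(kep_states, tle_state):
    return len(kep_states) + tle_state

def submit_pipeline(client, n):
    # client is a dask.distributed.Client connected to the CLI-launched scheduler
    # one shared GPU task, whose result is reused by every compute_metrics
    kep = client.submit(propagate_keplerians, 100, resources={"GPU": 1})
    metrics = []
    for i in range(n):
        # one CPU propagation per metrics task
        tle = client.submit(propagate_tle, i, resources={"CPU": 1})
        metrics.append(
            client.submit(compute_metrics, kep, tle, resources={"CPU": 1})
        )
    return metrics
```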
Putting all the submit calls inside a `with dask.config.set({"distributed.scheduler.worker-saturation": np.inf}):` block does not change this behaviour (but, as said above, it is very likely this parameter is simply not seen by my CLI-launched scheduler).
On the processing plot: I was expecting no worker to exceed ceil(1.1 * 1) = 2 processing tasks, but the number of tasks being processed largely exceeds 2 on every worker (about 10 per worker on average).
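For reference, my expectation comes from the formula in the queuing docs, i.e. each worker should be sent at most ceil(worker-saturation * nthreads) runnable tasks at a time:

```python
import math

def expected_processing_limit(nthreads, worker_saturation=1.1):
    # per-worker cap from the queuing docs:
    # ceil(worker_saturation * nthreads)
    return math.ceil(worker_saturation * nthreads)

# my single-threaded workers: ceil(1.1 * 1) = 2
```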
When I remove my CUDAWorker and disable all resources keywords (so only 40 single-thread regular Dask CPU workers, with no resources required at submit() level), I can see some effect, but it is a bit strange: the scheduler seems to split all tasks into two batches. The first batch is sent to all workers. Once the first batch of 'propagate_TLE' tasks (dark green) is completed, the 'compute_metrics' tasks are started (light green). At this point each worker has about 10 tasks in memory and memory use rises rapidly (the worker memory limit is not reached, but my understanding is that queuing behaviour does not depend on worker memory anyway?).
Then, as the 'compute_metrics' tasks of the first batch complete, the scheduler submits new 'propagate_TLE' tasks, but this time it respects the 'max 2 tasks per worker' default rule (worker-saturation=1.1 and nthreads=1): each worker executes one or two 'propagate_TLE' tasks, then a 'compute_metrics' task, then starts over with one or two 'propagate_TLE' tasks.
I can't manage to upload pictures, probably "thanks" to my company's security rules..