Performance of Client.submit appears to depend on the size of traceback

See here: distributed/client.py at b1e06a001c6b0b104a4567fc15dd07edd51fb7aa · dask/distributed · GitHub

The traceback appears to be transmitted to the scheduler. I'm seeing that when the caller of submit is a function with fewer lines of code, my Dask application runs faster. Is this a known issue? Is the traceback required, or is it for debugging only? If it's the latter, it might make sense to optionally disable transmission of the traceback.

Hi @elibol. Yes, the calling code is transmitted to the scheduler when client.[compute, persist, submit] is called. This is primarily used for diagnostic/reporting purposes (associating code snippets with actual tasks that get run).

In principle, if the code snippet is very large, this will increase the overhead of transmitting data to the scheduler. But I'd only expect that to happen if you are submitting many small individual tasks with large code snippets (best practice would suggest that you submit larger task graphs all at once rather than many small ones). And even then, I'd be surprised if it made a big difference in the actual computation time, rather than in the client-scheduler communication overhead.
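To illustrate the contrast, here is a minimal sketch on an in-process cluster (the `inc` function is a hypothetical example, not from the thread): many individual `submit` calls each incur a client-scheduler message, while building one larger graph with `dask.delayed` and handing it to `client.compute` sends the whole graph at once.

```python
import dask
from dask.distributed import Client

def inc(x):
    return x + 1

client = Client(processes=False)  # in-process cluster, for illustration only

# Many small submissions: one client->scheduler message per task,
# each carrying metadata such as the calling code snippet.
futures = [client.submit(inc, i) for i in range(100)]
results_small = client.gather(futures)

# One larger graph submitted at once: the tasks travel together
# in a single submission.
lazy = [dask.delayed(inc)(i) for i in range(100)]
futures_batched = client.compute(lazy)
results_batched = client.gather(futures_batched)

client.close()
```

Both variants compute the same results; the difference is in how many round trips the client makes to the scheduler.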

Are you able to provide a minimal reproducible example demonstrating your slowdown? In particular, I’m curious to see two things:

  • Whether we can reproduce the issue with many small submits, and whether there is a workaround by submitting larger graphs, and
  • Whether we can determine if the computations themselves slow down, instead of the client-scheduler communication overhead.

I do think it could be reasonable to make the transmission of this data optional; I'm just trying to understand how big a problem it is in practice.

Hi @ian,

Is it possible to submit multiple tasks via Client.submit?

Right now, we’re using Dask as a backend for project NumS (see here for implementation). When we run an ablation comparing various algorithms and schedulers, we’re seeing that Client.submit dominates the execution time within the driver process, and the total execution time of certain benchmarks is determined by how quickly Client.submit executes. For instance, as we increase the number of blocks per array for some micro benchmarks, the impact on performance is amplified. I believe this is a critical overhead to reduce in order to improve Dask’s performance on short-lived tasks. We can look into batching if it’s possible via Client.submit.

Can you use Client.map instead? That will be much faster than many individual submissions.
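As a sketch of that suggestion (the `square` function here is a hypothetical example), a single `Client.map` call replaces a loop of individual `submit` calls and creates the whole batch of tasks at once:

```python
from dask.distributed import Client

def square(x):
    return x ** 2

client = Client(processes=False)  # in-process cluster, for illustration only

# Instead of:  futures = [client.submit(square, x) for x in range(8)]
# one map call batches task creation for the whole input sequence:
futures = client.map(square, range(8))
results = client.gather(futures)

client.close()
```

`client.map` returns a list of futures, one per input element, which can then be gathered like any futures produced by `submit`.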
