Prioritizing individual tasks in map

DrDom · April 24, 2023, 10:55am

Hello,

I have an embarrassingly parallel use case where I perform a computation for individual items (molecules) and gather the results. I use an SSH cluster and the client.map function, to which I supply a list of molecules. The issue is that utilization decreases as the number of nodes and workers grows, because of tail computations. This happens because each calculation takes a variable amount of time, 0.5-5 min. If a 5 min computation is the last one and I use 300 workers, all the other workers sit idle for up to 5 min waiting for this single calculation to finish. This is a waste of resources. I can roughly estimate the time required to process each item. Thus, I could set a higher priority for time-consuming tasks so that fast tasks run at the very end, reducing overall resource consumption and speeding up the calculations a little. However, there is no option to supply priorities for individual tasks within map. I can submit tasks one by one in a for loop and provide a custom priority for every task, but this does not look very elegant or efficient (there will be millions of items in the list).
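The tail effect described above can be illustrated with a small simulation. This is only a sketch under simplifying assumptions: the `makespan` helper is hypothetical, it models idealized greedy scheduling (each task goes to whichever worker frees up first), and it ignores scheduler and network overhead. It does not use Dask at all.

```python
import heapq


def makespan(durations, n_workers):
    """Total wall time when tasks are handed out in the given order.

    Greedy model: every task is assigned to the worker that becomes
    idle first. This is an idealization, not Dask's actual scheduler.
    """
    free_at = [0.0] * n_workers  # time at which each worker becomes idle
    heapq.heapify(free_at)
    for d in durations:
        start = heapq.heappop(free_at)      # earliest available worker
        heapq.heappush(free_at, start + d)  # busy until start + d
    return max(free_at)


# Toy workload (minutes): four short tasks and one long task, two workers.
long_task_last = [1, 1, 1, 1, 5]
long_task_first = sorted(long_task_last, reverse=True)

print("long task last: ", makespan(long_task_last, 2))   # 7.0
print("long task first:", makespan(long_task_first, 2))  # 5.0
```

In this toy case, starting the long task first lets the short tasks fill in around it, so no worker sits idle waiting for a late 5-minute task.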

Is there another way to set individual priorities for tasks? Ideally it would work together with map. Or is there no other way than the one I described?

If not, this could be a feature for a future release.

martindurant · April 25, 2023, 1:28pm

I believe dask will batch many subsequent submit() calls when communicating with the scheduler, so maybe a for-loop is fine - I recommend you try to see if it makes any noticeable difference.
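A minimal sketch of the for-loop approach, assuming `client` is a `dask.distributed.Client`. Its `submit` method accepts a `priority` keyword, and higher values take precedence. The helper name `submit_with_priorities` and the `estimate_cost` callback are hypothetical, introduced only for illustration.

```python
from typing import Any, Callable, Iterable, List


def submit_with_priorities(
    client: Any,
    func: Callable[[Any], Any],
    items: Iterable[Any],
    estimate_cost: Callable[[Any], float],
) -> List[Any]:
    """Submit one task per item, using its estimated cost as the priority.

    `client` is assumed to be a dask.distributed.Client; only its
    submit(func, *args, priority=...) method is used here. Higher
    priority values are scheduled first, so expensive items start early.
    """
    futures = []
    for item in items:
        # One submit() call per item, each with its own priority.
        futures.append(client.submit(func, item, priority=estimate_cost(item)))
    return futures
```

With a real cluster this might be used as `futures = submit_with_priorities(client, process_molecule, molecules, estimate_minutes)` followed by `results = client.gather(futures)`, where `process_molecule` and `estimate_minutes` stand in for the user's own functions. Whether the loop is fast enough for millions of items is worth measuring, as suggested above.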
