Hello,
I have an embarrassingly parallel use case: I perform a computation for each individual item (a molecule) and gather the results. I use an SSH cluster and the client.map function, to which I supply a list of molecules. The issue is that utilization decreases as the number of nodes and workers grows, because of tail computations. This happens because each calculation takes a variable amount of time, from 0.5 to 5 min. If a 5 min computation is the last one and I use 300 workers, all the other workers wait up to 5 min for this single calculation to finish. This is a waste of resources.

I can roughly estimate the time required to process each item. I could therefore assign a higher priority to the time-consuming tasks, so that the fast tasks run at the very end, which would reduce overall resource consumption and speed up the calculations a little. However, there is no option to pass priorities for individual tasks to map. I can submit the tasks one by one in a for loop and give each task a custom priority, but this does not look very elegant or efficient (there will be millions of items in the list).
Is there another way to set priorities for individual tasks? Ideally it would work together with map. Or is there no option other than the one I described?
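To illustrate the tail effect, here is a toy simulation in plain Python (no Dask involved). It assumes tasks are handed out in list order to whichever worker becomes free first, which is only a rough model of a real scheduler. The `makespan` helper and the durations are made up for this example.

```python
import heapq


def makespan(durations, n_workers):
    """Total wall time when tasks are given, in list order, to the first free worker."""
    free_at = [0.0] * n_workers  # min-heap: time at which each worker becomes idle
    heapq.heapify(free_at)
    finish = 0.0
    for duration in durations:
        start = heapq.heappop(free_at)  # earliest available worker
        end = start + duration
        heapq.heappush(free_at, end)
        finish = max(finish, end)
    return finish


# Hypothetical workload in minutes: eight fast items, and the slow one comes last.
durations = [0.5] * 8 + [5.0]

long_last = makespan(durations, n_workers=4)
long_first = makespan(sorted(durations, reverse=True), n_workers=4)

print(long_last)   # 6.0 -> three workers sit idle while the slow task finishes
print(long_first)  # 5.0 -> the fast tasks fill in alongside the slow one
```

In this small case, running the longest task first shortens the total time from 6 to 5 minutes with the same four workers.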
If not, this might be worth considering as a feature for a future release.
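For reference, the one-by-one workaround I mentioned would look roughly like the sketch below. The helper name `submit_with_priorities` and its arguments are hypothetical. It relies only on `client.submit` accepting a `priority` keyword, where higher values are scheduled earlier. I convert the estimate to whole seconds because the priority is documented as an integer.

```python
def submit_with_priorities(client, func, items, estimated_minutes):
    """Submit one task per item, using its estimated runtime as the priority.

    `client` is expected to be a dask.distributed.Client; `estimated_minutes`
    holds one rough runtime estimate per item (an assumption of this sketch).
    """
    futures = []
    for item, minutes in zip(items, estimated_minutes):
        # Longer estimated runtime -> larger priority -> scheduled earlier.
        priority = int(round(minutes * 60))
        futures.append(client.submit(func, item, priority=priority))
    return futures


# Intended usage (not executed here; `process_molecule`, `molecules` and
# `estimates` are placeholders):
#
#   from dask.distributed import Client
#   client = Client(cluster)
#   futures = submit_with_priorities(client, process_molecule, molecules, estimates)
#   results = client.gather(futures)
```

With millions of items this means millions of separate submit calls from the client, which is the part I would like to avoid.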