Mechanisms to implement partial completion support

I would like to signal completion to my calling code when the dask collection is not fully computed but crosses a time threshold and has achieved user-specified level of completion (say 98%). One mechanism I find for this is manually checking the futures related to collection for completion. However I was curious if I could use some other mechanisms (e.g. scheduler plugins which count leaf task completions) to implement such a feature.


NOT A CONTRIBUTION

Well you could certainly use a SchedulerPlugin to do that, using transition and update_graph entry points. Not sure if it would be simpler than doing it as you do now, as you’ll have to carefully handle on which graph and tasks you do the completion accounting, and be able to add some callback to stop the undergoing computation…

I don’t see any other way to do it.

One could probably use as_completed to build something like this but there are caveats with this and the API is not intuitive when working with higher level collections.

If somebody is interested to implement something more general, we actually have a scheduler plugin (or multiple) that is basically doing what has been suggested above already, see distributed/distributed/diagnostics/progress.py at main · dask/distributed · GitHub and distributed/distributed/diagnostics/progressbar.py at main · dask/distributed · GitHub

We currently use this for progressbars or the task stream on the dashboard but with a slightly different frontend, I could see this being used to implement a callback mechanism as well.

Thanks for the pointer. I will make an attempt at adapting this plugin.


NOT A CONTRIBUTION