Long-running background process - pattern?

Hi there!

New user to Dask here. I’m trying to work out whether Dask is appropriate for a task I have in mind.
I’ve got a REST service whose data takes a while to calculate. What I’d like to do is calculate things ‘in the background’ periodically (every few seconds), and when the REST API gets hit, simply serve up the latest calculated data. I want the background process to live forever: calculate, ‘publish latest result’, wait a bit, calculate, …
I’m just wondering whether there are any obvious patterns within Dask for setting this up and ‘communicating’ the latest calculated result (a few-thousand-row dataframe) from the worker back to the ‘main’ process (the REST service), so that when a request comes in the ‘latest’ result can be served up instantly.
(I.e. nothing in the incoming request changes what is calculated or how - that happens anyway - I just want access to the latest calculated result when requested.)

Hi @aconstnull, welcome to Dask community!

I think you can achieve this quite easily by using the distributed scheduler and client.publish_dataset.
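Something like this minimal sketch, where the dataset name `latest_result` is just illustrative and an in-process cluster stands in for your real scheduler:

```python
# Minimal sketch: share a computed result by name via the scheduler.
# `latest_result` is an illustrative name of our own choosing.
import pandas as pd
from dask.distributed import Client

client = Client(processes=False)  # connect to your real scheduler in practice

# The background side publishes its result under a well-known name...
df = pd.DataFrame({"value": range(5)})
client.publish_dataset(latest_result=df)

# ...and the REST side fetches the latest copy whenever a request arrives.
served = client.get_dataset("latest_result")
print(len(served))  # 5
```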

Let me know if it is enough for you to start.

Thanks very much for the reply. The dataset thing looks cool.
So - just to check I’m understanding correctly:
I’d do something like kick off a task that runs forever (maybe using an Event to shut down cleanly), doing a calc, publishing the dataset, sleeping a bit, doing the calc again, …?
Presumably we can “republish” the dataset under the same name to get an update.
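Something like this sketch of what I mean - the names `stop-producer` and `latest_result` are my own, and I’ve capped the loop with `max_iters` so the example terminates (in reality it would just loop while the event is unset):

```python
# Sketch of the long-running producer idea: calc, republish, sleep, repeat,
# with a named Event for shutdown. `max_iters` only makes the demo finite.
import pandas as pd
from dask.distributed import Client, Event, get_client

def producer(interval=0.1, max_iters=3):
    client = get_client()            # a client is available inside the task
    stop = Event("stop-producer")    # cluster-wide named event for shutdown
    for i in range(max_iters):
        if stop.is_set():
            break
        df = pd.DataFrame({"iteration": [i]})
        try:                         # republish under the same name:
            client.unpublish_dataset("latest_result")
        except KeyError:
            pass                     # nothing published yet
        client.publish_dataset(latest_result=df)
        stop.wait(timeout=interval)  # sleep, but wake early on shutdown

client = Client(processes=False)
fut = client.submit(producer)
fut.result()                         # in real use this runs indefinitely
latest = client.get_dataset("latest_result")
print(latest["iteration"].iloc[0])
```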

Will give it a try

What I would do (though I don’t have a great sense of what you have in mind) is submit a task periodically from the main process. But what you describe is also possible.
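A rough sketch of that alternative, with `compute_snapshot` standing in for your slow calculation and a short loop standing in for a timer firing every few seconds in the main process:

```python
# Sketch: the main process owns the timer and submits a fresh task each time.
# `compute_snapshot` is a stand-in name for the slow calculation.
import pandas as pd
from dask.distributed import Client

def compute_snapshot(i):
    # the slow calculation would go here
    return pd.DataFrame({"tick": [i]})

client = Client(processes=False)
latest = None
for i in range(3):                   # really: fired by your periodic trigger
    latest = client.submit(compute_snapshot, i).result()

# `latest` always holds the most recent finished result for the REST layer
print(int(latest["tick"].iloc[0]))
```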

Essentially all I want is something that updates some slow-ish calculations every “x” seconds, does the calculation off the “main” thread, and makes the latest computed results available to the main thread: the idea being that when an API call comes in from outside, I have “ready to go” results without the latency hit of the calculation.

The only issue I’d have with kicking off a task periodically is that I’d still need some scheduler / wake-up to trigger sending the task periodically (is there a way in Dask to do this?). The idea of a single long-running task was just so I could set everything up at “start-up”.

There is no built-in way in Dask to do this, but I came across a short discussion that you might find interesting.

It proposes using a Tornado PeriodicCallback in a SchedulerPlugin.
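Roughly along these lines - an untested sketch, with class and method names of my own; `tick` is where you’d trigger the actual recomputation:

```python
# Sketch: a SchedulerPlugin that starts a Tornado PeriodicCallback on the
# scheduler's event loop. Names here (PeriodicRecompute, tick) are our own.
from tornado.ioloop import PeriodicCallback
from distributed.diagnostics.plugin import SchedulerPlugin

class PeriodicRecompute(SchedulerPlugin):
    def __init__(self, interval_ms=5000):
        self.interval_ms = interval_ms
        self.ticks = 0
        self.pc = None

    async def start(self, scheduler):
        # Called on the scheduler when the plugin is registered.
        self.pc = PeriodicCallback(self.tick, self.interval_ms)
        self.pc.start()

    def tick(self):
        # Trigger / submit the periodic recomputation here.
        self.ticks += 1

    async def close(self):
        if self.pc is not None:
            self.pc.stop()
```

You’d then register it with something like `client.register_scheduler_plugin(PeriodicRecompute())` - again, just a sketch to adapt.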

I don’t fancy the long-running task approach: you’ll have to do some error handling yourself, and I’m not sure how Dask will cope with an infinite task.