New user to Dask here. I’m looking to see whether Dask is appropriate for a task I have in mind.
I’ve got some REST service. The data it serves up takes a while to calculate. So what I’d like to do is calculate things ‘in the background’ periodically (every few seconds), and when the REST API gets hit, simply serve up the latest calculated data. I want the background process to just live forever: Calculate, ‘publish latest result’, wait a bit, calculate, …
Just wondering if there are any obvious patterns within Dask to set this up and ‘communicate’ the latest calculated result (a few-thousand-row dataframe) from the worker back to the ‘main’ process (the REST service), so that when a request comes in, the ‘latest’ result can be served up instantly?
(I.e. nothing in the incoming request changes what is calculated or how; that happens regardless. I just want access to the latest calculated result when requested.)
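To make the shape concrete, here’s a minimal sketch of what I mean using only the stdlib (nothing Dask-specific; `compute_expensive`, `get_latest`, and the lock/event names are all just placeholders I made up):

```python
import threading
import time

latest_result = None            # the most recent computed value
lock = threading.Lock()         # guards reads/writes of latest_result
stop = threading.Event()        # lets the loop shut down cleanly

def compute_expensive():
    # placeholder for the slow calculation (a few-thousand-row dataframe in my case)
    return time.time()

def background_loop(interval=2.0):
    # calculate, publish, wait a bit, calculate, ... forever
    global latest_result
    while not stop.is_set():
        result = compute_expensive()
        with lock:
            latest_result = result
        stop.wait(interval)     # doubles as the sleep and the shutdown check

def get_latest():
    # what the REST handler would call: instant, no calculation latency
    with lock:
        return latest_result

threading.Thread(target=background_loop, daemon=True).start()
```

The question is really whether Dask has an idiomatic equivalent of `get_latest()` for handing a result from a worker back to the main process.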
Thanks very much for the reply. The dataset thing looks cool.
So - just to check I’m understanding correctly:
I’d do something like kick off a task that runs forever (maybe using an Event to shut down), doing a calc, publishing the dataset, sleeping a bit, calc again, …?
Presumably we can “republish” the dataset under the same name to get an update.
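By “republish” I just mean that a later write under the same name replaces the earlier value, so readers always see the newest result. A toy stand-in with a plain dict (this is the semantics I’m assuming, not actual Dask API):

```python
# stand-in for a published-dataset registry keyed by name
store = {}

def publish(name, value):
    # republishing the same name simply replaces the previous value
    store[name] = value

publish("latest", "v1")
publish("latest", "v2")   # "republish" under the same name
print(store["latest"])    # -> v2
```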
Essentially all I want is something that updates some slow-ish calculations every “x” seconds, doing the calculation off the “main” thread, and makes the latest computed results available to the main thread: the idea being that when an API call comes in from outside, I have “ready to go” results without the latency hit of the calculation.
The only issue I’d have with kicking off a task periodically is that I’d still need a scheduler / wake-up to trigger sending the task periodically (is there a way to do this in Dask?). So the idea of a single long-running task was just so I can set everything up at “start up”.
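For the single long-running-task variant, the periodic wake-up can live inside the task itself, so nothing external has to fire timers. A plain-Python sketch (the `calc` and `publish` callables are hypothetical stand-ins for the real calculation and for whatever mechanism shares the result):

```python
import threading
import time

def run_forever(calc, publish, interval, stop):
    """One long-lived task: calc, publish, wait, repeat until stop is set."""
    while True:
        publish(calc())
        # wait() returns True as soon as stop is set, so shutdown is prompt;
        # otherwise it times out after `interval` and we loop again
        if stop.wait(timeout=interval):
            break

# usage: everything wired up once at startup
results = []
stop = threading.Event()
t = threading.Thread(
    target=run_forever,
    args=(lambda: time.time(), results.append, 0.1, stop),
)
t.start()
time.sleep(0.35)
stop.set()   # clean shutdown
t.join()
# results holds whatever was published before shutdown
```

Using `Event.wait(timeout=...)` as the sleep means the same Event gives both the periodic cadence and the shutdown signal.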