Hello, I am thinking of using dask futures for an async web application which also makes use of computations on potentially huge dataframes.
The idea would be similar to this:
- Request arrives and we send the processing task to a dask future
- Store the future reference in a dictionary to keep it alive (something like store[future.key] = future)
- Respond to caller with a reference to the future (future.key)
- The dask workers should in the meantime start working on the processing task, which sometimes involves dask DataFrame operations like groupby and passing DataFrames as results/params
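To make the steps above concrete, here is a minimal sketch of the flow, not a finished design. It assumes a small in-process LocalCluster for local testing and uses a plain pandas groupby as a stand-in for the real processing task. The names store, process, handle_request, poll and fetch_result are hypothetical and only there for illustration.

```python
import pandas as pd
from dask.distributed import Client, Future, LocalCluster

# Hypothetical in-memory registry. Holding the Future objects here keeps
# them alive, so the scheduler does not release their results.
store: dict[str, Future] = {}


def process(records: list[dict]) -> pd.DataFrame:
    """Stand-in for the real processing task (assumed, not the actual workload)."""
    df = pd.DataFrame(records)
    return df.groupby("group", as_index=False)["value"].sum()


def handle_request(client: Client, records: list[dict]) -> str:
    """Submit the work, remember the future, and return its key to the caller."""
    # pure=False gives every submission its own key, even for identical input.
    future = client.submit(process, records, pure=False)
    store[future.key] = future
    return future.key


def poll(key: str) -> str:
    """Report the state of a stored future, e.g. 'pending' or 'finished'."""
    return store[key].status


def fetch_result(key: str) -> pd.DataFrame:
    """Block until the result is ready, then drop the stored reference."""
    future = store.pop(key)
    return future.result()


if __name__ == "__main__":
    # Assumption: a threads-only local cluster, with the dashboard disabled.
    cluster = LocalCluster(
        processes=False, n_workers=1, threads_per_worker=2, dashboard_address=None
    )
    client = Client(cluster)
    key = handle_request(
        client,
        [
            {"group": "a", "value": 1},
            {"group": "b", "value": 2},
            {"group": "a", "value": 3},
        ],
    )
    print(poll(key))
    print(fetch_result(key))
    client.close()
    cluster.close()
```

In this sketch a stored future stays in store until fetch_result removes it, so results that nobody fetches would keep occupying worker memory. A real application would probably need some expiry for abandoned keys.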
Before I make a potentially big mistake, I would like to ask the experienced crowd here: is this a reasonable and safe use of dask/distributed, or is something like this not recommended? I could not find anyone doing or asking about something similar, nor did I find any documentation describing this scenario or warning against using dask DataFrame operations inside futures.
I would really appreciate any kind of feedback or opinions.