Dask DataFrame inside futures

Hello, I am thinking of using dask futures for an async web application which also makes use of computations on potentially huge dataframes.

The idea would be similar to this

  1. Request arrives and we send the processing task to a dask future
  2. Store the future reference in a dictionary to keep it alive (something like store[future.key] = future)
  3. Respond to the caller with a reference to the future (future.key)
  4. In the meantime, the dask workers start working on the processing task, which sometimes involves Dask DataFrame operations like groupby and passes DataFrames as results/params
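
The steps above could be sketched roughly as follows. This is only an illustration, not a tested design: `submit_job`, `job_status` and `demo` are hypothetical helper names, the `store` is a plain dict, and the only Dask calls assumed are `Client.submit` plus the `key`, `status`, `result()` and `exception()` members of the returned future.

```python
def submit_job(client, store, func, *args, **kwargs):
    """Steps 1-3: submit the task, keep the future alive, return its key."""
    future = client.submit(func, *args, **kwargs)
    store[future.key] = future  # holding the reference keeps the result on the cluster
    return future.key


def job_status(store, key):
    """Look up a stored future by key and report its state without blocking."""
    future = store.get(key)
    if future is None:
        return {"status": "unknown"}
    if future.status == "finished":
        result = future.result()
        del store[key]  # drop the reference so the cluster can release the data
        return {"status": "finished", "result": result}
    if future.status == "error":
        detail = repr(future.exception())
        del store[key]
        return {"status": "error", "detail": detail}
    return {"status": future.status}  # e.g. "pending"


def demo():
    """Hypothetical usage against an in-process cluster (needs dask[distributed])."""
    from distributed import Client

    client = Client(processes=False)
    store = {}
    key = submit_job(client, store, sum, [1, 2, 3])
    store[key].result()  # block here only for the sake of the demo
    status = job_status(store, key)
    client.close()
    return status
```

One thing to keep in mind with this sketch: as far as I understand, `Client.submit` derives the key from the function and its arguments by default, so two identical requests would share one key. Passing `pure=False` to `submit` should give each request its own key.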

Before I make a potentially big mistake, I would like to ask the experienced crowd here: is this a reasonable and safe use of dask/distributed, or is something like this not recommended? I could not find anyone doing or asking about something similar, nor any documentation that describes this scenario or warns against using Dask DataFrame operations inside futures.

I would really appreciate any kind of feedback or opinions.

Hi @sil-lnagel,

Using Dask DataFrame inside a Future (tasks from tasks) is totally feasible, even if it is a bit of an advanced use case. However, I don’t think I would recommend using Distributed as an async web application backend. The main reason is that long-lived Dask clusters are not really a common use case, and Distributed isn’t really made for the fast asynchronous processing that web applications might need.
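
For reference, a minimal sketch of what "tasks from tasks" can look like, assuming the `worker_client` context manager from `distributed` is used inside the submitted function. The function name `grouped_sum_task` and the toy columns `g` and `x` are made up for illustration.

```python
def grouped_sum_task(n_rows):
    """Runs on a worker and launches Dask DataFrame work from inside the task."""
    import pandas as pd
    import dask.dataframe as dd
    from distributed import worker_client

    # Toy input; a real task would load or receive its data instead.
    pdf = pd.DataFrame({"g": [i % 3 for i in range(n_rows)], "x": range(n_rows)})

    # worker_client() gives the task a client for the same cluster, so the
    # groupby below is scheduled as further tasks rather than run inline.
    with worker_client() as client:
        ddf = dd.from_pandas(pdf, npartitions=2)
        totals = client.compute(ddf.groupby("g").x.sum()).result()

    # Return something small and plain rather than a whole DataFrame.
    return totals.to_dict()
```

It would be submitted like any other function, for example `client.submit(grouped_sum_task, 1000)`.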

So I would rather rely on another established tool from the Python ecosystem to handle the async web part. That does not prevent it from submitting tasks to a Dask cluster!