Documentation on the interplay between graphs and futures

Hi @akhmerov, welcome!

I agree that the documentation around distributed.Futures is pretty confusing, and it should be improved. To me, the important differences (and I’m editorializing a bit here) are:

  1. Futures are mostly a reimplementation of the concurrent.futures API. It’s mostly a different API from the dask collections API (e.g., dask.dataframe or dask.array).
  2. Futures represent an unrealized result on a distributed cluster. They are defined in distributed and are almost entirely absent from the dask/dask codebase (I did a quick grep and found only one conditional import of them).
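To make the first point concrete, here is a minimal sketch of the standard-library concurrent.futures pattern that distributed's futures are modeled on. It uses only the standard library, and `inc` is a made-up helper for illustration. `distributed.Client` exposes similarly named `submit` and `map` methods, though the details differ (for example, `Client.map` returns a list of futures rather than an iterator of results).

```python
from concurrent.futures import ThreadPoolExecutor


def inc(x):
    # Hypothetical helper used only for this illustration.
    return x + 1


with ThreadPoolExecutor(max_workers=2) as executor:
    # submit() schedules one call and returns a Future immediately.
    future = executor.submit(inc, 1)
    # result() blocks until the call has finished and returns its value.
    single = future.result()
    # map() applies the function to each input; here it yields results in order.
    many = list(executor.map(inc, range(3)))

print(single, many)
```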

I said above that it’s “mostly a different API”, but, of course, you identified some exceptions. I think one of the weird things about the distributed.Client interface is that it implements two APIs at the same time: (1) concurrent.futures and (2) dask.{compute, persist}, and there are some places where the two can be mixed. I generally think that most people should stick with the standard dask collections/task-graph API until they need to do something trickier with concurrency on the client. But opinions may differ there.
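As a rough sketch of what “two APIs at the same time” looks like in practice, the snippet below uses one client in both styles. It assumes dask and distributed are installed and starts a small in-process cluster with `Client(processes=False)`; `inc` is again a made-up helper. Treat it as an illustration rather than a recommended pattern.

```python
import dask
from dask.distributed import Client


def inc(x):
    # Hypothetical helper used only for this illustration.
    return x + 1


# Assumption: a throwaway in-process (threads-only) cluster is fine for a demo.
client = Client(processes=False)

# (1) concurrent.futures style: submit work eagerly and get a Future back.
future = client.submit(inc, 1)
futures_result = future.result()

# (2) Task-graph style: build a lazy object, then compute it.
lazy = dask.delayed(inc)(1)
(graph_result,) = dask.compute(lazy)

# One place the two meet: client.compute takes a lazy object
# and returns a Future instead of the concrete result.
mixed_future = client.compute(lazy)
mixed_result = mixed_future.result()

client.close()
```

Note the difference in return types: `dask.compute` hands back concrete results, while `client.compute` hands back a future that you resolve with `.result()`.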

This is true if your client is asynchronous, but if you are using the default synchronous client, doesn’t it return the result of the computation directly?