Documentation on the interplay between graphs and futures

ian · January 24, 2022, 8:38pm

I agree that the documentation around distributed.Futures is pretty confusing, and it should be improved. To me, the important differences (and I’m editorializing a bit here) are:

- Futures are mostly a reimplementation of the concurrent.futures API. It’s mostly a different API from the dask collections API (e.g., dask.dataframe or dask.array).
- Futures represent an unrealized result on a distributed cluster. They are defined in distributed, and almost entirely absent from the dask/dask codebase (I did a quick grep through, and only found one conditional import of them).
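For reference, here is a minimal sketch of the standard-library concurrent.futures pattern that the Futures API is modeled on. It uses only the standard library, so it runs without a cluster; the helper names `square` and `run_with_stdlib_futures` are made up for illustration. distributed.Client has `submit` and `map` methods of a similar shape, though Client.map returns a list of futures rather than an iterator of results.

```python
from concurrent.futures import ThreadPoolExecutor


def square(x):
    """Toy task used only for illustration."""
    return x * x


def run_with_stdlib_futures():
    # An executor plays the role that a distributed.Client plays on a cluster.
    with ThreadPoolExecutor(max_workers=2) as executor:
        # submit() returns immediately with a Future, i.e. an unrealized result.
        future = executor.submit(square, 4)
        # Executor.map() yields results lazily, so collect them into a list.
        mapped = list(executor.map(square, range(3)))
        # result() blocks until the task has finished.
        return future.result(), mapped


if __name__ == "__main__":
    print(run_with_stdlib_futures())
```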

I said above that it’s “mostly a different API”, but, of course, you identified some exceptions. I think one of the things that is weird about the distributed.Client interface is that it implements two APIs at the same time: (1) concurrent.futures and (2) dask.{compute, persist}, and there are some places where they can be mixed. I generally think that most people should stick with the standard dask collections/task-graph API until they need to do something trickier with concurrency on the client. But opinions may differ there.
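As a rough sketch of one place where the two APIs meet (assuming dask and distributed are installed; the in-process cluster settings and the helpers `inc`, `add`, and `mix_collections_and_futures` are illustrative choices, not anything from the docs): `Client.compute` accepts a lazy collections-style object and hands back a Future, and that Future can then be passed to `Client.submit` like any other.

```python
import dask
from dask import delayed
from distributed import Client, Future


@delayed
def inc(x):
    """Lazy task built with the collections/task-graph API."""
    return x + 1


def add(x, y):
    """Plain function, to be submitted through the futures API."""
    return x + y


def mix_collections_and_futures():
    # Small in-process cluster, only for illustration.
    with Client(processes=False, n_workers=1, threads_per_worker=1,
                dashboard_address=None) as client:
        lazy = inc(1)

        # dask.compute blocks and returns a tuple of concrete results.
        (direct,) = dask.compute(lazy)

        # Client.compute returns a Future instead of a concrete value.
        future = client.compute(lazy)

        # The Future can be used as an argument to a submitted task.
        combined = client.submit(add, future, 10)
        return isinstance(future, Future), direct, combined.result()


if __name__ == "__main__":
    print(mix_collections_and_futures())
```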

This is true if your client is asynchronous, but if you are using the default synchronous client, doesn’t it return the result of the computation directly?
