Documentation on the interplay between graphs and futures

akhmerov · January 23, 2022, 1:14pm

Right now the distributed documentation reads roughly as “task graphs are powerful, but if you need something more complex—use futures”.

Digging deeper, however, it seems that tasks and futures play nicely together:

client.compute returns a future
Feeding a future as an input to delayed ensures its result is retrieved
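A minimal sketch of the two observations above. It assumes a recent dask/distributed and an in-process cluster (`processes=False`) so it runs as a plain script. The functions `square` and `inc` are hypothetical.

```python
from dask import delayed
from distributed import Client, Future

@delayed
def square(x):
    return x ** 2

@delayed
def inc(x):
    return x + 1

# In-process cluster (threads only), so no __main__ guard is needed.
client = Client(processes=False)

# client.compute on a Delayed object returns a Future right away.
fut = client.compute(square(3))
is_future = isinstance(fut, Future)

# Passing that Future into another delayed call: the scheduler treats it as a
# dependency and hands its *value* to inc, not the Future object itself.
result = inc(fut).compute()

client.close()
print(is_future, result)
```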

Is it correct that the apparent division between tasks-only and futures-only workflows is only an impression that the docs give, or is there some problem one should be aware of? If it is the former, then I think it would be worth making the relation more explicit in the tutorials. Or did I not find the right tutorial?

ian · January 24, 2022, 8:38pm

Hi @akhmerov, welcome!

I agree that the documentation around distributed.Futures is pretty confusing, and it should be improved. To me, the important differences (and I’m editorializing a bit here) are:

Futures are mostly a reimplementation of the concurrent.futures API. It’s mostly a different API from the dask collections API (e.g., dask.dataframe or dask.array).
Futures represent an unrealized result on a distributed cluster. They are defined in distributed, and almost entirely absent from the dask/dask codebase (I did a quick grep through, and only found one conditional import of them).

I said above that it’s “mostly a different API”, but, of course, you identified some exceptions. I think one of the things that is weird about the distributed.Client interface is that it implements two APIs at the same time: (1) concurrent.futures and (2) dask.{compute, persist}, and there are some places where they can be mixed. I generally think that most people should stick with the standard dask collections/task-graph API until they need to do something trickier with concurrency on the client. But opinions may differ there.
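To illustrate the two APIs living on one Client, here is a small sketch under the same assumptions (recent dask/distributed, in-process cluster). The `square` function is hypothetical; `client.map` and `as_completed` follow the concurrent.futures style, while `dask.compute` is the collections style and picks up the active Client as its scheduler.

```python
import dask
from dask import delayed
from distributed import Client, as_completed

def square(x):
    return x ** 2

client = Client(processes=False)

# (1) concurrent.futures style: submit plain functions, receive Futures,
#     and consume them in completion order.
futures = client.map(square, range(5))
futures_style = sorted(f.result() for f in as_completed(futures))

# (2) dask collections style: build a lazy task graph, then compute it.
#     dask.compute uses the active Client as its scheduler.
lazy = [delayed(square)(i) for i in range(5)]
graph_style = list(dask.compute(*lazy))

client.close()
print(futures_style, graph_style)
```

Both paths produce the same values; the difference is whether you drive execution yourself (Futures) or hand a whole graph to the scheduler (collections).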

Regarding “client.compute returns a future”: this is true if your client is asynchronous, but if you are using the default synchronous client, doesn’t it return the result of the computation directly?

akhmerov · January 24, 2022, 9:41pm

Thanks for the overview, this was useful!

To explain my use case: I would like to design a workflow where:

There is an outer control loop that does parallel adaptive computations: either sampling, similar to what I now do in adaptive, or optimization.
The function that is sampled is defined as a task graph to benefit from the scheduler.
Both the control loop and the sampling run within the same cluster.

This seems to require being able to mix and match the two approaches.
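A rough sketch of such a mix, assuming the control loop runs on the client (not inside the cluster) and using a made-up objective and refinement rule. Each sample is a small delayed graph submitted with `client.compute`, and `as_completed` lets the loop react to whichever evaluation finishes first and submit new points while iterating.

```python
from dask import delayed
from distributed import Client, as_completed

# Hypothetical sampled function, written as a small task graph.
@delayed
def transform(x):
    return x / 2.0

@delayed
def objective(x, y):
    # Return the input too, so the control loop knows which point finished.
    return x, (y - 1.0) ** 2

def sample(x):
    return objective(x, transform(x))

client = Client(processes=False)

budget = 6  # total number of evaluations (hypothetical)
initial = [0.0, 4.0]
pending = as_completed([client.compute(sample(x)) for x in initial])
submitted = len(initial)
evaluations = []

for future in pending:
    x, value = future.result()
    evaluations.append((x, value))
    if submitted < budget:
        # Hypothetical refinement rule: probe halfway between the point that
        # just finished and the best point seen so far.
        best_x = min(evaluations, key=lambda p: p[1])[0]
        pending.add(client.compute(sample((x + best_x) / 2.0)))
        submitted += 1

client.close()
print(len(evaluations), min(evaluations, key=lambda p: p[1]))
```

Because completion order is not fixed, the sampled points can differ between runs; only the number of evaluations is deterministic.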

Quoting your reply: “This is true if your client is asynchronous, but if you are using the default synchronous client, doesn’t it return the result of the computation directly?”

Uhh, I may be mistaken, but I don’t think so. With the latest versions of everything, this code

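from distributed import Client
from dask import delayed
from time import sleep, time

@delayed
def g(a):
    sleep(3)  # simulate a slow task
    return a**2

if __name__ == "__main__":  # required where workers are started with "spawn"
    client = Client()

    t = time()
    result = client.compute(g(1))  # returns without waiting for g to finish
    # type of result, time taken by compute(), the value, time until the value arrived
    print(type(result), time() - t, result.result(), time() - t)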

outputs

<class 'distributed.client.Future'> 0.011399030685424805 1 3.034311532974243

From a cursory reading of the docs I also expected what you wrote to be the case, but instead I observe that asynchronous=True makes compute return coroutines rather than futures.

ian · January 24, 2022, 11:27pm

My mistake, I was mis-remembering the API:

Client.compute(sync=True) automatically gets the results from the futures, not Client(asynchronous=True). (sigh…)
g(1).compute() will also block on the results, and use your Client() in the background.
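A short sketch contrasting the three calls, assuming a recent dask/distributed and an in-process cluster; `g` mirrors the function from the earlier post without the sleep.

```python
from dask import delayed
from distributed import Client, Future

@delayed
def g(a):
    return a ** 2

client = Client(processes=False)

fut = client.compute(g(3))                   # default sync=False: returns a Future
from_future = fut.result()                   # block until the value is ready

from_sync = client.compute(g(3), sync=True)  # compute and gather in one call

from_method = g(3).compute()                 # blocks; runs on the active Client

client.close()
print(type(fut).__name__, from_future, from_sync, from_method)
```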

akhmerov · January 24, 2022, 11:46pm

I’d like to come back to my original question then: it seems that tasks and Futures play nicely together and there is no hidden cost to this combination beyond just the downsides of using futures in the first place. Is that a reasonable assessment?

ian · January 24, 2022, 11:49pm

Yes, they should play nicely together. I see you’ve already weighed in on this issue, so you’ve been thinking about launching tasks (or collections of tasks) from within tasks. Your use-case does sound tricky enough that designing around some mixture of Futures and graphs makes sense.
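One way to launch tasks from within a task is distributed's `worker_client` context manager (`get_client` together with `secede`/`rejoin` is the lower-level alternative). The sketch below assumes an in-process cluster; `inner` and `control_loop` are hypothetical stand-ins for the sampled function and the outer loop, so that the control loop itself runs on the cluster.

```python
from dask import delayed
from distributed import Client, worker_client

@delayed
def inner(x):
    # Hypothetical unit of work launched from inside another task.
    return x + 1

def control_loop(n):
    # Runs as a task on a worker. worker_client() connects back to the
    # scheduler and secedes this task from the worker's thread pool while it
    # waits, so the tasks it launches can run.
    with worker_client() as client:
        futures = client.compute([inner(i) for i in range(n)])
        values = client.gather(futures)
    return sum(values)

client = Client(processes=False)
outer = client.submit(control_loop, 3)
total = outer.result()
client.close()
print(total)
```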
