Documentation on the interplay between graphs and futures

Right now the distributed documentation reads roughly as “task graphs are powerful, but if you need something more complex—use futures”.

Digging deeper, however, it seems that tasks and futures play nicely together:

  • client.compute returns a future :heavy_check_mark:
  • Feeding a future as an input to delayed ensures its result is retrieved :heavy_check_mark:
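For concreteness, here is a minimal sketch of the two observations above. The functions `square` and `add` and the wrapper `mix_graph_and_future` are hypothetical names made up for illustration, and `Client(processes=False)` is just one convenient way to get a small in-process cluster; this is not taken from the docs.

```python
from dask import delayed
from distributed import Client


@delayed
def square(x):
    return x ** 2


@delayed
def add(x, y):
    return x + y


def mix_graph_and_future():
    # In-process cluster without a dashboard, so the sketch stays lightweight.
    with Client(processes=False, dashboard_address=None) as client:
        # Observation 1: client.compute on a delayed object returns a Future.
        future = client.compute(square(3))
        # Observation 2: the Future is passed as an input to another delayed
        # call; its result is resolved when the new graph is computed.
        total = add(future, 1)
        return type(future).__name__, total.compute()


if __name__ == "__main__":
    print(mix_graph_and_future())
```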

Is it correct that the apparent division between tasks-only and futures-only workflows is only an impression that the docs give, or is there some problem that one should be aware of? If it is the former, then I think it would be worth making the relation more explicit in the tutorials. Or did I not find the right tutorial?

Hi @akhmerov, welcome!

I agree that the documentation around distributed.Futures is pretty confusing, and it should be improved. To me, the important differences (and I’m editorializing a bit here) are:

  1. Futures are mostly a reimplementation of the concurrent.futures API. It’s mostly a different API from the dask collections API (e.g., dask.dataframe or dask.array).
  2. Futures represent an unrealized result on a distributed cluster. They are defined in distributed, and almost entirely absent from the dask/dask codebase (I did a quick grep through, and only found one conditional import of them).
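To illustrate point 1, here is a small sketch of the standard-library concurrent.futures pattern that the futures interface is modelled on. It uses only the standard library; `square` and `run_with_executor` are hypothetical names for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def square(x):
    return x ** 2


def run_with_executor():
    with ThreadPoolExecutor(max_workers=2) as executor:
        # submit() schedules one call and returns a Future immediately.
        future = executor.submit(square, 3)
        # map() applies the function to each input; here it yields results.
        mapped = list(executor.map(square, [1, 2, 3]))
        # result() blocks until the submitted call has finished.
        return future.result(), mapped


if __name__ == "__main__":
    print(run_with_executor())
```

distributed.Client exposes the same submit / map / result vocabulary, with one difference worth knowing: Client.map returns a list of futures rather than an iterator of results.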

I said above that it’s “mostly a different API”, but, of course, you identified some exceptions :slight_smile: . I think one of the things that is weird about the distributed.Client interface is that it implements two APIs at the same time: (1) concurrent.futures and (2) dask.{compute, persist}, and there are some places where they can be mixed. I generally think that most people should stick with the standard dask collections/task-graph API until they need to do something trickier with concurrency on the client. But opinions may differ there.

This is true if your client is asynchronous, but if you are using the default synchronous client, doesn’t it return the result of the computation directly?

Thanks for the overview, this was useful!

To explain my use case: I would like to design a workflow where:

  1. There is an outer control loop that does parallel adaptive computations: either sampling, similar to what I now do in adaptive, or optimization.
  2. The function that is sampled is defined as a task graph to benefit from the scheduler.
  3. Both the control loop and the sampling run within the same cluster.

This seems to require being able to mix and match the two approaches.

This is true if your client is asynchronous, but if you are using the default synchronous client, doesn’t it return the result of the computation directly?

Uhh, I may be mistaken, but I don’t think so. With all the latest versions this code

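from distributed import Client
from dask import delayed
from time import sleep, time

# The decorator and the sleep are assumptions: they are implied by the unused
# imports and by the ~3 s timing in the output below.
@delayed
def g(a):
    sleep(3)
    return a**2

client = Client()

t = time()
result = client.compute(g(1))  # returns a Future without waiting
# print() evaluates its arguments left to right, so the second timing is
# taken only after result.result() has blocked on the computation.
print(type(result), time() - t, result.result(), time() - t)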


<class 'distributed.client.Future'> 0.011399030685424805 1 3.034311532974243

I also expected from a cursory reading of the docs that what you wrote should be the case, but I observe instead that asynchronous=True makes compute return coroutines and not futures.

My mistake, I was mis-remembering the API:

  • Client.compute(sync=True) automatically gets the results from the futures, not Client(asynchronous=True). (sigh…)
  • g(1).compute() will also block on the results, and use your Client() in the background.
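As a rough sketch of the three behaviours just described (the function `square` and the wrapper `blocking_variants` are hypothetical names, and the in-process client is only there to keep the example small):

```python
from dask import delayed
from distributed import Client


@delayed
def square(x):
    return x ** 2


def blocking_variants():
    with Client(processes=False, dashboard_address=None) as client:
        # Default: returns a Future right away, without waiting.
        future = client.compute(square(4))
        # sync=True: blocks and hands back the concrete value.
        value_sync = client.compute(square(4), sync=True)
        # .compute() on the delayed object also blocks; the active client
        # is used as the scheduler in the background.
        value_method = square(4).compute()
        return type(future).__name__, value_sync, value_method


if __name__ == "__main__":
    print(blocking_variants())
```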

I’d like to come back to my original question then: it seems that tasks and Futures play nicely together and there is no hidden cost to this combination beyond just the downsides of using futures in the first place. Is that a reasonable assessment?

Yes, they should play nicely together. I see you’ve already weighed in on this issue, so you’ve been thinking about launching tasks (or collections of tasks) from within tasks. Your use-case does sound tricky enough that designing around some mixture of Futures and graphs makes sense.