What are the pros/cons of using Futures/Delayed?

Hi,

I’m working on a parallelization task which requires low-level customization.

I think I understand both Delayed and Future concepts, but I have a hard time deciding which one to use in which situation.

I would appreciate it if you could show two scenarios where one is more suitable than the other. (Or how to efficiently use both together).

Thank you,

Hi @bcaglaraydin and welcome!

This is a hard question to answer without knowing more about how you’d like to use Dask. Mind sharing a bit more about what you’d like to do and how you’re deploying Dask (e.g. locally, on an HPC cluster, in the cloud)?

In addition to the Dask documentation on the Delayed and Futures APIs, this explanation from Ian Rose might help:

Hello! Thank you for your answer,

I will deploy Dask on Kubernetes, and I am using both Delayed and Futures objects at the moment.

This is an example of how I am trying to parallelize:

missing_frequencies = []
for col in columns:
    # get_frequencies is a delayed method
    missing_frequencies.append(dask_df.map_partitions(get_frequencies, col))

missing_frequencies_futures = client.compute(missing_frequencies)
missing_frequencies_result = client.gather(missing_frequencies_futures)

I am not sure if that is good practice. I would greatly appreciate it if you could help here. Thank you!

Hi there,

My motto here is: “Use Delayed when you can, fall back to Futures when you need to”.

I find Delayed much more elegant and simpler for non-real-time work: graphs or workflows you can define from start to end without needing to inspect part of the result along the way.
But sometimes you just need the Futures API, even though it kind of seems wrong to me for some workflows.
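
To make that concrete, here is a minimal sketch of the same toy workload written both ways. The helpers `load` and `count`, the wrappers `run_delayed` and `run_futures`, and the `threshold` parameter are all made up for illustration, and the in-process `Client(processes=False)` only stands in for a real cluster (on Kubernetes you would connect to your scheduler instead).

```python
import dask
from dask.distributed import Client, as_completed


def load(i):
    # Hypothetical stand-in for reading one chunk of data.
    return list(range(i + 1))


def count(chunk):
    # Hypothetical stand-in for a per-chunk computation.
    return len(chunk)


def run_delayed(n):
    # Delayed: describe the whole graph up front, then compute it in one go.
    counts = [dask.delayed(count)(dask.delayed(load)(i)) for i in range(n)]
    total = dask.delayed(sum)(counts)
    return total.compute(scheduler="threads")


def run_futures(n, threshold):
    # Futures: work starts as soon as it is submitted, and results can be
    # inspected as they arrive, so the workflow can react (here: stop early).
    with Client(processes=False, dashboard_address=None) as client:
        futures = [client.submit(count, client.submit(load, i)) for i in range(n)]
        total = 0
        for future in as_completed(futures):
            total += future.result()
            if total >= threshold:
                # Enough work done; cancel whatever is still pending.
                client.cancel(futures)
                break
        return total
```

With these toy helpers, `run_delayed(4)` returns 10. `run_futures(4, threshold=5)` may return a smaller total, depending on which tasks finish first. That kind of "decide what to do next based on partial results" logic is where the Futures API tends to be needed, while the first version is usually all you need when the whole graph is known in advance.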