This is a hard question to answer without knowing more about how you’d like to use Dask. Mind sharing a bit more about what you’d like to do and how you’re deploying Dask (e.g. locally, HPC cluster, in the cloud)?
In addition to the Dask documentation on the Delayed and Futures APIs, this explanation from Ian Rose might help:
I will deploy Dask on Kubernetes, and I’m using both delayed and futures objects at the moment.
This is an example of how I am trying to parallelize:
missing_frequencies = []
for col in columns:
    missing_frequencies.append(dask_df.map_partitions(get_frequencies, col))
# get_frequencies is a delayed method
missing_frequencies_futures = client.compute(missing_frequencies)
missing_frequencies_result = client.gather(missing_frequencies_futures)
I am not sure whether that is good practice; I would greatly appreciate any help here. Thank you!
My motto here is: “Use Delayed when you can, fall back to Futures when you need to”.
I find Delayed much more elegant and simple for non-real-time work: graphs or workflows you can define from start to end without needing to inspect part of the result at some point.
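For instance, here is a minimal sketch of the Delayed style (the `load` and `combine` functions are made up for illustration): the whole graph is built lazily up front, and nothing runs until you call `.compute()`.

```python
import dask


@dask.delayed
def load(x):
    # Stand-in for some expensive loading step
    return x * 2


@dask.delayed
def combine(parts):
    # Stand-in for an aggregation over all loaded pieces
    return sum(parts)


parts = [load(i) for i in range(4)]  # builds the task graph, runs nothing yet
total = combine(parts)               # Delayed objects compose into one graph
result = total.compute()             # only now does the whole graph execute
```

The point is that no intermediate result is ever materialized on the client side; the scheduler sees the entire graph at once and can optimize it.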
But sometimes you just need the Futures API: Delayed simply feels wrong to me for some workflows, e.g. when the next task to submit depends on a result you have to look at first.
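As a sketch of that situation (again with made-up functions, using an in-process `Client` just for illustration): with Futures you can inspect an intermediate result and decide what to submit next based on it, which a statically defined Delayed graph can’t express.

```python
from dask.distributed import Client


def square(x):
    return x ** 2


# In-process client for demonstration; in practice you'd connect
# to your Kubernetes-deployed scheduler instead.
client = Client(processes=False)

fut = client.submit(square, 4)
if fut.result() > 10:              # inspect a result mid-workflow...
    fut = client.submit(square, fut)  # ...then submit more work based on it
result = fut.result()

client.close()
```

That inspect-then-submit loop is exactly the “when you need to” part of the motto.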