Dask suitability for my use case

Hello there! I’m relatively new to Dask, and I’m trying to figure out if it’s the right tool to use for my use case. Hopefully this is a good place to ask my question.

Most of the examples I see on the website are related to processing very large data sets in parallel, mostly by executing many small (potentially interdependent) operations. For my current project, I’m trying to perform parameter studies involving many simulations (3–90 minutes each) that are not memory intensive and can be run independently, so I’m using Dask to minimize total runtime by fully utilizing system resources (versus running them sequentially) in a simple way. I understand the above could also be done with the multiprocessing library, but submitting tasks to the client and monitoring progress from the dashboard is just so easy, letting Dask handle the scheduling and reporting. Is this a common way to use Dask?

Hi @corykinney, welcome to the Dask community!

Well, yes, Dask is perfectly suited to this kind of workload! See the Embarrassingly parallel Workloads page in the Dask Examples documentation.
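Here is a minimal sketch of that pattern using the futures API; `run_simulation` and its parameters are hypothetical stand-ins for your actual simulation code:

```python
import time
from dask.distributed import Client

def run_simulation(params):
    # Hypothetical stand-in for one 3-90 minute simulation run.
    time.sleep(1)
    return {"params": params, "peak": params["temperature"] * 2}

if __name__ == "__main__":
    # Local cluster; the dashboard is served at http://localhost:8787.
    client = Client()

    # One parameter set per simulation; all runs are independent.
    parameter_sets = [{"temperature": t} for t in range(300, 400, 10)]

    # Submit everything at once, let Dask schedule the runs across
    # workers, then block until all results are back.
    futures = client.map(run_simulation, parameter_sets)
    results = client.gather(futures)
```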

And in addition to API simplicity and the Dashboard, you could even go distributed if you ever needed to!
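For example (the address below is just a placeholder), the only change needed is to point the client at a running scheduler instead of starting a local cluster:

```python
from dask.distributed import Client

# Placeholder address of an existing dask-scheduler process.
client = Client("tcp://192.168.1.100:8786")
```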

I would just recommend making sure you somehow save the overall state of your computations, so you can restart from where you were if any problem occurs, but this is true whether you use Dask or not.
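One simple sketch of that (the one-file-per-run layout and naming scheme are my own assumptions, not the only way to do this): write each result to disk as soon as it completes with `as_completed`, and on restart skip any parameter set that already has an output file:

```python
import json
from pathlib import Path
from dask.distributed import Client, as_completed

def run_simulation(params):
    # Same hypothetical stand-in as in the sketch above.
    return {"params": params, "peak": params["temperature"] * 2}

def output_path(i):
    # Hypothetical naming scheme: one JSON file per parameter set.
    return Path("results") / f"run_{i:04d}.json"

if __name__ == "__main__":
    client = Client()
    Path("results").mkdir(exist_ok=True)

    parameter_sets = [{"temperature": t} for t in range(300, 400, 10)]

    # Only submit runs whose result file doesn't exist yet.
    pending = {
        client.submit(run_simulation, p): i
        for i, p in enumerate(parameter_sets)
        if not output_path(i).exists()
    }

    # Persist each result as soon as it finishes, so a crash
    # loses at most the runs still in flight.
    for future in as_completed(pending):
        i = pending[future]
        with open(output_path(i), "w") as f:
            json.dump(future.result(), f)
```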