In production I imagine most users will be using dask_jobqueue or dask_cloudprovider rather than the standard LocalCluster. However, while both of these libraries can be configured using the usual dask.config mechanism, you can't actually choose the cluster type through the config: you have to edit your code to instantiate a SLURMCluster() or FargateCluster(), which instantly makes the workflow non-portable. By this I mean that I want the exact same codebase to be runnable by an HPC user and a cloud user, without them or me having to edit the actual Python code. If you could specify the cluster type in the dask config this would be a non-issue, but that doesn't seem to be possible.
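Concretely, this is what my script has to look like today (a minimal sketch; I'm assuming the resource options such as cores and memory are picked up from the jobqueue section of dask.config, so only the class choice is the problem):

```python
from dask.distributed import Client
from dask_jobqueue import SLURMCluster   # HPC users need this import...
# from dask_cloudprovider.aws import FargateCluster   # ...cloud users need this one instead

# Resource options (cores, memory, queue, ...) come from dask.config,
# but the choice of cluster class itself is hard-coded in the script.
cluster = SLURMCluster()  # or FargateCluster() -- a code change either way
client = Client(cluster)
```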
What is the best solution here to allow my workflows to retain portability? Is there a mechanism for using the dask config to choose the cluster?
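For concreteness, something along these lines is what I have in mind. The "cluster.class" config key here is made up by me, not an existing Dask option, and the dynamic import is just to illustrate the idea of selecting the class from config rather than from code:

```python
import importlib

import dask.config
from dask.distributed import Client

# Hypothetical "cluster.class" key, e.g. in ~/.config/dask/cluster.yaml:
#
#   cluster:
#     class: dask_jobqueue.SLURMCluster                # HPC user
#     # class: dask_cloudprovider.aws.FargateCluster   # cloud user
#
module_name, class_name = dask.config.get("cluster.class").rsplit(".", 1)
cluster_cls = getattr(importlib.import_module(module_name), class_name)

cluster = cluster_cls()   # resource settings still come from dask.config
client = Client(cluster)
```

If something like this already exists (or there's a better-supported way to achieve the same portability), that's exactly what I'm looking for.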