Let’s say I have a user-facing workflow written with dask, and the user needs to provide the input data, and they’re able to tweak a few settings. Should I use the dask config file for this purpose? I like the idea of all the workflow configuration, including the configuration of dask itself (the scheduler etc), living in the same place. But on the other hand, the dask config is generally global to a system (ie it lives in ~/.config/dask/ or /etc/dask/), and doesn’t vary per workflow, so maybe it doesn’t make sense for workflow-specific parameters to live there? Is there some other mechanism for user-facing config that should be used with dask?
@multimeric It depends. Where would you like to set these non-dask-core user configs? If the config is centralized and lives on the scheduler, it might get overwritten. But if it isn’t, and each client has its own config, then using the same file might work. Does this make sense?
and the user needs to provide the input data
Could you please share some details about the input data?
Hmm I’m not sure what you mean. Are you asking where the config file will live, or on which machine the config will live in memory? I don’t think it needs to be on the scheduler, it just has to be accessible on the main Python process that creates the DAG (not sure what that’s called in dask terminology). Why would it get overwritten?
But if it’s not, and is different for each client, then using the same file might work.
What do you mean by client here? I want the workflow to allow for different config between different runs from the same codebase, or even different runs of different Python applications.
Could you please share some details about the input data?
Oh, I just mean like a URL to the location of one or more files, e.g. s3:// URLs or file:// URLs.
So far I’m pretty happy with my approach. I set DASK_CONFIG=dask.yml to allow a local config file just for the given codebase, then add a new key which is my_workflow (not the real name, of course) at the root of the file, which I populate with custom keys. Then in code I access this using dask.config.config.get("my_workflow"). As I mentioned, it keeps everything in the same place: both the dask execution level settings, and the custom workflow configuration.
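For anyone wanting to try this, here is a minimal sketch of the same idea. It is not exactly what I described above: instead of relying on the DASK_CONFIG environment variable (which dask reads when it is imported), it loads the file explicitly with dask.config.collect and merges it into the global config with dask.config.update. The file contents, the input_urls and chunk_size keys, the example bucket URL, and the load_workflow_config helper are all made up for illustration.

```python
import os
import tempfile
import textwrap

import dask.config

# Hypothetical file contents: one dask-level setting plus a custom
# "my_workflow" section at the root of the file.
CONFIG_TEXT = textwrap.dedent(
    """\
    scheduler: threads
    my_workflow:
      input_urls:
        - s3://example-bucket/data/part-0.csv
      chunk_size: 1000
    """
)


def load_workflow_config(path):
    """Merge the YAML file at `path` into dask's global config.

    Returns the custom "my_workflow" section (a hypothetical key name).
    """
    # env={} means DASK_* environment variables are ignored for this load.
    file_config = dask.config.collect(paths=[path], env={})
    # Values from the file take priority over what is already set.
    dask.config.update(dask.config.config, file_config)
    return dask.config.get("my_workflow")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "dask.yml")
        with open(path, "w") as f:
            f.write(CONFIG_TEXT)
        workflow = load_workflow_config(path)

    print(workflow["input_urls"])
    # Dotted keys reach into the nested custom section.
    print(dask.config.get("my_workflow.chunk_size"))
    print(dask.config.get("scheduler"))
```

Loading the file explicitly means the path can differ between runs of the same codebase, at the cost of one extra call at startup. With the DASK_CONFIG approach the file is picked up automatically, but the variable has to be set before dask is imported.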