Best practices for user configuration

multimeric · August 8, 2022, 2:07pm

Let’s say I have a user-facing workflow written with dask, where the user needs to provide the input data and can tweak a few settings. Should I use the dask config file for this purpose? I like the idea of all the workflow configuration, including the configuration of dask itself (the scheduler, etc.), living in the same place. On the other hand, the dask config is generally global to a system (i.e. it lives in ~/.config/dask/ or /etc/dask/) and doesn’t vary per workflow, so maybe it doesn’t make sense for workflow-specific parameters to live there? Is there some other mechanism for user-facing config that should be used with dask?

pavithraes · August 10, 2022, 12:44pm

@multimeric It depends. Where would you like to set these non-dask-core user configs? If the config is centralized and lives on the scheduler, it might get overwritten. But if it’s not, and is different for each client, then using the same file might work. Does this make sense?

> and the user needs to provide the input data

Could you please share some details about the input data?

multimeric · August 10, 2022, 1:43pm

Hmm, I’m not sure what you mean. Are you asking where the config file will live, or on which machine the config will live in memory? I don’t think it needs to be on the scheduler; it just has to be accessible from the main Python process that creates the DAG (not sure what that’s called in dask terminology). Why would it get overwritten?

> But if it’s not, and is different for each client, then using the same file might work.

What do you mean by client here? I want the workflow to allow for different config between different runs from the same codebase, or even different runs of different Python applications.

> Could you please share some details about the input data?

Oh, I just mean like a URL to the location of one or more files, e.g. s3:// URLs or file:// URLs.

multimeric · September 4, 2022, 8:52am

So far I’m pretty happy with my approach. I set DASK_CONFIG=dask.yml to allow a local config file just for the given codebase, then add a new key which is my_workflow (not the real name, of course) at the root of the file, which I populate with custom keys. Then in code I access this using dask.config.config.get("my_workflow"). As I mentioned, it keeps everything in the same place: both the dask execution level settings, and the custom workflow configuration.
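For anyone who wants to try this, here is a minimal sketch of the idea rather than my actual code. The keys under my_workflow (input_urls, chunk_size, retries) and the bucket URL are made up for illustration. To keep the snippet self-contained, it writes a temporary dask.yml and loads it explicitly with dask.config.collect, whereas in the setup described above dask picks the file up by itself via DASK_CONFIG at import time. It also reads values through dask.config.get with dotted keys, which is an alternative to going through the dask.config.config dict directly.

```python
import os
import tempfile

import dask.config

# Hypothetical project-local config, standing in for the dask.yml described above.
EXAMPLE_YAML = """\
scheduler: threads          # ordinary dask setting
my_workflow:                # custom root key holding the workflow settings
  input_urls:
    - s3://example-bucket/input.csv
  chunk_size: 1000
"""

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "dask.yml")
    with open(path, "w") as f:
        f.write(EXAMPLE_YAML)
    # Parse the YAML file into a plain dict; env={} means DASK_* environment
    # variables are ignored for this demonstration.
    loaded = dask.config.collect(paths=[path], env={})

# Apply the loaded values to the global dask config only inside this block.
with dask.config.set(loaded):
    scheduler = dask.config.get("scheduler")
    chunk_size = dask.config.get("my_workflow.chunk_size")
    input_urls = dask.config.get("my_workflow.input_urls")
    # A default covers keys the user left out of the file.
    retries = dask.config.get("my_workflow.retries", default=3)

print(scheduler, chunk_size, input_urls, retries)
```

Passing a default to dask.config.get means the workflow still runs when a user omits an optional key, while a required key with no default raises KeyError early.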
