Setting environment for scheduler

Hi,

I am using HTCondorCluster to submit jobs to a Condor cluster. I found that the API:

http://jobqueue.dask.org/en/latest/generated/dask_jobqueue.HTCondorCluster.html

allows specifying the environment of the workers, but I can’t find any option to specify the environment of the scheduler.
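
For reference, the worker environment can be set roughly like this (a sketch; depending on the dask-jobqueue version the kwarg is env_extra or, in newer releases, job_script_prologue, and MY_VAR is just a placeholder):

```python
from dask_jobqueue import HTCondorCluster

cluster = HTCondorCluster(
    cores=1,
    memory="2 GB",
    disk="1 GB",
    # Makes MY_VAR available in every worker's environment; in newer
    # dask-jobqueue releases this kwarg is called job_script_prologue.
    env_extra=["export MY_VAR=some-value"],
)
```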

This differs from, for example, the Kubernetes resource manager, which offers a way to define the environment for both the workers and the scheduler (via pod definitions):

https://kubernetes.dask.org/en/latest/kubecluster.html

Is there any way I can set environment variables for the scheduler of an HTCondorCluster? Alternatively, if that is not possible at the moment, could you point me to the code where the scheduler process of an HTCondorCluster is created, so I can try to figure out what changes would be necessary to support passing an environment to it (and perhaps propose a PR)?

The scenario I’m trying to support is when Dask is used in JupyterLab via the Dask Lab extension. In that context, the scheduler process has the environment of the Jupyter server process (since the extension is a server extension), which does not necessarily match the environment of the Jupyter notebook kernel processes (where the client is created). This can lead to a mismatch in the environment of the client/workers and the scheduler, so it would be good to be able to configure the environment for the latter.

@etejedor Thanks for this question! I’ll keep looking into it, but maybe @guillaumeeb and @jacobtomlinson have some thoughts?

Hi @etejedor,

As you’re probably aware, we currently start the Scheduler directly in the same process where the JobQueueCluster object is built; dask-jobqueue is less advanced than dask-kubernetes here. So the Scheduler inherits the environment of that local process. You can also pass some configuration/options to the Scheduler through the scheduler_options kwarg; I don’t know if this is sufficient for your needs. You should also be able to configure those in a YAML config file.
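
For example, something like this (a minimal sketch; dashboard_address and port are standard distributed.Scheduler keywords, and anything the Scheduler accepts should work here):

```python
from dask_jobqueue import HTCondorCluster

cluster = HTCondorCluster(
    cores=1,
    memory="2 GB",
    disk="1 GB",
    # Forwarded as keyword arguments to the distributed.Scheduler,
    # which runs in the same local process as this code.
    scheduler_options={"dashboard_address": ":8787", "port": 0},
)
```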

The Scheduler is instantiated by SpecCluster directly; we only pass the base Scheduler class to it here: dask-jobqueue/core.py at main · dask/dask-jobqueue · GitHub.

The best way to improve this would probably be to implement Allow to start a Scheduler in a batch job · Issue #186 · dask/dask-jobqueue · GitHub.

But any other suggestion or PR would be welcome! 🙂


Hi @guillaumeeb,

Thank you for the answer. Unfortunately, scheduler_options does not provide any option to set the environment at the moment. I agree that a good solution would be to be able to start the scheduler as a batch job too, just like the workers, with its own environment (and not that of its local parent process). Is this a development that might be merged soon, or would you need a contributor?

Sorry to hear that. Couldn’t you also try to use Dask config files? Put the appropriate YAML file into the appropriate directory in your home? See maybe:
https://docs.dask.org/en/stable/configuration.html
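
To check which files are being read and whether a value is picked up, something like this can help (a sketch; jobqueue.htcondor.cores is just an example key):

```python
import dask

# Directories Dask searches for YAML config files;
# ~/.config/dask/ is the usual per-user location.
print(dask.config.paths)

# Verify that a value from your YAML file was merged into the config.
print(dask.config.get("jobqueue.htcondor.cores", default=None))
```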

Nobody is working on it as far as I know. A contribution here would be really appreciated. You could also propose something simpler that at least allows customizing the Scheduler a bit more, even when it is started in the local process.


Thank you, that led me to find this:

https://docs.dask.org/en/latest/how-to/customize-initialization.html#configuration

which it seems I could use to set some environment variables for the scheduler. I’ll give it a try!
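
Concretely, I was thinking of something along these lines (a minimal sketch based on that page; the file name, path and variable are hypothetical):

```python
# set_scheduler_env.py -- a preload script for the scheduler
import os

def dask_setup(scheduler):
    # Runs inside the scheduler process when it starts up.
    os.environ["MY_VAR"] = "some-value"
```

and then point the scheduler at it before creating the cluster, e.g. with `dask.config.set({"distributed.scheduler.preload": ["/path/to/set_scheduler_env.py"]})` or the equivalent entry in a YAML config file.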

Just a final question: does whatever I write in the preload only affect the scheduler process, and not its parent?

I really have no idea 😄. But I would say that, by default, the Scheduler will be executed in the same Linux process as the Jupyter server in your case. If that is so, then setting some environment variables in the Scheduler preload might well also modify the environment of the Jupyter server.
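
If you want to check, a quick test could look like this (a sketch, reusing the hypothetical preload script and MY_VAR from your earlier snippet):

```python
import os
import dask
from dask_jobqueue import HTCondorCluster

# Point the scheduler at the hypothetical preload script.
dask.config.set({"distributed.scheduler.preload": ["/path/to/set_scheduler_env.py"]})

# The scheduler (and therefore its preload) runs in this very process...
cluster = HTCondorCluster(cores=1, memory="2 GB", disk="1 GB")

# ...so a variable set by the preload is visible to the parent process too
# (in your case, the Jupyter server).
print(os.environ.get("MY_VAR"))
```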

Hi @etejedor, did you get around to solving this eventually? I have a similar situation with a different kind of cluster.