Dask scheduler in a docker container, workers as HTCondor jobs

etejedor · February 24, 2022, 10:06am

Hello,

At CERN we have a Jupyter notebook service that we are now integrating with HTCondor resources, and we would like to use those resources via Dask.

The setup is the following: users log in to the notebook service and get a user session, which runs in a Docker container. Inside their session, users should be able to create a Dask HTCondorCluster to deploy Dask workers on our HTCondor pool. The problem is that the address the scheduler binds to can't be the same as the address the workers use to contact the scheduler. The scheduler runs inside the container and should listen on an address:port of the container's private network. However, the workers (which run on a different network, in the HTCondor pool) should contact the scheduler on an address:port of the node that hosts the user container, and we would set up port forwarding from that node to the container.
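To make the mismatch concrete, here is a minimal Python sketch of the address translation this setup needs: the scheduler binds to an address inside the container, and workers must instead be told the host's address and the forwarded port. It assumes the port forwarding is known as a mapping from container port to host port; the helper `worker_facing_address`, the IP, the port numbers and the hostname are all hypothetical and not part of Dask or dask-jobqueue.

```python
from __future__ import annotations

from urllib.parse import urlsplit


def worker_facing_address(bind_address: str, host: str, port_map: dict[int, int]) -> str:
    """Hypothetical helper: map the scheduler's in-container bind address to
    the address that workers outside the container should connect to."""
    parts = urlsplit(bind_address)
    if parts.port is None:
        raise ValueError(f"no port in scheduler address {bind_address!r}")
    # Host port that the node forwards to the scheduler's container port.
    host_port = port_map[parts.port]
    # Keep the protocol (e.g. tcp://) and swap in the host's name and port.
    return f"{parts.scheme}://{host}:{host_port}"


# The scheduler listens inside the container; workers must use the host's address.
print(worker_facing_address("tcp://172.17.0.2:8786", "node.example.cern.ch", {8786: 30786}))
# prints tcp://node.example.cern.ch:30786
```

What the thread asks for is a way for the scheduler to keep binding to the container address while advertising a translated address like this one to the workers.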

So far we haven’t found any way for the workers to receive a different scheduler address than the address the scheduler binds to. We found this:

but that only allows specifying a different address for the client to contact the scheduler (i.e. the scheduler must still bind to the same address that the workers receive).

What would be the way to configure a setup like the one I just described and make it possible for workers to connect to the scheduler?

Thank you,

Enric

oshadura · February 24, 2022, 1:36pm

Hi @etejedor, we were working on a similar setup for the analysis facility at the University of Nebraska-Lincoln (cc @bbockelm), and I believe you could be interested in my PR Preserve worker hostname by oshadura · Pull Request #4938 · dask/distributed · GitHub. I will try to wrap it up next week.

etejedor · February 24, 2022, 1:43pm

Thank you for sharing @oshadura, but I believe that patch is not strictly related to the issue I described above (I'd like to find a way for workers to be told a different scheduler address than the one the scheduler binds to).

scharlottej13 · February 25, 2022, 2:11am

Hi @etejedor and welcome to discourse! At the moment, dask-jobqueue unfortunately doesn’t support this, but I would recommend opening an issue there. Depending on your notebook server environment, you might be able to use batchspawner. Using dask-gateway might be another option as well.

oshadura · February 25, 2022, 9:04am

@etejedor I think we also have another patch that could be useful. I will open it as a PR next week: https://github.com/CoffeaTeam/coffea-casa/blob/master/docker/coffea-casa-cc7/distributed/0004-Add-possibility-to-setup-external_adress-for-schedul.patch

etejedor · February 25, 2022, 9:19am

Hi @scharlottej13, thank you for your reply and the suggestions. I think opening an issue is probably the best option. Batchspawner is not an option for us since we use the k8s spawner (the notebook servers don't run on the HTCondor cluster), and dask-gateway is certainly something to keep an eye on, but I think a simpler setup would be better to start with.

Would it be better to open the issue on dask-jobqueue or on distributed? The necessary changes would likely imply a new parameter for the scheduler.

etejedor · February 25, 2022, 9:24am

Thanks @oshadura that looks like a possible solution!

It will probably need to be adapted to avoid a clash with Add support for separate external address for SpecCluster scheduler by jacobtomlinson · Pull Request #2963 · dask/distributed · GitHub, which already defines an external_address option with a different meaning (i.e. the address the client uses to connect to the scheduler, not the address the workers use, as in your patch).
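The naming clash is easier to see with the scheduler's addresses listed side by side. The sketch below is purely illustrative: the class `SchedulerAddresses`, its methods and the field `worker_contact_address` are hypothetical names, and only `external_address` mirrors the option from PR #2963 (the client-facing address). `worker_contact_address` stands for the worker-facing address that the coffea-casa patch would add.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SchedulerAddresses:
    """Hypothetical container for the three roles a scheduler address can play."""

    bind_address: str  # where the scheduler listens (inside the container)
    external_address: Optional[str] = None  # what the client uses (meaning in PR #2963)
    worker_contact_address: Optional[str] = None  # what workers are told (meaning in the patch)

    def for_client(self) -> str:
        # Fall back to the bind address when no client-facing address is set.
        return self.external_address or self.bind_address

    def for_workers(self) -> str:
        # Without a separate worker-facing address, workers receive the bind
        # address, which is the limitation described in this thread.
        return self.worker_contact_address or self.bind_address


addrs = SchedulerAddresses(
    bind_address="tcp://172.17.0.2:8786",
    worker_contact_address="tcp://node.example.cern.ch:30786",
)
print(addrs.for_client())   # tcp://172.17.0.2:8786
print(addrs.for_workers())  # tcp://node.example.cern.ch:30786
```

Keeping the client-facing and worker-facing addresses as separate fields is one way to avoid reusing `external_address` for two different meanings.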

pavithraes · February 25, 2022, 11:22am

Would it be better to open the issue on dask-jobqueue or on distributed? The necessary changes would likely imply a new parameter for the scheduler.

@etejedor I’d suggest starting the discussion on dask-jobqueue, and perhaps opening a follow-up issue on distributed later based on it – what do you think?

etejedor · February 25, 2022, 12:17pm

That works for me, let me just ping @oshadura since she said she’s going to open a PR with her patch.

etejedor · February 28, 2022, 11:24am

This is now Configure Dask workers to contact scheduler on a specific address · Issue #548 · dask/dask-jobqueue · GitHub

pavithraes · February 28, 2022, 2:04pm

@etejedor Thanks for opening that issue! I think we can continue the discussion there (to avoid duplication), so I’ll mark this Discourse thread as resolved.
