Can't connect to local cluster - times out

GDB-SF · December 7, 2021, 9:31pm

New to Dask distributed. I’ve created a local cluster (default settings) and confirmed that it’s running by checking the dashboard. But when I try to create a client with client = Client('127.0.0.1:8787'), it almost always times out. If I connect with the default client = Client(), I get a warning: "UserWarning: Port 8787 is already in use." Furthermore, if I try to use the default client from XGBoost

output = xgb.dask.train(
    client, params, dtrain, num_boost_round=5,
    evals=[(dtrain, 'train')]
)

I get OSError: [Errno 49] Can't assign requested address

ian · December 7, 2021, 10:50pm

Hi @GDB-SF, welcome! 8787 is the default port for the dashboard, but not for the scheduler to talk to clients (that’s 8786). And if either of those is occupied (e.g., if you have another Python session lying around somewhere with another cluster running), then you’ll get the user warning and another port will be chosen.
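To see whether something is already listening on either port before starting a cluster, a quick check with the standard library is enough. This is only a sketch: port_is_listening is a hypothetical helper name, not part of Dask, and it only tells you that some process accepts TCP connections there, not that the process is a Dask scheduler.

```python
import socket


def port_is_listening(port, host="127.0.0.1", timeout=1.0):
    """Return True if some process accepts TCP connections on host:port.

    Hypothetical helper for illustration; it cannot tell a Dask scheduler
    apart from any other service bound to the same port.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or address not reachable
        return False


# 8786 and 8787 are the scheduler and dashboard ports mentioned above
for name, port in [("scheduler", 8786), ("dashboard", 8787)]:
    state = "in use" if port_is_listening(port) else "free"
    print(f"{name} port {port}: {state}")
```

If 8787 shows as in use while you think no cluster is running, there is probably a stale Python session still holding it.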

If you are creating local clusters, I’d recommend passing the cluster instance into the client directly, then you don’t need to worry about getting the port right:

import distributed

cluster = distributed.LocalCluster()  # could customize with different kwargs
client = distributed.Client(cluster)
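If you want to know which addresses the cluster actually ended up on (for instance after the port warning), you can ask the cluster object instead of guessing. A minimal sketch follows; start_local_client is a hypothetical helper name, and the keyword arguments are assumptions chosen to keep the example small (one thread-based worker, a forced TCP address, and an OS-assigned dashboard port), not the settings from your notebook.

```python
from distributed import Client, LocalCluster


def start_local_client():
    """Start a small local cluster and return (cluster, client).

    Hypothetical helper for illustration only.
    """
    cluster = LocalCluster(
        n_workers=1,
        threads_per_worker=1,
        processes=False,         # assumption: thread-based worker keeps the sketch light
        protocol="tcp://",       # assumption: force a TCP address so a port is visible
        dashboard_address=":0",  # assumption: let the OS pick a free dashboard port
    )
    client = Client(cluster)
    return cluster, client


cluster, client = start_local_client()

# These are the addresses to use from another notebook or a browser
print("scheduler:", cluster.scheduler_address)
print("dashboard:", cluster.dashboard_link)
```

The printed scheduler address is what another session would pass to Client(...) to connect to the same cluster.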

GDB-SF · December 8, 2021, 4:00am

Hi Ian. Wow - thanks for the quick turnaround! My use case is this: I’m weaning my data science team off Pandas and CSV as we begin working with larger and larger datasets. My intro is - surprise - a Jupyter notebook. The intro uses XGBoost and I can expect them to rerun the notebook as they tinker with the data etc. So I do this:

First, check to see if the cluster already exists: [View cluster status](http://localhost:8787/status). If the cluster does not exist, uncomment the next cell to create one.

Then in the next cell I have

# cluster = LocalCluster()
# client = Client(cluster)

The best alternative is for them to be able to connect to a running cluster once they create it. Alternatively, I can just add an earlier cell with client.shutdown(), but I suspect it will seem odd to them to shut down a cluster and recreate it every time they want to run the notebook.

GDB-SF · December 8, 2021, 5:47pm

Going forward, Ian, what I’m going to do is dig into the documentation and maybe create another notebook where the user can set up the local cluster with sufficient specificity to allow them to call it from other notebooks. So:

1. Start with your model training notebook. Check whether the local cluster is activated.
2. If not, then go to the ‘setup’ notebook and create it.
3. Go back to your model training notebook and run your code.

Make sense?

Gcav66 · December 10, 2021, 4:51pm

Hi @GDB-SF - building on Ian’s example, you could use try/except to check for a running local cluster before creating a new one:

from distributed import Client, LocalCluster

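try:
    # Try to reach a scheduler that is already listening on port 8786
    client = Client('tcp://localhost:8786', timeout='2s')
except OSError:
    # Nothing answered within the timeout, so start a cluster on that port
    cluster = LocalCluster(scheduler_port=8786)
    client = Client(cluster)
client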

One clarifying question - would you ultimately want your data science team to work on a shared cluster, i.e., not just local clusters running on their individual workstations?

ian · December 11, 2021, 12:29am

@GDB-SF You might also be interested in trying Dask’s JupyterLab extension: this provides an (optional) graphical UI for launching a cluster that can outlive a specific notebook kernel session, and it also makes it possible to embed Dask’s dashboard panes within the notebook environment.

This video shows some of the interactions that are possible:

So a user can create one cluster with this, and connect to it many times across different notebooks, while maintaining the same dashboard layout.
