Best practice for shutting down a cluster

I’ve got a LocalCluster working well except for teardown. I’ve tried a number of functions, including:

  • client.close()
  • client.shutdown()
  • client.scheduler.shutdown()
  • client.retire_workers()
  • doing nothing

But I’ve found no combination that avoids a variety of errors on exit. All the examples I’ve found are interactive ipynb files that don’t ever terminate the cluster. So basic question:

What is the best practice for cleanly terminating a client / cluster?

Thanks.

Hi @rjplevin,

This is also something that I’ve experienced, and I don’t have much good advice. I’d say the best one to use is client.shutdown(), but it’s true that with some workloads you can get random errors… In general you can safely ignore them, but I agree this isn’t really clean.
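That said, the pattern that exits most cleanly for me is to use the cluster and client as context managers, so teardown happens in a well-defined order when the block exits. A minimal sketch (the submitted work is just a placeholder):

from dask.distributed import Client, LocalCluster

# Both objects support the context-manager protocol: on exit the client
# closes first, then the cluster shuts down its workers and scheduler.
with LocalCluster() as cluster, Client(cluster) as client:
    result = client.submit(sum, [1, 2, 3]).result()  # placeholder work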

Is there any way to suppress these errors or redirect them somewhere else?
I’ve tried:
- Redirecting stderr and stdout before calling client.shutdown();
- Setting logging.distributed to the error level in the dask.config configuration dictionary (sketched below);
- Using client.register_worker_callbacks() to call a function that silences logging every time a worker is created.
However, I can’t seem to ever silence the WARNING messages I get when I shut down the client. I am building an application on top of Dask, and I would like users not to see these “alarming” messages if they don’t have to.
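For reference, the dask.config attempt from the second bullet looked roughly like this (a minimal sketch):

import dask

# Attempted: lower the level of the distributed loggers via dask's config.
# distributed seems to apply the `logging` section when it first sets up
# logging, so changing it after import may be a no-op, which could be why
# it didn't silence the shutdown warnings for me.
dask.config.set({"logging": {"distributed": "error"}})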

Hi @byrom771, welcome here!

Could you give us the errors you are seeing?

Hello @guillaumeeb,
I am getting messages like this:
2024-03-04 18:02:56,695 - distributed.worker.state_machine - WARNING - Async instruction for <Task cancelled name="execute('sample_refine_output-8f067a36-aa35-4f72-885e-dac1f39df7a9')" coro=<Worker.execute() done, defined at pathtoenv/lib/python3.10/site-packages/distributed/worker_state_machine.py:3615>> ended with CancelledError
2024-03-04 18:02:59,891 - distributed.nanny - WARNING - Worker process still alive after 3.1999992370605472 seconds, killing
I get several of each type after I try to shut down my client by calling client.cancel(futures) (where futures is the list of futures I have running) followed by client.shutdown().
So far the only way I have found to suppress these messages is to set the logging level to CRITICAL (with logging.getLogger('distributed').setLevel(logging.CRITICAL)), both in the workers (by using client.register_worker_callbacks and calling a function that sets the logging level) and in the client process (by setting the logging level to CRITICAL before I call client.cancel and client.shutdown).
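Concretely, the workaround looks roughly like this (a sketch; client and futures are the objects described above, and _silence_distributed is just my helper name):

import logging

def _silence_distributed():
    # Only CRITICAL messages from the distributed logger tree get through.
    logging.getLogger("distributed").setLevel(logging.CRITICAL)

client.register_worker_callbacks(_silence_distributed)  # runs on every worker
_silence_distributed()  # also silence the client process
client.cancel(futures)
client.shutdown()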
Is there any way to redirect these logging messages from the distributed logger to a file?
Is it possible to simply remove the console handler and add a file handler to the distributed logger manually? I have tried with no success; maybe I just don’t understand how the Python logging module works.
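In case it helps to see it, the handler swap I attempted looked something like this (a sketch; distributed.log is an arbitrary file name):

import logging

logger = logging.getLogger("distributed")
for handler in list(logger.handlers):
    logger.removeHandler(handler)  # drop the existing console handler
logger.addHandler(logging.FileHandler("distributed.log"))
# This only affects the client process, which may be why it didn't work.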
Should this be done through the dask config instead?
I appreciate the assistance.

Those errors look normal in your case: the first is a task being cancelled mid-execution, and the second is the nanny killing a worker process that didn’t exit within its grace period.

If you want to hide them then yes, you should configure logging; this can be done using the dask config: Debug — Dask documentation. Keep in mind that the worker and nanny messages are emitted by separate processes, so redirecting handlers in the client process alone won’t catch them; the configuration has to reach the worker processes as well.
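As a starting point, something along these lines in a dask configuration file should lower the level of the distributed loggers (a sketch; the keys under logging are ordinary dotted logger names):

logging:
  distributed: error
  distributed.worker: error
  distributed.nanny: error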

Do not hesitate to share your configuration if you come up with something, this could help other users.

I tried creating a config.yaml file in {sys.prefix}/etc/dask/ and calling dask.config.refresh(), but that didn’t work for certain warnings, so in the end I simply did this:

import logging
from distributed.utils import silence_logging_cmgr

with silence_logging_cmgr(logging.CRITICAL):
    client.cancel(futures)
    client.shutdown()

This resulted in a successful suppression of messages below the CRITICAL logging level when shutting down the client.
