Exception: Tried sending message after closing. Status: closed

A long-running simulation crashed with the following error and stack trace:

Traceback (most recent call last):
  File "/home/jurgen/AppsPy/mtdcovabm/simulator/cn_dist.py", line 176, in cn_distributed
    future = client.submit(cn_worker, params, workers=worker_url)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jurgen/AppsPy/mtdcovabm/lib/python3.11/site-packages/distributed/client.py", line 1961, in submit
    futures = self._graph_to_futures(
              ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jurgen/AppsPy/mtdcovabm/lib/python3.11/site-packages/distributed/client.py", line 3171, in _graph_to_futures
    self._send_to_scheduler(
  File "/home/jurgen/AppsPy/mtdcovabm/lib/python3.11/site-packages/distributed/client.py", line 1242, in _send_to_scheduler
    raise Exception(
Exception: Tried sending message after closing. Status: closed
Message: {'op': 'update-graph', 'graph_header': {'serializer': 'pickle', 'writeable': ()}, 'graph_frames': [PICKLED_OBJECT_HERE], 'keys': ['cn_worker-4de6600d7f6dfc2282a0af8550b78310'], 'internal_priority': {'cn_worker-4de6600d7f6dfc2282a0af8550b78310': 0}, 'submitting_task': None, 'fifo_timeout': '100 ms', 'actors': False, 'code': <ToPickle: ()>, 'annotations': <ToPickle: {}>}

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/jurgen/AppsPy/mtdcovabm/simulator/sim.py", line 1421, in main
    cn_dist.cn_distributed(client,
  File "/home/jurgen/AppsPy/mtdcovabm/simulator/cn_dist.py", line 54, in cn_distributed
    with performance_report(filename=dask_perf_log_file_name):
  File "/home/jurgen/AppsPy/mtdcovabm/lib/python3.11/site-packages/distributed/client.py", line 6052, in __exit__
    client = get_client()
             ^^^^^^^^^^^^
  File "/home/jurgen/AppsPy/mtdcovabm/lib/python3.11/site-packages/distributed/worker.py", line 2793, in get_client
    raise ValueError("No global client found and no address provided")
ValueError: No global client found and no address provided

The first error is: Tried sending message after closing. Status: closed

Second error is: No global client found and no address provided

I think the second error only happened because of the first.
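
To illustrate why the second error can be a side effect of the first, here is a minimal sketch in plain Python, with no Dask involved. The `Report` class is a hypothetical stand-in for a context manager whose cleanup needs a resource that has already gone away; it is not the real `performance_report`. When its `__exit__` raises while another exception is propagating, Python chains the two and prints "During handling of the above exception, another exception occurred".

```python
class Report:
    """Hypothetical stand-in for a context manager whose cleanup can fail."""

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        # Cleanup needs something that is already gone, so it raises as well.
        raise ValueError("No global client found and no address provided")


def run():
    with Report():
        # The original failure inside the with-block.
        raise RuntimeError("Tried sending message after closing. Status: closed")


try:
    run()
except ValueError as err:
    second = err               # the exception that finally surfaced
    first = err.__context__    # the exception that was being handled at the time

print(f"first:  {first!r}")
print(f"second: {second!r}")
```

The error that surfaces is the one raised during cleanup, while the root cause is kept in `__context__`, which matches the order of the two tracebacks above.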

def _send_to_scheduler(self, msg):
    if self.status in ("running", "closing", "connecting", "newly-created"):
        self.loop.add_callback(self._send_to_scheduler_safe, msg)
    else:
        raise Exception(
            "Tried sending message after closing.  Status: %s\n"
            "Message: %s" % (self.status, msg)
        )

This code excerpt from the Dask Distributed “client” source code seems to indicate that the scheduler was not in a running state.
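
Since the check in the excerpt reads `self.status`, one way to fail earlier and with a clearer message is to look at `client.status` before each submit. The following is only a sketch: `submit_if_running` and `FakeClient` are hypothetical names introduced for illustration, and the list of accepted statuses is copied from the excerpt rather than from any documented contract.

```python
# Statuses accepted by the _send_to_scheduler excerpt above.
OPEN_STATUSES = ("running", "closing", "connecting", "newly-created")


def submit_if_running(client, fn, *args, **kwargs):
    """Hypothetical helper: refuse to submit when the client is not usable."""
    status = client.status
    if status not in OPEN_STATUSES:
        raise RuntimeError(
            f"Client is not usable (status={status!r}); reconnect before submitting"
        )
    return client.submit(fn, *args, **kwargs)


class FakeClient:
    """Stand-in used only to exercise the helper without a cluster."""

    def __init__(self, status):
        self.status = status

    def submit(self, fn, *args, **kwargs):
        # A real Client would return a Future; the fake just runs the function.
        return fn(*args, **kwargs)


print(submit_if_running(FakeClient("running"), sum, [1, 2, 3]))
try:
    submit_if_running(FakeClient("closed"), sum, [1, 2, 3])
except RuntimeError as err:
    print(err)
```

This does not explain why the status changed, but logging the status at the point of failure narrows down when the client stopped being usable.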

I am using the client.submit function to start a task on 60 remote workers. Previously, I had issues with trying to explicitly assign specific tasks to specific workers. I am now allowing Dask to take care of allocation (and re-allocation) of tasks.

I have some code that detects the number of workers currently available and tells me when that number changes. This log message appears rather regularly (i.e. the number of workers is changing hundreds of times).
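
For reference, the kind of worker-count tracking described here can be sketched roughly as follows. This is not the actual code from the simulation: `WorkerCountWatcher`, the `worker_watch` logger name, and `FakeClient` are hypothetical, and the sketch assumes that `Client.nthreads()` returns a dict with one entry per connected worker.

```python
import logging

logger = logging.getLogger("worker_watch")  # hypothetical logger name


class WorkerCountWatcher:
    """Hypothetical helper that records every change in the worker count."""

    def __init__(self, client):
        self.client = client
        self.last_count = None
        self.changes = 0

    def poll(self):
        # Assumption: nthreads() maps each worker address to its thread count.
        count = len(self.client.nthreads())
        if self.last_count is not None and count != self.last_count:
            self.changes += 1
            logger.warning("worker count changed: %d -> %d", self.last_count, count)
        self.last_count = count
        return count


class FakeClient:
    """Stand-in that replays a fixed sequence of worker snapshots."""

    def __init__(self, snapshots):
        self._snapshots = iter(snapshots)

    def nthreads(self):
        return next(self._snapshots)


snapshots = [{"a": 1, "b": 1}, {"a": 1}, {"a": 1}, {"a": 1, "b": 1}]
watcher = WorkerCountWatcher(FakeClient(snapshots))
for _ in snapshots:
    watcher.poll()
print(watcher.changes)
```

Keeping a timestamped record of each change makes it easier to see whether workers were leaving and rejoining shortly before the client went into the closed state.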

Is this a known error? Why would the scheduler drop without notice? Is this most likely memory related?

It looks like the Client object has been closed, rather than the Scheduler. Do you call client.close() somewhere in your code?

Hi @guillaumeeb. I do, but only at the very end (either after the program completes or after it crashes), whereas the stack trace was produced during runtime, well before I call client.shutdown(). (By the way, I call client.shutdown() and not client.close(), if that makes any difference.) When else would the client be “lost”?

It’s really hard to tell what could be the reason for a Client closing unexpectedly. It could be a memory problem in the main process, or the main process dying unexpectedly, but this is really a wild guess. Are you seeing other logs that could be interesting?
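
If no other logs are being kept, one possible way to capture more detail on the client side is to route the library's loggers to a file with the standard `logging` module. This is a sketch under assumptions: it presumes the loggers are named after their modules (`distributed`, `distributed.client`), and `log_distributed_to_file` is a hypothetical helper, not part of Dask.

```python
import logging


def log_distributed_to_file(path, level=logging.DEBUG):
    """Hypothetical helper: send 'distributed' log records to a file."""
    handler = logging.FileHandler(path)
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(name)s %(levelname)s %(message)s")
    )

    # Parent logger: receives records from child loggers that propagate.
    parent = logging.getLogger("distributed")
    parent.setLevel(level)
    parent.addHandler(handler)

    # Assumption: the client logger may carry its own level and may not
    # propagate, so configure it explicitly and attach the handler if needed.
    child = logging.getLogger("distributed.client")
    child.setLevel(level)
    if not child.propagate:
        child.addHandler(handler)

    return handler
```

Calling it once in the main process, after importing `distributed` and before creating the client, should leave a file that survives a crash and can be checked for messages logged just before the status changed to closed.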

Are you running into this every time you perform this long-running simulation?