A long-running simulation crashed with this error/stack trace:
Traceback (most recent call last):
  File "/home/jurgen/AppsPy/mtdcovabm/simulator/cn_dist.py", line 176, in cn_distributed
    future = client.submit(cn_worker, params, workers=worker_url)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jurgen/AppsPy/mtdcovabm/lib/python3.11/site-packages/distributed/client.py", line 1961, in submit
    futures = self._graph_to_futures(
              ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jurgen/AppsPy/mtdcovabm/lib/python3.11/site-packages/distributed/client.py", line 3171, in _graph_to_futures
    self._send_to_scheduler(
  File "/home/jurgen/AppsPy/mtdcovabm/lib/python3.11/site-packages/distributed/client.py", line 1242, in _send_to_scheduler
    raise Exception(
Exception: Tried sending message after closing. Status: closed
Message: {'op': 'update-graph', 'graph_header': {'serializer': 'pickle', 'writeable': ()}, 'graph_frames': [PICKLED_OBJECT_HERE], 'keys': ['cn_worker-4de6600d7f6dfc2282a0af8550b78310'], 'internal_priority': {'cn_worker-4de6600d7f6dfc2282a0af8550b78310': 0}, 'submitting_task': None, 'fifo_timeout': '100 ms', 'actors': False, 'code': <ToPickle: ()>, 'annotations': <ToPickle: {}>}

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/jurgen/AppsPy/mtdcovabm/simulator/sim.py", line 1421, in main
    cn_dist.cn_distributed(client,
  File "/home/jurgen/AppsPy/mtdcovabm/simulator/cn_dist.py", line 54, in cn_distributed
    with performance_report(filename=dask_perf_log_file_name):
  File "/home/jurgen/AppsPy/mtdcovabm/lib/python3.11/site-packages/distributed/client.py", line 6052, in __exit__
    client = get_client()
             ^^^^^^^^^^^^
  File "/home/jurgen/AppsPy/mtdcovabm/lib/python3.11/site-packages/distributed/worker.py", line 2793, in get_client
    raise ValueError("No global client found and no address provided")
ValueError: No global client found and no address provided
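The first error is: Tried sending message after closing. Status: closed
The second error is: No global client found and no address provided
I think the second error only happened because of the first: as the exception propagated out of the with performance_report(...) block, its __exit__ method called get_client(), which presumably failed because the client had already closed.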
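A minimal, Dask-free sketch of that mechanism, assuming only standard Python semantics: when a context manager's `__exit__` raises while another exception is propagating, the new exception replaces it and Python keeps the original in `__context__`, which is what produces the "During handling of the above exception" section. `Report`, `run` and `root_cause` are hypothetical names made up for illustration; `Report` merely stands in for `performance_report`.

```python
# Dask-free sketch: `Report` is a hypothetical stand-in for performance_report,
# and the two exceptions mimic the ones in the traceback above.
class Report:
    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        # Like performance_report.__exit__, this step needs a live client.
        # Here it always fails, so its exception replaces the one in flight.
        raise ValueError("No global client found and no address provided")


def run():
    with Report():
        raise RuntimeError("Tried sending message after closing. Status: closed")


def root_cause(exc: BaseException) -> BaseException:
    """Follow __cause__ / __context__ back to the first exception in the chain."""
    while True:
        earlier = exc.__cause__ or exc.__context__
        if earlier is None:
            return exc
        exc = earlier


try:
    run()
except ValueError as err:
    first = root_cause(err)
    # prints: ValueError was raised while handling RuntimeError: Tried sending message after closing. Status: closed
    print(f"{type(err).__name__} was raised while handling {type(first).__name__}: {first}")
```

In the simulation, catching the exception around the `with performance_report(...)` block and logging `root_cause(err)` would keep the first, more informative error visible in the logs.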
def _send_to_scheduler(self, msg):
    if self.status in ("running", "closing", "connecting", "newly-created"):
        self.loop.add_callback(self._send_to_scheduler_safe, msg)
    else:
        raise Exception(
            "Tried sending message after closing. Status: %s\n"
            "Message: %s" % (self.status, msg)
        )
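This excerpt from the Dask Distributed client source code seems to indicate that the client (and presumably its connection to the scheduler) was no longer in a running state: self.status here is the Client's own status, which was "closed".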
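Because that check looks at the client's own status, one defensive option is to check it before submitting and build a fresh client if it has closed. The sketch below is a hypothetical helper, not a Dask API: `submit_with_reconnect` and `make_client` are made-up names, the only assumption about the client is that it has a string `status` and a `submit` method (as in the excerpt above), and reconnecting only treats the symptom, not whatever closed the connection.

```python
from typing import Any, Callable, Tuple

# Client statuses for which the excerpt above still forwards messages.
SENDABLE = ("running", "closing", "connecting", "newly-created")


def submit_with_reconnect(
    client: Any,
    make_client: Callable[[], Any],
    fn: Callable[..., Any],
    *args: Any,
    max_reconnects: int = 1,
    **kwargs: Any,
) -> Tuple[Any, Any]:
    """Submit fn(*args, **kwargs), replacing the client first if it has closed.

    Returns (client, future) so the caller keeps the possibly new client.
    `make_client` is a hypothetical factory, e.g. lambda: Client(scheduler_address).
    The status check is best effort: the client can still close right after it.
    """
    for attempt in range(max_reconnects + 1):
        if client.status in SENDABLE:
            return client, client.submit(fn, *args, **kwargs)
        if attempt < max_reconnects:
            client = make_client()  # status was e.g. "closed": start over
    raise RuntimeError(f"client still unusable (status={client.status!r})")
```

With a real cluster, `make_client` might be `lambda: Client(scheduler_address)`. Futures created by the old client would most likely be lost with it, so any tasks still in flight would need to be resubmitted.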
I am using the client.submit function to start tasks on 60 remote workers. Previously, I had issues when explicitly assigning specific tasks to specific workers, so I now let Dask take care of the allocation (and re-allocation) of tasks.
I also have some code that detects the number of workers currently available and tells me whenever that number changes. This message appears very regularly, i.e. the number of workers changes hundreds of times.
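The detection code itself isn't shown, so the following is only a hypothetical sketch of how that churn could be narrowed down: record the set of worker addresses at each poll (with Dask these can be read as `set(client.scheduler_info()["workers"])`), then count how often workers leave each host. `diff_workers` and `departures_by_host` are illustrative names, and grouping by host assumes a restarted worker usually comes back on a new port of the same machine. If departures cluster on a few hosts, their worker logs are the first place to look.

```python
from collections import Counter
from typing import Iterable, Set, Tuple
from urllib.parse import urlsplit


def diff_workers(previous: Set[str], current: Set[str]) -> Tuple[Set[str], Set[str]]:
    """Return (joined, left) worker addresses between two snapshots."""
    return current - previous, previous - current


def departures_by_host(snapshots: Iterable[Set[str]]) -> Counter:
    """Count worker departures per host across consecutive snapshots.

    Grouping by host rather than full address assumes a restarted worker
    usually comes back on a new port of the same machine.
    """
    snaps = list(snapshots)
    counts: Counter = Counter()
    for prev, curr in zip(snaps, snaps[1:]):
        _, left = diff_workers(prev, curr)
        # "tcp://10.0.0.5:40001" -> "10.0.0.5"
        counts.update(urlsplit(addr).hostname for addr in left)
    return counts
```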
Is this a known error? Why would the scheduler drop without notice? Is this most likely memory related?
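One way to test the memory hypothesis: with Dask's default settings, a worker running under a nanny is restarted once its memory use passes 95% of its memory limit (the distributed.worker.memory.terminate fraction), and repeated restarts would show up as exactly this kind of worker churn. The hypothetical helper below, `memory_pressure`, could be polled alongside the worker-count check. It assumes each entry of `client.scheduler_info()["workers"]` has a `memory_limit` and a `metrics["memory"]` value in bytes; that layout should be verified against the installed Dask version, and the thresholds against the cluster's configuration.

```python
from typing import Dict, Tuple

# Dask's default memory thresholds as fractions of memory_limit
# (distributed.worker.memory.target/spill/pause/terminate); check your config.
THRESHOLDS = (("terminate", 0.95), ("pause", 0.80), ("spill", 0.70), ("target", 0.60))


def memory_pressure(workers: Dict[str, dict]) -> Dict[str, Tuple[float, str]]:
    """Map worker address -> (fraction of memory_limit in use, highest threshold reached).

    Assumes each entry looks like the per-worker dicts in
    client.scheduler_info()["workers"], i.e. has "memory_limit" and
    "metrics" -> "memory" (bytes); verify against your Dask version.
    """
    report = {}
    for addr, info in workers.items():
        limit = info.get("memory_limit") or 0
        if not limit:
            continue  # no memory limit set, nothing to compare against
        used = info.get("metrics", {}).get("memory", 0)
        fraction = used / limit
        # THRESHOLDS is ordered highest first, so this picks the highest one reached.
        level = next((name for name, t in THRESHOLDS if fraction >= t), "ok")
        report[addr] = (fraction, level)
    return report
```

In use this would be something like `memory_pressure(client.scheduler_info()["workers"])`, logged at each poll; workers that sit at "pause" or "terminate" shortly before disappearing would support the memory explanation.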