Hello,
Is nannies’ logs the same as the Scheduler log (by clicking Info in the dashboard)?
No, it’s the stdout/stderr of the dask-worker bash command.
How should I interpret this message mentioning one of the tasks?
2023-08-25 18:12:02,717 - distributed.scheduler - ERROR - Shut down workers that don't have promised key: [], optimize_covparam_vel-4b4d7e48-7020-40c9-9bd7-ba4a2cc610b2 NoneType: None
This is a known issue that was fixed in 2023.8.1 (gather() should not remove unresponsive workers · Issue #7995 · dask/distributed · GitHub). Before the fix, the scheduler would erroneously shut down a worker that was temporarily unresponsive, typically because its GIL was locked, and so lose all the data held on that worker.
I would advise retrying with the latest version of dask to see whether the problem persists.
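If it helps, here is a minimal sketch for checking whether an installed version is at or past the release mentioned above. The helper names version_tuple and has_gather_fix are made up for this example; they are not part of dask or distributed, and the parsing assumes the usual YYYY.M.P version format.

```python
import re

# Release named above as containing the fix for issue #7995.
FIXED_IN = (2023, 8, 1)


def version_tuple(version: str) -> tuple:
    """Parse the leading 'YYYY.M.P' part of a version string into ints."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return tuple(int(part) for part in match.groups())


def has_gather_fix(version: str) -> bool:
    """Return True if `version` is 2023.8.1 or newer (hypothetical helper)."""
    return version_tuple(version) >= FIXED_IN
```

You would call it as has_gather_fix(distributed.__version__) after import distributed. Keep in mind that the scheduler and workers need the upgraded version too, not just the client; client.get_versions() reports the versions seen on each of them.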
Is there a way I could print all workers’ logs simultaneously, so that I can confirm whether the workers log anything different from the example below?
This question is specific to dask-gateway-cluster, and I’m afraid I’m not familiar with it. I would be surprised if the system didn’t offer anything for centralized log collection.
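Independently of the gateway, distributed.Client has a get_worker_logs() method that returns the recent log records kept by each worker, keyed by worker address. I have not tried it on a dask-gateway deployment, so treat the following as a sketch. format_worker_logs is a made-up helper, and it assumes each log entry is a (level, message) pair.

```python
def format_worker_logs(logs: dict) -> str:
    """Flatten a {worker_address: [(level, message), ...]} mapping into text.

    Hypothetical helper: the entry shape is an assumption about what
    Client.get_worker_logs() returns.
    """
    lines = []
    for worker in sorted(logs):
        for level, message in logs[worker]:
            # Prefix each record with its worker so the sources stay distinguishable.
            lines.append(f"[{worker}] {level}: {message}")
    return "\n".join(lines)


# Usage, assuming `client` is your existing distributed.Client:
# print(format_worker_logs(client.get_worker_logs()))
```

Passing nanny=True to get_worker_logs() should return the nannies’ logs instead of the workers’.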