How do I avoid distributed.client - WARNING - Couldn't gather keys, rescheduling?

Hello,

Are the nannies’ logs the same as the Scheduler log (shown by clicking Info in the dashboard)?

No, they are the stdout/stderr of the dask-worker bash command.

How should I interpret this message mentioning one of the tasks?
2023-08-25 18:12:02,717 - distributed.scheduler - ERROR - Shut down workers that don't have promised key: [], optimize_covparam_vel-4b4d7e48-7020-40c9-9bd7-ba4a2cc610b2 NoneType: None

This is an issue that was fixed in 2023.8.1 (gather() should not remove unresponsive workers · Issue #7995 · dask/distributed · GitHub). The scheduler would erroneously shut down a worker that was only temporarily unresponsive, typically because its GIL was held, and all of the data stored on that worker was lost.
I would advise retrying with the latest version of dask to see whether the problem persists.
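To check whether an environment already includes that fix, a version comparison is enough. The sketch below is an illustration, not something from the thread: parse_calver and has_gather_fix are hypothetical helper names. It assumes versions follow dask's YYYY.M.P calendar scheme and compares the parts as integers, because plain string comparison would wrongly rank "2023.10.0" below "2023.8.1".

```python
import re
from typing import Tuple

# Release that, per this thread, contains the fix for dask/distributed issue #7995.
FIXED_IN: Tuple[int, int, int] = (2023, 8, 1)


def parse_calver(version: str) -> Tuple[int, int, int]:
    """Parse the leading YYYY.M.P part of a (hypothetically CalVer) version string.

    Anything after the third number, e.g. a '+12.gabcdef' dev suffix, is ignored,
    so a dev build of 2023.8.0 still counts as 2023.8.0 (i.e. conservatively "not fixed").
    """
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    year, month, patch = (int(part) for part in match.groups())
    return (year, month, patch)


def has_gather_fix(version: str) -> bool:
    """Return True if `version` is at or after the release that fixed issue #7995."""
    # Tuple comparison is element-wise and numeric, unlike comparing raw strings.
    return parse_calver(version) >= FIXED_IN
```

For example, has_gather_fix(distributed.__version__) checks the locally installed package. Because the scheduler, workers and client can run different versions, it is also worth calling client.get_versions(check=True) on a connected client, which raises if the versions of the core packages disagree across the cluster.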

Is there a way I could print all workers’ logs simultaneously, so that I can confirm whether the workers log anything different from the example below?

This question is specific to dask-gateway-cluster and I’m afraid I’m not familiar with it. I would be surprised if the system didn’t offer anything for centralized logs collection.
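One option that does not depend on dask-gateway, offered here as an assumption rather than something confirmed in the thread, is to pull logs through the client: distributed's Client.get_worker_logs(nanny=...) returns a dict mapping each worker address to a list of (level, message) records. The helpers format_worker_logs and dump_all_worker_logs are hypothetical names. Note that this only returns the recent records each live process keeps in memory, so a worker that has already been shut down cannot be queried this way, and it does not replace the full stdout/stderr collected by the cluster manager.

```python
from typing import Dict, List, Tuple

# Shape returned by distributed's Client.get_worker_logs():
# {worker_address: [(level, message), ...], ...}
WorkerLogs = Dict[str, List[Tuple[str, str]]]


def format_worker_logs(logs: WorkerLogs) -> str:
    """Render every worker's log records as one block of text, grouped by address."""
    lines: List[str] = []
    for address in sorted(logs):
        lines.append(f"===== {address} =====")
        for level, message in logs[address]:
            lines.append(f"[{level}] {message}")
    return "\n".join(lines)


def dump_all_worker_logs(client, nanny: bool = False) -> str:
    """Fetch in-memory logs from every live worker (or its nanny) and print them.

    Hypothetical helper; `client` is assumed to be a connected distributed.Client.
    """
    logs = client.get_worker_logs(nanny=nanny)
    text = format_worker_logs(logs)
    print(text)
    return text
```

With a connected client, dump_all_worker_logs(client) prints the workers’ records and dump_all_worker_logs(client, nanny=True) prints the nannies’, which makes it easy to scan every worker for the same message at once. client.get_scheduler_logs() returns the scheduler’s own records.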
