How to retry hanging jobs during a distributed computation

I’ve been using dask to work with a very large array without loading it into memory, and it mostly works well for that. But for some reason I can’t figure out, it will sometimes entirely stop, indefinitely, and I’m not sure how to fix it without restarting the (often quite long) computation from the beginning.

In essence: I have a hierarchical clustering over the elements of this array, and I am running a lot of different tests (comparing siblings in the tree). I use dask.array to load in the relevant data whenever I’m looking at specific leaves of the tree. So it’s loading data into memory, computing statistics on it, and then loading something else and so on.

Sometimes I’ll start one of these jobs off overnight but when I come back it’s just stuck, not making any progress, with something like this:

Sometimes it unsticks itself (after an indeterminate time) but it wastes hours in this state. Is there any way to metaphorically kick Dask so it starts up again and continues to make progress?

@jamestwebber what version of distributed are you using?

This looks like a deadlock. There are a few known deadlocks right now: Issues · dask/distributed · GitHub.

Seeing logs from the stuck workers might help diagnose this.

This might be worth opening an issue to discuss. It would certainly be helpful to have, but I’m not sure off the top of my head how to implement it.
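As a rough idea of what such a "kick" could look like from user code, here is a minimal sketch of a watchdog loop. It is not a built-in Dask feature. It assumes the work is held as futures with a `done()` method (for example from `client.compute` or `client.submit`), and the names `run_with_watchdog`, `kick`, `stall_timeout`, `poll` and `max_kicks` are all made up for illustration.

```python
import time


def run_with_watchdog(futures, kick, stall_timeout=600.0, poll=5.0, max_kicks=3):
    """Block until every future is done, calling kick() after a stall.

    A "stall" here just means no future finished for stall_timeout
    seconds. This is a heuristic: a slow but healthy task looks the same
    as a deadlock, so pick stall_timeout well above the longest task.
    Returns the number of times kick() was called.
    """
    pending = set(futures)
    last_progress = time.monotonic()
    kicks = 0
    while pending:
        finished = {f for f in pending if f.done()}
        if finished:
            # Something completed, so reset the stall clock.
            pending -= finished
            last_progress = time.monotonic()
        elif time.monotonic() - last_progress > stall_timeout:
            if kicks >= max_kicks:
                raise TimeoutError(f"still stalled after {kicks} kicks")
            kick()  # hypothetical callback, e.g. one that rescales the cluster
            kicks += 1
            last_progress = time.monotonic()
        if pending:
            time.sleep(poll)
    return kicks
```

The loop only polls `done()`, so it also works with plain `concurrent.futures.Future` objects, which makes it easy to try out without a cluster.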


@jamestwebber what version of distributed are you using?

I’ve seen this on two instances recently, running 2021.11.1 and 2022.2.0.
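For anyone else reporting versions in a thread like this, a small sketch that prints what is installed in the current environment. It only uses the standard library; the helper name `installed_versions` is made up, and the package list is just an example.

```python
from importlib import metadata


def installed_versions(packages=("dask", "distributed")):
    """Map each package name to its installed version, or None if missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}: {version}")
```

This only reports the client environment. With a connected client, `client.get_versions(check=True)` should also compare versions across the scheduler and workers.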

This looks like a deadlock. There are a few known deadlocks right now: Issues · dask/distributed · GitHub.

Seeing logs from the stuck workers might help diagnose this.

Things are working right now but the next time I see this I’ll take a look at the logs.
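For reference, a minimal sketch of pulling recent logs through the client rather than from files. It assumes `client` is a connected `distributed.Client`, that `get_scheduler_logs` returns `(level, message)` pairs, and that `get_worker_logs` returns a dict of such lists keyed by worker address. The helper name `dump_recent_logs` is hypothetical.

```python
def dump_recent_logs(client, n=200):
    """Collect the last n scheduler and worker log entries as strings."""
    lines = []
    # Scheduler entries are assumed to be (level, message) pairs.
    for level, message in client.get_scheduler_logs(n=n):
        lines.append(f"scheduler [{level}] {message}")
    # Worker entries are assumed to be grouped by worker address.
    for address, entries in client.get_worker_logs(n=n).items():
        for level, message in entries:
            lines.append(f"{address} [{level}] {message}")
    return lines
```

Calling `print("\n".join(dump_recent_logs(client)))` while the computation is stuck would give something that can be pasted into an issue.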

This might be worth opening an issue to discuss. It would certainly be helpful to have, but I’m not sure off the top of my head how to implement it.

Yeah I don’t know how it might work either. This seems tricky (impossible?) to diagnose automatically. In my case I found that I could get things moving by scaling down and then up again, which seemed to restart a bunch of tasks.
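In case it helps someone else, the workaround looks roughly like this. It is a sketch, assuming `cluster` is any Dask cluster object with a `scale(n)` method (such as `LocalCluster`). The helper name `kick_cluster` and the `pause` value are made up.

```python
import time


def kick_cluster(cluster, n_workers, pause=10.0):
    """Scale the cluster to zero workers, wait, then scale back up.

    Anything held only in worker memory is lost when the workers go
    away, so the scheduler has to recompute those tasks.
    """
    cluster.scale(0)
    time.sleep(pause)  # give the scheduler time to notice the workers are gone
    cluster.scale(n_workers)
```

This is a blunt instrument: it does not diagnose anything, it just forces tasks to be reassigned to fresh workers.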

Letting the cluster autoscale might be helping a little too, but it’s hard to quantify if it’s really preventing deadlocks or just feels like it. It would be interesting to look at the logic behind autoscaling and see if it could help here.

As you point out, there are a bunch of open issues on deadlocks, so I don’t know if another one is necessary :joy: But if I see anything interesting in the logs, I will.

Okay, it happened again, so here are the last few hundred lines from the log. I see a lot of “Unexpected worker completed task” messages. At the very end you can see me scale down to 0 and back up.

Worker logs
...
distributed.scheduler - INFO - Register worker <WorkerState 'tcp://127.0.0.1:44889', name: 30790, status: closed, memory: 0, processing: 31>
distributed.scheduler - INFO - Starting worker compute stream, tcp://127.0.0.1:44889
distributed.scheduler - INFO - Register worker <WorkerState 'tcp://127.0.0.1:35289', name: 30791, status: closed, memory: 0, processing: 0>
distributed.scheduler - INFO - Starting worker compute stream, tcp://127.0.0.1:35289
distributed.scheduler - INFO - Register worker <WorkerState 'tcp://127.0.0.1:32955', name: 30788, status: closed, memory: 0, processing: 0>
distributed.scheduler - INFO - Starting worker compute stream, tcp://127.0.0.1:32955
distributed.scheduler - INFO - Register worker <WorkerState 'tcp://127.0.0.1:36325', name: 30792, status: closed, memory: 0, processing: 121>
distributed.scheduler - INFO - Starting worker compute stream, tcp://127.0.0.1:36325
distributed.scheduler - INFO - Register worker <WorkerState 'tcp://127.0.0.1:41547', name: 30793, status: closed, memory: 0, processing: 163>
distributed.scheduler - INFO - Starting worker compute stream, tcp://127.0.0.1:41547
distributed.scheduler - INFO - Register worker <WorkerState 'tcp://127.0.0.1:45549', name: 30789, status: closed, memory: 0, processing: 44>
distributed.scheduler - INFO - Starting worker compute stream, tcp://127.0.0.1:45549
distributed.scheduler - INFO - Register worker <WorkerState 'tcp://127.0.0.1:44849', name: 30787, status: closed, memory: 0, processing: 163>
distributed.scheduler - INFO - Starting worker compute stream, tcp://127.0.0.1:44849
distributed.scheduler - INFO - Retire worker names (30787, 30789, 30790, 30792, 30793, 30771)
distributed.scheduler - INFO - Retiring worker tcp://127.0.0.1:42213
distributed.scheduler - INFO - Retiring worker tcp://127.0.0.1:45549
distributed.scheduler - INFO - Retiring worker tcp://127.0.0.1:44889
distributed.scheduler - INFO - Retiring worker tcp://127.0.0.1:36325
distributed.scheduler - INFO - Retiring worker tcp://127.0.0.1:44849
distributed.scheduler - INFO - Retiring worker tcp://127.0.0.1:41547
distributed.scheduler - INFO - Closing worker tcp://127.0.0.1:44849
distributed.scheduler - INFO - Remove worker <WorkerState 'tcp://127.0.0.1:44849', name: 30787, status: closed, memory: 0, processing: 163>
distributed.scheduler - INFO - Retired worker tcp://127.0.0.1:44849
distributed.scheduler - INFO - Closing worker tcp://127.0.0.1:41547
distributed.scheduler - INFO - Remove worker <WorkerState 'tcp://127.0.0.1:41547', name: 30793, status: closed, memory: 0, processing: 163>
distributed.scheduler - INFO - Retired worker tcp://127.0.0.1:41547
distributed.scheduler - INFO - Unexpected worker completed task. Expected: <WorkerState 'tcp://127.0.0.1:35677', name: 30781, status: closed, memory: 0, processing: 0>, Got: <WorkerState 'tcp://127.0.0.1:44849', name: 30787, status: closed, memory: 0, processing: 0>, Key: ('astype-getitem-ccc06e36c4b3ee76df50e394ab84b248', 4, 2)
distributed.scheduler - INFO - Unexpected worker completed task. Expected: <WorkerState 'tcp://127.0.0.1:33651', name: 30775, status: closed, memory: 0, processing: 3>, Got: <WorkerState 'tcp://127.0.0.1:44849', name: 30787, status: closed, memory: 0, processing: 0>, Key: ('astype-getitem-ccc06e36c4b3ee76df50e394ab84b248', 325, 0)
distributed.scheduler - INFO - Unexpected worker completed task. Expected: <WorkerState 'tcp://127.0.0.1:37715', name: 30765, status: closed, memory: 0, processing: 0>, Got: <WorkerState 'tcp://127.0.0.1:44849', name: 30787, status: closed, memory: 0, processing: 0>, Key: ('astype-getitem-ccc06e36c4b3ee76df50e394ab84b248', 141, 2)
distributed.scheduler - INFO - Unexpected worker completed task. Expected: <WorkerState 'tcp://127.0.0.1:35289', name: 30791, status: closed, memory: 0, processing: 0>, Got: <WorkerState 'tcp://127.0.0.1:44849', name: 30787, status: closed, memory: 0, processing: 0>, Key: ('astype-getitem-ccc06e36c4b3ee76df50e394ab84b248', 306, 0)
distributed.scheduler - INFO - Unexpected worker completed task. Expected: <WorkerState 'tcp://127.0.0.1:44215', name: 30778, status: closed, memory: 0, processing: 0>, Got: <WorkerState 'tcp://127.0.0.1:44849', name: 30787, status: closed, memory: 0, processing: 0>, Key: ('astype-getitem-ccc06e36c4b3ee76df50e394ab84b248', 204, 2)
distributed.scheduler - INFO - Unexpected worker completed task. Expected: <WorkerState 'tcp://127.0.0.1:35677', name: 30781, status: closed, memory: 0, processing: 0>, Got: <WorkerState 'tcp://127.0.0.1:44849', name: 30787, status: closed, memory: 0, processing: 0>, Key: ('astype-getitem-ccc06e36c4b3ee76df50e394ab84b248', 277, 0)
distributed.scheduler - INFO - Unexpected worker completed task. Expected: <WorkerState 'tcp://127.0.0.1:33651', name: 30775, status: closed, memory: 0, processing: 3>, Got: <WorkerState 'tcp://127.0.0.1:44849', name: 30787, status: closed, memory: 0, processing: 0>, Key: ('astype-getitem-ccc06e36c4b3ee76df50e394ab84b248', 64, 0)
distributed.scheduler - INFO - Unexpected worker completed task. Expected: <WorkerState 'tcp://127.0.0.1:37715', name: 30765, status: closed, memory: 0, processing: 0>, Got: <WorkerState 'tcp://127.0.0.1:44849', name: 30787, status: closed, memory: 0, processing: 0>, Key: ('astype-getitem-ccc06e36c4b3ee76df50e394ab84b248', 268, 2)
distributed.scheduler - INFO - Unexpected worker completed task. Expected: <WorkerState 'tcp://127.0.0.1:35289', name: 30791, status: closed, memory: 0, processing: 0>, Got: <WorkerState 'tcp://127.0.0.1:44849', name: 30787, status: closed, memory: 0, processing: 0>, Key: ('astype-getitem-ccc06e36c4b3ee76df50e394ab84b248', 63, 3)
distributed.scheduler - INFO - Unexpected worker completed task. Expected: <WorkerState 'tcp://127.0.0.1:44215', name: 30778, status: closed, memory: 0, processing: 0>, Got: <WorkerState 'tcp://127.0.0.1:44849', name: 30787, status: closed, memory: 0, processing: 0>, Key: ('astype-getitem-ccc06e36c4b3ee76df50e394ab84b248', 300, 2)
distributed.scheduler - INFO - Unexpected worker completed task. Expected: <WorkerState 'tcp://127.0.0.1:35677', name: 30781, status: closed, memory: 0, processing: 0>, Got: <WorkerState 'tcp://127.0.0.1:44849', name: 30787, status: closed, memory: 0, processing: 0>, Key: ('astype-getitem-ccc06e36c4b3ee76df50e394ab84b248', 171, 4)
distributed.scheduler - INFO - Register worker <WorkerState 'tcp://127.0.0.1:44849', name: 30787, status: closed, memory: 0, processing: 0>
distributed.scheduler - INFO - Starting worker compute stream, tcp://127.0.0.1:44849
distributed.scheduler - INFO - Unexpected worker completed task. Expected: <WorkerState 'tcp://127.0.0.1:35289', name: 30791, status: closed, memory: 0, processing: 0>, Got: <WorkerState 'tcp://127.0.0.1:41547', name: 30793, status: closed, memory: 0, processing: 13>, Key: ('astype-getitem-ccc06e36c4b3ee76df50e394ab84b248', 293, 0)
distributed.scheduler - INFO - Unexpected worker completed task. Expected: <WorkerState 'tcp://127.0.0.1:35677', name: 30781, status: closed, memory: 0, processing: 0>, Got: <WorkerState 'tcp://127.0.0.1:41547', name: 30793, status: closed, memory: 0, processing: 13>, Key: ('astype-getitem-ccc06e36c4b3ee76df50e394ab84b248', 70, 4)
distributed.scheduler - INFO - Unexpected worker completed task. Expected: <WorkerState 'tcp://127.0.0.1:33651', name: 30775, status: closed, memory: 0, processing: 3>, Got: <WorkerState 'tcp://127.0.0.1:41547', name: 30793, status: closed, memory: 0, processing: 13>, Key: ('astype-getitem-ccc06e36c4b3ee76df50e394ab84b248', 60, 3)
distributed.scheduler - INFO - Unexpected worker completed task. Expected: <WorkerState 'tcp://127.0.0.1:37715', name: 30765, status: closed, memory: 0, processing: 0>, Got: <WorkerState 'tcp://127.0.0.1:41547', name: 30793, status: closed, memory: 0, processing: 13>, Key: ('astype-getitem-ccc06e36c4b3ee76df50e394ab84b248', 278, 3)
distributed.scheduler - INFO - Unexpected worker completed task. Expected: <WorkerState 'tcp://127.0.0.1:44215', name: 30778, status: closed, memory: 0, processing: 0>, Got: <WorkerState 'tcp://127.0.0.1:41547', name: 30793, status: closed, memory: 0, processing: 13>, Key: ('astype-getitem-ccc06e36c4b3ee76df50e394ab84b248', 266, 3)
distributed.scheduler - INFO - Unexpected worker completed task. Expected: <WorkerState 'tcp://127.0.0.1:35289', name: 30791, status: closed, memory: 0, processing: 0>, Got: <WorkerState 'tcp://127.0.0.1:41547', name: 30793, status: closed, memory: 0, processing: 13>, Key: ('astype-getitem-ccc06e36c4b3ee76df50e394ab84b248', 179, 3)
distributed.scheduler - INFO - Unexpected worker completed task. Expected: <WorkerState 'tcp://127.0.0.1:35677', name: 30781, status: closed, memory: 0, processing: 0>, Got: <WorkerState 'tcp://127.0.0.1:41547', name: 30793, status: closed, memory: 0, processing: 13>, Key: ('astype-getitem-ccc06e36c4b3ee76df50e394ab84b248', 106, 2)
distributed.scheduler - INFO - Unexpected worker completed task. Expected: <WorkerState 'tcp://127.0.0.1:33651', name: 30775, status: closed, memory: 0, processing: 3>, Got: <WorkerState 'tcp://127.0.0.1:41547', name: 30793, status: closed, memory: 0, processing: 13>, Key: ('astype-getitem-ccc06e36c4b3ee76df50e394ab84b248', 43, 4)
distributed.scheduler - INFO - Unexpected worker completed task. Expected: <WorkerState 'tcp://127.0.0.1:37715', name: 30765, status: closed, memory: 0, processing: 0>, Got: <WorkerState 'tcp://127.0.0.1:41547', name: 30793, status: closed, memory: 0, processing: 13>, Key: ('astype-getitem-ccc06e36c4b3ee76df50e394ab84b248', 295, 3)
distributed.scheduler - INFO - Unexpected worker completed task. Expected: <WorkerState 'tcp://127.0.0.1:44215', name: 30778, status: closed, memory: 0, processing: 0>, Got: <WorkerState 'tcp://127.0.0.1:41547', name: 30793, status: closed, memory: 0, processing: 13>, Key: ('astype-getitem-ccc06e36c4b3ee76df50e394ab84b248', 53, 5)
distributed.scheduler - INFO - Unexpected worker completed task. Expected: <WorkerState 'tcp://127.0.0.1:35289', name: 30791, status: closed, memory: 0, processing: 0>, Got: <WorkerState 'tcp://127.0.0.1:41547', name: 30793, status: closed, memory: 0, processing: 13>, Key: ('astype-getitem-ccc06e36c4b3ee76df50e394ab84b248', 63, 4)
distributed.scheduler - INFO - Unexpected worker completed task. Expected: <WorkerState 'tcp://127.0.0.1:35677', name: 30781, status: closed, memory: 0, processing: 0>, Got: <WorkerState 'tcp://127.0.0.1:41547', name: 30793, status: closed, memory: 0, processing: 13>, Key: ('astype-getitem-ccc06e36c4b3ee76df50e394ab84b248', 29, 2)
distributed.scheduler - INFO - Unexpected worker completed task. Expected: <WorkerState 'tcp://127.0.0.1:33651', name: 30775, status: closed, memory: 0, processing: 3>, Got: <WorkerState 'tcp://127.0.0.1:41547', name: 30793, status: closed, memory: 0, processing: 13>, Key: ('astype-getitem-ccc06e36c4b3ee76df50e394ab84b248', 82, 5)
distributed.scheduler - INFO - Unexpected worker completed task. Expected: <WorkerState 'tcp://127.0.0.1:37715', name: 30765, status: closed, memory: 0, processing: 0>, Got: <WorkerState 'tcp://127.0.0.1:41547', name: 30793, status: closed, memory: 0, processing: 13>, Key: ('astype-getitem-ccc06e36c4b3ee76df50e394ab84b248', 65, 5)
distributed.scheduler - INFO - Unexpected worker completed task. Expected: <WorkerState 'tcp://127.0.0.1:44215', name: 30778, status: closed, memory: 0, processing: 0>, Got: <WorkerState 'tcp://127.0.0.1:41547', name: 30793, status: closed, memory: 0, processing: 13>, Key: ('astype-getitem-ccc06e36c4b3ee76df50e394ab84b248', 286, 5)
distributed.scheduler - INFO - Unexpected worker completed task. Expected: <WorkerState 'tcp://127.0.0.1:35289', name: 30791, status: closed, memory: 0, processing: 0>, Got: <WorkerState 'tcp://127.0.0.1:41547', name: 30793, status: closed, memory: 0, processing: 13>, Key: ('astype-getitem-ccc06e36c4b3ee76df50e394ab84b248', 252, 0)
distributed.scheduler - INFO - Unexpected worker completed task. Expected: <WorkerState 'tcp://127.0.0.1:35677', name: 30781, status: closed, memory: 0, processing: 0>, Got: <WorkerState 'tcp://127.0.0.1:41547', name: 30793, status: closed, memory: 0, processing: 13>, Key: ('astype-getitem-ccc06e36c4b3ee76df50e394ab84b248', 236, 0)
distributed.scheduler - INFO - Unexpected worker completed task. Expected: <WorkerState 'tcp://127.0.0.1:33651', name: 30775, status: closed, memory: 0, processing: 3>, Got: <WorkerState 'tcp://127.0.0.1:41547', name: 30793, status: closed, memory: 0, processing: 13>, Key: ('astype-getitem-ccc06e36c4b3ee76df50e394ab84b248', 70, 3)
distributed.scheduler - INFO - Unexpected worker completed task. Expected: <WorkerState 'tcp://127.0.0.1:37715', name: 30765, status: closed, memory: 0, processing: 0>, Got: <WorkerState 'tcp://127.0.0.1:41547', name: 30793, status: closed, memory: 0, processing: 13>, Key: ('astype-getitem-ccc06e36c4b3ee76df50e394ab84b248', 321, 2)
distributed.scheduler - INFO - Unexpected worker completed task. Expected: <WorkerState 'tcp://127.0.0.1:44215', name: 30778, status: closed, memory: 0, processing: 0>, Got: <WorkerState 'tcp://127.0.0.1:41547', name: 30793, status: closed, memory: 0, processing: 13>, Key: ('astype-getitem-ccc06e36c4b3ee76df50e394ab84b248', 10, 0)
distributed.scheduler - INFO - Register worker <WorkerState 'tcp://127.0.0.1:41547', name: 30793, status: closed, memory: 0, processing: 13>
distributed.scheduler - INFO - Starting worker compute stream, tcp://127.0.0.1:41547
distributed.scheduler - INFO - Remove worker <WorkerState 'tcp://127.0.0.1:41547', name: 30793, status: closed, memory: 0, processing: 13>
distributed.scheduler - INFO - Remove worker <WorkerState 'tcp://127.0.0.1:44849', name: 30787, status: closed, memory: 0, processing: 0>
distributed.scheduler - INFO - Unexpected worker completed task. Expected: <WorkerState 'tcp://127.0.0.1:44215', name: 30778, status: closed, memory: 0, processing: 0>, Got: <WorkerState 'tcp://127.0.0.1:37715', name: 30765, status: closed, memory: 0, processing: 0>, Key: ('astype-getitem-ccc06e36c4b3ee76df50e394ab84b248', 295, 3)
distributed.scheduler - INFO - Unexpected worker completed task. Expected: <WorkerState 'tcp://127.0.0.1:44215', name: 30778, status: closed, memory: 0, processing: 0>, Got: <WorkerState 'tcp://127.0.0.1:37715', name: 30765, status: closed, memory: 0, processing: 0>, Key: ('astype-getitem-ccc06e36c4b3ee76df50e394ab84b248', 65, 5)
distributed.scheduler - INFO - Unexpected worker completed task. Expected: <WorkerState 'tcp://127.0.0.1:44215', name: 30778, status: closed, memory: 0, processing: 0>, Got: <WorkerState 'tcp://127.0.0.1:35677', name: 30781, status: closed, memory: 0, processing: 0>, Key: ('astype-getitem-ccc06e36c4b3ee76df50e394ab84b248', 106, 2)
distributed.scheduler - INFO - Unexpected worker completed task. Expected: <WorkerState 'tcp://127.0.0.1:44215', name: 30778, status: closed, memory: 0, processing: 0>, Got: <WorkerState 'tcp://127.0.0.1:37715', name: 30765, status: closed, memory: 0, processing: 0>, Key: ('astype-getitem-ccc06e36c4b3ee76df50e394ab84b248', 321, 2)
distributed.scheduler - INFO - Unexpected worker completed task. Expected: <WorkerState 'tcp://127.0.0.1:44215', name: 30778, status: closed, memory: 0, processing: 0>, Got: <WorkerState 'tcp://127.0.0.1:35677', name: 30781, status: closed, memory: 0, processing: 0>, Key: ('astype-getitem-ccc06e36c4b3ee76df50e394ab84b248', 29, 2)
distributed.scheduler - INFO - Unexpected worker completed task. Expected: <WorkerState 'tcp://127.0.0.1:44215', name: 30778, status: closed, memory: 0, processing: 0>, Got: <WorkerState 'tcp://127.0.0.1:37715', name: 30765, status: closed, memory: 0, processing: 0>, Key: ('astype-getitem-ccc06e36c4b3ee76df50e394ab84b248', 278, 3)
distributed.scheduler - INFO - Unexpected worker completed task. Expected: <WorkerState 'tcp://127.0.0.1:44215', name: 30778, status: closed, memory: 0, processing: 0>, Got: <WorkerState 'tcp://127.0.0.1:35677', name: 30781, status: closed, memory: 0, processing: 0>, Key: ('astype-getitem-ccc06e36c4b3ee76df50e394ab84b248', 236, 0)
distributed.scheduler - INFO - Unexpected worker completed task. Expected: <WorkerState 'tcp://127.0.0.1:44215', name: 30778, status: closed, memory: 0, processing: 0>, Got: <WorkerState 'tcp://127.0.0.1:35677', name: 30781, status: closed, memory: 0, processing: 0>, Key: ('astype-getitem-ccc06e36c4b3ee76df50e394ab84b248', 70, 4)
distributed.scheduler - INFO - Unexpected worker completed task. Expected: <WorkerState 'tcp://127.0.0.1:33651', name: 30775, status: closed, memory: 0, processing: 3>, Got: <WorkerState 'tcp://127.0.0.1:35289', name: 30791, status: closed, memory: 0, processing: 0>, Key: ('astype-getitem-ccc06e36c4b3ee76df50e394ab84b248', 179, 3)
distributed.scheduler - INFO - Unexpected worker completed task. Expected: <WorkerState 'tcp://127.0.0.1:33651', name: 30775, status: closed, memory: 0, processing: 3>, Got: <WorkerState 'tcp://127.0.0.1:35289', name: 30791, status: closed, memory: 0, processing: 0>, Key: ('astype-getitem-ccc06e36c4b3ee76df50e394ab84b248', 293, 0)
distributed.scheduler - INFO - Unexpected worker completed task. Expected: <WorkerState 'tcp://127.0.0.1:44215', name: 30778, status: closed, memory: 0, processing: 0>, Got: <WorkerState 'tcp://127.0.0.1:33651', name: 30775, status: closed, memory: 0, processing: 3>, Key: ('astype-getitem-ccc06e36c4b3ee76df50e394ab84b248', 64, 0)
distributed.scheduler - INFO - Unexpected worker completed task. Expected: <WorkerState 'tcp://127.0.0.1:33651', name: 30775, status: closed, memory: 0, processing: 3>, Got: <WorkerState 'tcp://127.0.0.1:35289', name: 30791, status: closed, memory: 0, processing: 0>, Key: ('astype-getitem-ccc06e36c4b3ee76df50e394ab84b248', 252, 0)
distributed.scheduler - INFO - Unexpected worker completed task. Expected: <WorkerState 'tcp://127.0.0.1:33651', name: 30775, status: closed, memory: 0, processing: 3>, Got: <WorkerState 'tcp://127.0.0.1:35289', name: 30791, status: closed, memory: 0, processing: 0>, Key: ('astype-getitem-ccc06e36c4b3ee76df50e394ab84b248', 63, 4)
distributed.scheduler - INFO - Unexpected worker completed task. Expected: <WorkerState 'tcp://127.0.0.1:44215', name: 30778, status: closed, memory: 0, processing: 0>, Got: <WorkerState 'tcp://127.0.0.1:37715', name: 30765, status: closed, memory: 0, processing: 0>, Key: ('astype-getitem-ccc06e36c4b3ee76df50e394ab84b248', 141, 2)
distributed.scheduler - INFO - Unexpected worker completed task. Expected: <WorkerState 'tcp://127.0.0.1:33651', name: 30775, status: closed, memory: 0, processing: 3>, Got: <WorkerState 'tcp://127.0.0.1:37715', name: 30765, status: closed, memory: 0, processing: 0>, Key: ('astype-getitem-ccc06e36c4b3ee76df50e394ab84b248', 268, 2)
distributed.scheduler - INFO - Unexpected worker completed task. Expected: <WorkerState 'tcp://127.0.0.1:33651', name: 30775, status: closed, memory: 0, processing: 3>, Got: <WorkerState 'tcp://127.0.0.1:35677', name: 30781, status: closed, memory: 0, processing: 0>, Key: ('astype-getitem-ccc06e36c4b3ee76df50e394ab84b248', 277, 0)
distributed.scheduler - INFO - Unexpected worker completed task. Expected: <WorkerState 'tcp://127.0.0.1:33651', name: 30775, status: closed, memory: 0, processing: 3>, Got: <WorkerState 'tcp://127.0.0.1:44215', name: 30778, status: closed, memory: 0, processing: 0>, Key: ('astype-getitem-ccc06e36c4b3ee76df50e394ab84b248', 300, 2)
distributed.scheduler - INFO - Unexpected worker completed task. Expected: <WorkerState 'tcp://127.0.0.1:44215', name: 30778, status: closed, memory: 0, processing: 0>, Got: <WorkerState 'tcp://127.0.0.1:35677', name: 30781, status: closed, memory: 0, processing: 0>, Key: ('astype-getitem-ccc06e36c4b3ee76df50e394ab84b248', 171, 4)
distributed.scheduler - INFO - Unexpected worker completed task. Expected: <WorkerState 'tcp://127.0.0.1:44215', name: 30778, status: closed, memory: 0, processing: 0>, Got: <WorkerState 'tcp://127.0.0.1:35677', name: 30781, status: closed, memory: 0, processing: 0>, Key: ('astype-getitem-ccc06e36c4b3ee76df50e394ab84b248', 4, 2)
distributed.scheduler - INFO - Unexpected worker completed task. Expected: <WorkerState 'tcp://127.0.0.1:33651', name: 30775, status: closed, memory: 0, processing: 3>, Got: <WorkerState 'tcp://127.0.0.1:35289', name: 30791, status: closed, memory: 0, processing: 0>, Key: ('astype-getitem-ccc06e36c4b3ee76df50e394ab84b248', 306, 0)
distributed.scheduler - INFO - Unexpected worker completed task. Expected: <WorkerState 'tcp://127.0.0.1:44215', name: 30778, status: closed, memory: 0, processing: 0>, Got: <WorkerState 'tcp://127.0.0.1:35289', name: 30791, status: closed, memory: 0, processing: 0>, Key: ('astype-getitem-ccc06e36c4b3ee76df50e394ab84b248', 63, 3)
distributed.scheduler - INFO - Unexpected worker completed task. Expected: <WorkerState 'tcp://127.0.0.1:35677', name: 30781, status: closed, memory: 0, processing: 0>, Got: <WorkerState 'tcp://127.0.0.1:33651', name: 30775, status: closed, memory: 0, processing: 3>, Key: ('getitem-ccc06e36c4b3ee76df50e394ab84b248', 171, 4)
distributed.scheduler - INFO - Closing worker tcp://127.0.0.1:36325
distributed.scheduler - INFO - Remove worker <WorkerState 'tcp://127.0.0.1:36325', name: 30792, status: closed, memory: 0, processing: 121>
distributed.scheduler - INFO - Retired worker tcp://127.0.0.1:36325
distributed.scheduler - INFO - Closing worker tcp://127.0.0.1:42213
distributed.scheduler - INFO - Remove worker <WorkerState 'tcp://127.0.0.1:42213', name: 30771, status: closed, memory: 0, processing: 19>
distributed.scheduler - INFO - Retired worker tcp://127.0.0.1:42213
distributed.scheduler - INFO - Closing worker tcp://127.0.0.1:44889
distributed.scheduler - INFO - Remove worker <WorkerState 'tcp://127.0.0.1:44889', name: 30790, status: closed, memory: 0, processing: 31>
distributed.scheduler - INFO - Retired worker tcp://127.0.0.1:44889
distributed.scheduler - INFO - Closing worker tcp://127.0.0.1:45549
distributed.scheduler - INFO - Remove worker <WorkerState 'tcp://127.0.0.1:45549', name: 30789, status: closed, memory: 0, processing: 44>
distributed.scheduler - INFO - Retired worker tcp://127.0.0.1:45549
distributed.scheduler - INFO - Retire worker names (30788, 30781)
distributed.scheduler - INFO - Retiring worker tcp://127.0.0.1:32955
distributed.scheduler - INFO - Retiring worker tcp://127.0.0.1:35677
distributed.scheduler - INFO - Closing worker tcp://127.0.0.1:32955
distributed.scheduler - INFO - Remove worker <WorkerState 'tcp://127.0.0.1:32955', name: 30788, status: closed, memory: 0, processing: 0>
distributed.scheduler - INFO - Retired worker tcp://127.0.0.1:32955
distributed.scheduler - INFO - Closing worker tcp://127.0.0.1:35677
distributed.scheduler - INFO - Remove worker <WorkerState 'tcp://127.0.0.1:35677', name: 30781, status: closed, memory: 0, processing: 0>
distributed.scheduler - INFO - Retired worker tcp://127.0.0.1:35677
distributed.scheduler - INFO - Retire worker names (30778, 30765, 30791)
distributed.scheduler - INFO - Retiring worker tcp://127.0.0.1:37715
distributed.scheduler - INFO - Retiring worker tcp://127.0.0.1:35289
distributed.scheduler - INFO - Retiring worker tcp://127.0.0.1:44215
distributed.scheduler - INFO - Closing worker tcp://127.0.0.1:37715
distributed.scheduler - INFO - Remove worker <WorkerState 'tcp://127.0.0.1:37715', name: 30765, status: closed, memory: 0, processing: 0>
distributed.scheduler - INFO - Retired worker tcp://127.0.0.1:37715
distributed.scheduler - INFO - Closing worker tcp://127.0.0.1:35289
distributed.scheduler - INFO - Remove worker <WorkerState 'tcp://127.0.0.1:35289', name: 30791, status: closed, memory: 0, processing: 0>
distributed.scheduler - INFO - Retired worker tcp://127.0.0.1:35289
distributed.scheduler - INFO - Closing worker tcp://127.0.0.1:44215
distributed.scheduler - INFO - Remove worker <WorkerState 'tcp://127.0.0.1:44215', name: 30778, status: closed, memory: 0, processing: 0>
distributed.scheduler - INFO - Retired worker tcp://127.0.0.1:44215
distributed.scheduler - INFO - Register worker <WorkerState 'tcp://127.0.0.1:35877', name: 30793, status: closed, memory: 0, processing: 0>
distributed.scheduler - INFO - Starting worker compute stream, tcp://127.0.0.1:35877
distributed.scheduler - INFO - Register worker <WorkerState 'tcp://127.0.0.1:39159', name: 30794, status: closed, memory: 0, processing: 0>
distributed.scheduler - INFO - Starting worker compute stream, tcp://127.0.0.1:39159
distributed.scheduler - INFO - Register worker <WorkerState 'tcp://127.0.0.1:46251', name: 30795, status: closed, memory: 0, processing: 0>
distributed.scheduler - INFO - Starting worker compute stream, tcp://127.0.0.1:46251
distributed.scheduler - INFO - Remove worker <WorkerState 'tcp://127.0.0.1:35877', name: 30793, status: closed, memory: 0, processing: 0>
distributed.scheduler - INFO - Remove worker <WorkerState 'tcp://127.0.0.1:39159', name: 30794, status: closed, memory: 0, processing: 0>
distributed.scheduler - INFO - Remove worker <WorkerState 'tcp://127.0.0.1:46251', name: 30795, status: closed, memory: 0, processing: 0>
distributed.scheduler - INFO - Remove worker <WorkerState 'tcp://127.0.0.1:33651', name: 30775, status: closed, memory: 0, processing: 3>
distributed.scheduler - INFO - Lost all workers
distributed.scheduler - INFO - Register worker <WorkerState 'tcp://127.0.0.1:43405', name: 30798, status: running, memory: 1, processing: 824>
distributed.scheduler - INFO - Starting worker compute stream, tcp://127.0.0.1:43405
distributed.scheduler - INFO - Register worker <WorkerState 'tcp://127.0.0.1:42327', name: 30795, status: running, memory: 0, processing: 751>
distributed.scheduler - INFO - Starting worker compute stream, tcp://127.0.0.1:42327
distributed.scheduler - INFO - Register worker <WorkerState 'tcp://127.0.0.1:39483', name: 30796, status: running, memory: 0, processing: 329>
distributed.scheduler - INFO - Starting worker compute stream, tcp://127.0.0.1:39483
distributed.scheduler - INFO - Register worker <WorkerState 'tcp://127.0.0.1:43739', name: 30797, status: running, memory: 64, processing: 9>
distributed.scheduler - INFO - Starting worker compute stream, tcp://127.0.0.1:43739
distributed.scheduler - INFO - Register worker <WorkerState 'tcp://127.0.0.1:40733', name: 30800, status: running, memory: 0, processing: 0>
distributed.scheduler - INFO - Starting worker compute stream, tcp://127.0.0.1:40733
distributed.scheduler - INFO - Register worker <WorkerState 'tcp://127.0.0.1:39325', name: 30802, status: running, memory: 0, processing: 0>
distributed.scheduler - INFO - Starting worker compute stream, tcp://127.0.0.1:39325
distributed.scheduler - INFO - Register worker <WorkerState 'tcp://127.0.0.1:44535', name: 30804, status: running, memory: 0, processing: 0>
distributed.scheduler - INFO - Starting worker compute stream, tcp://127.0.0.1:44535
distributed.scheduler - INFO - Register worker <WorkerState 'tcp://127.0.0.1:42697', name: 30805, status: running, memory: 0, processing: 0>
distributed.scheduler - INFO - Starting worker compute stream, tcp://127.0.0.1:42697
distributed.scheduler - INFO - Register worker <WorkerState 'tcp://127.0.0.1:36737', name: 30803, status: undefined, memory: 0, processing: 0>
distributed.scheduler - INFO - Starting worker compute stream, tcp://127.0.0.1:36737
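
A log like the one above is easier to read once the repeated messages are tallied. Here is a small sketch that counts the "Unexpected worker completed task" lines by the worker that actually reported the task. The regular expression is written against the line format shown in this log, and the name `count_unexpected` is made up.

```python
import re
from collections import Counter

# Matches the scheduler message format seen in the log above.
UNEXPECTED = re.compile(
    r"Unexpected worker completed task\. "
    r"Expected: <WorkerState '(?P<expected>[^']+)'.*?"
    r"Got: <WorkerState '(?P<got>[^']+)'"
)


def count_unexpected(lines):
    """Count 'Unexpected worker completed task' lines per reporting worker."""
    counts = Counter()
    for line in lines:
        match = UNEXPECTED.search(line)
        if match:
            counts[match.group("got")] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Usage: python count_unexpected.py < scheduler.log
    for address, n in count_unexpected(sys.stdin).most_common():
        print(f"{n:6d}  {address}")
```

Swapping `"got"` for `"expected"` in the counter gives the other view: which workers the scheduler thought were running those tasks.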