Fargate workers are dying with "Failed to deserialize"

I recently posted an issue about my workers being killed. After looking more closely at the scheduler and worker logs, I think my tasks aren’t being executed at all: the workers seem to die almost immediately after they register. The relevant parts of the scheduler and worker logs are below. What could be causing this pickle failure?

Scheduler:

| 1639575438163 | distributed.scheduler - INFO - Register worker <WorkerState 'tcp://172.31.7.150:45137', name: 0, status: undefined, memory: 0, processing: 0> |
| 1639575438164 | distributed.scheduler - INFO - Starting worker compute stream, tcp://172.31.7.150:45137 |
| 1639575438164 | distributed.core - INFO - Starting established connection |
| 1639575438188 | distributed.scheduler - INFO - Remove worker <WorkerState 'tcp://172.31.7.150:45137', name: 0, status: running, memory: 0, processing: 196> |
| 1639575438188 | distributed.core - INFO - Removing comms to tcp://172.31.7.150:45137 |
| 1639575438192 | distributed.scheduler - INFO - Lost all workers |

Worker:

| 1639575438165 | distributed.core - INFO - Starting established connection |
| 1639575438184 | distributed.protocol.pickle - INFO - Failed to deserialize <memory at 0x7f667c8e4c40> |
| 1639575438184 | Traceback (most recent call last): |
| 1639575438184 |   File "/usr/local/lib/python3.9/site-packages/distributed/protocol/pickle.py", line 75, in loads |
| 1639575438184 |     return pickle.loads(x) |
| 1639575438184 |   File "/usr/local/lib/python3.9/site-packages/cloudpickle/cloudpickle.py", line 851, in _make_skeleton_class |
| 1639575438184 |     skeleton_class = types.new_class( |
| 1639575438184 |   File "/usr/local/lib/python3.9/types.py", line 77, in new_class |
| 1639575438184 |     return meta(name, resolved_bases, ns, **kwds) |
| 1639575438184 |   File "/usr/local/lib/python3.9/typing.py", line 1879, in __new__ |
| 1639575438184 |     module=ns['__module__']) |
| 1639575438184 | KeyError: '__module__' |
| 1639575438186 | distributed.protocol.core - CRITICAL - Failed to deserialize |

Thanks @ian.liu88! Noting here the associated GitHub issue.
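
For anyone landing here with a similar traceback: the worker fails while cloudpickle rebuilds a dynamically defined class through a metaclass in `typing`. One possible cause, which the logs above do not confirm, is that the client and the worker image run different Python or cloudpickle versions. Below is a minimal sketch for checking that, using only the standard library. The function names `collect_versions` and `diff_versions` and the package list are hypothetical, chosen for illustration.

```python
import platform
from importlib import metadata

# Packages whose versions usually need to match between client and workers.
# This list is an assumption; extend it with whatever your tasks import.
PACKAGES = ("dask", "distributed", "cloudpickle")


def collect_versions(packages=PACKAGES):
    """Return the Python version and installed package versions of this process."""
    versions = {"python": platform.python_version()}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Record missing packages explicitly instead of failing.
            versions[name] = None
    return versions


def diff_versions(local, remote):
    """Return {name: (local_version, remote_version)} for every entry that differs."""
    keys = sorted(set(local) | set(remote))
    return {
        key: (local.get(key), remote.get(key))
        for key in keys
        if local.get(key) != remote.get(key)
    }


if __name__ == "__main__":
    local = collect_versions()
    # Hypothetical worker environment, for illustration only. In practice,
    # run collect_versions() inside the worker container and compare.
    remote = dict(local, python="0.0.0")
    print(diff_versions(local, remote))
```

Run `collect_versions()` once in the client environment and once inside the worker image, then pass both results to `diff_versions`. An empty result means the listed versions match, in which case the cause is probably something else. `distributed` also provides `Client.get_versions(check=True)` for a similar comparison, though that needs workers that stay connected long enough to respond.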
