"Workers don't have promised key" error and delayed computation

I am trying to create a fairly large Dask array (10k x 10k, about 0.8 GB of float64 values) many times in parallel using dask.delayed and the distributed scheduler. Once the number of iterations or the array size crosses a certain threshold, I get a “Workers don’t have promised key” error. I’ve tried several variations that do not trigger the problem, and I would like to understand why the error occurs in this particular case. The error does not happen when: not using the distributed scheduler; creating NumPy arrays of the same size instead of Dask arrays; using fewer iterations with the same array size; or using smaller arrays with the same number of iterations. I am using Dask version 2022.05.0.

import dask.array as da
import dask
from dask.distributed import Client
import numpy as np

client = Client()

def repeat_func(nb_iters, arr_sz):
    def func():
        x = da.random.random((arr_sz, arr_sz)).compute()
        # The following works
        # x = np.random.random((arr_sz, arr_sz))
        del x

    results = [dask.delayed(func)() for _ in range(nb_iters)]
    return dask.compute(results)

# The following works.
# repeat_func(10, 10_000)
# The following works.
# repeat_func(80, 1_000)

# Fails with KeyError when nb_iters crosses the threshold from 40 to 80.
# This works when not using the distributed client.
repeat_func(80, 10_000)
Output
2022-08-02 16:06:50,581 - distributed.scheduler - ERROR - Couldn't gather keys {"('random_sample-75e2a3af25475b41ff19bdea11f02de1', 2, 0)": ['tcp://127.0.0.1:45339'], "('random_sample-75e2a3af25475b41ff19bdea11f02de1', 1, 3)": ['tcp://127.0.0.1:45339'], "('random_sample-75e2a3af25475b41ff19bdea11f02de1', 0, 1)": ['tcp://127.0.0.1:45339'], "('random_sample-75e2a3af25475b41ff19bdea11f02de1', 3, 0)": ['tcp://127.0.0.1:45339'], "('random_sample-75e2a3af25475b41ff19bdea11f02de1', 0, 0)": ['tcp://127.0.0.1:45339'], "('random_sample-75e2a3af25475b41ff19bdea11f02de1', 1, 2)": ['tcp://127.0.0.1:45339'], "('random_sample-75e2a3af25475b41ff19bdea11f02de1', 2, 2)": ['tcp://127.0.0.1:45339'], "('random_sample-75e2a3af25475b41ff19bdea11f02de1', 0, 2)": ['tcp://127.0.0.1:45339'], "('random_sample-75e2a3af25475b41ff19bdea11f02de1', 2, 1)": ['tcp://127.0.0.1:45339'], "('random_sample-75e2a3af25475b41ff19bdea11f02de1', 3, 3)": ['tcp://127.0.0.1:45339'], "('random_sample-75e2a3af25475b41ff19bdea11f02de1', 1, 1)": ['tcp://127.0.0.1:45339'], "('random_sample-75e2a3af25475b41ff19bdea11f02de1', 1, 0)": ['tcp://127.0.0.1:45339'], "('random_sample-75e2a3af25475b41ff19bdea11f02de1', 3, 2)": ['tcp://127.0.0.1:45339'], "('random_sample-75e2a3af25475b41ff19bdea11f02de1', 0, 3)": ['tcp://127.0.0.1:45339'], "('random_sample-75e2a3af25475b41ff19bdea11f02de1', 3, 1)": ['tcp://127.0.0.1:45339'], "('random_sample-75e2a3af25475b41ff19bdea11f02de1', 2, 3)": ['tcp://127.0.0.1:45339']} state: ['processing', 'processing', 'processing', 'processing', 'processing', 'processing', 'processing', 'processing', 'processing', 'processing', 'processing', 'processing', 'processing', 'processing', 'processing', 'processing'] workers: ['tcp://127.0.0.1:45339']
NoneType: None
2022-08-02 16:06:50,621 - distributed.scheduler - ERROR - Workers don't have promised key: ['tcp://127.0.0.1:45339'], ('random_sample-75e2a3af25475b41ff19bdea11f02de1', 2, 0)
NoneType: None
2022-08-02 16:06:50,623 - distributed.scheduler - ERROR - Workers don't have promised key: ['tcp://127.0.0.1:45339'], ('random_sample-75e2a3af25475b41ff19bdea11f02de1', 1, 3)
NoneType: None
2022-08-02 16:06:50,627 - distributed.scheduler - ERROR - Workers don't have promised key: ['tcp://127.0.0.1:45339'], ('random_sample-75e2a3af25475b41ff19bdea11f02de1', 0, 1)
NoneType: None
2022-08-02 16:06:50,629 - distributed.scheduler - ERROR - Workers don't have promised key: ['tcp://127.0.0.1:45339'], ('random_sample-75e2a3af25475b41ff19bdea11f02de1', 3, 0)
NoneType: None
2022-08-02 16:06:50,631 - distributed.scheduler - ERROR - Workers don't have promised key: ['tcp://127.0.0.1:45339'], ('random_sample-75e2a3af25475b41ff19bdea11f02de1', 0, 0)
NoneType: None
2022-08-02 16:06:50,632 - distributed.scheduler - ERROR - Workers don't have promised key: ['tcp://127.0.0.1:45339'], ('random_sample-75e2a3af25475b41ff19bdea11f02de1', 1, 2)
NoneType: None
2022-08-02 16:06:50,634 - distributed.scheduler - ERROR - Workers don't have promised key: ['tcp://127.0.0.1:45339'], ('random_sample-75e2a3af25475b41ff19bdea11f02de1', 2, 2)
NoneType: None
2022-08-02 16:06:50,636 - distributed.scheduler - ERROR - Workers don't have promised key: ['tcp://127.0.0.1:45339'], ('random_sample-75e2a3af25475b41ff19bdea11f02de1', 0, 2)
NoneType: None
2022-08-02 16:06:50,637 - distributed.scheduler - ERROR - Workers don't have promised key: ['tcp://127.0.0.1:45339'], ('random_sample-75e2a3af25475b41ff19bdea11f02de1', 2, 1)
NoneType: None
2022-08-02 16:06:50,637 - distributed.scheduler - ERROR - Workers don't have promised key: ['tcp://127.0.0.1:45339'], ('random_sample-75e2a3af25475b41ff19bdea11f02de1', 3, 3)
NoneType: None
2022-08-02 16:06:50,638 - distributed.scheduler - ERROR - Workers don't have promised key: ['tcp://127.0.0.1:45339'], ('random_sample-75e2a3af25475b41ff19bdea11f02de1', 1, 1)
NoneType: None
2022-08-02 16:06:50,638 - distributed.scheduler - ERROR - Workers don't have promised key: ['tcp://127.0.0.1:45339'], ('random_sample-75e2a3af25475b41ff19bdea11f02de1', 1, 0)
NoneType: None
2022-08-02 16:06:50,640 - distributed.scheduler - ERROR - Workers don't have promised key: ['tcp://127.0.0.1:45339'], ('random_sample-75e2a3af25475b41ff19bdea11f02de1', 3, 2)
NoneType: None
2022-08-02 16:06:50,640 - distributed.scheduler - ERROR - Workers don't have promised key: ['tcp://127.0.0.1:45339'], ('random_sample-75e2a3af25475b41ff19bdea11f02de1', 0, 3)
NoneType: None
2022-08-02 16:06:50,641 - distributed.scheduler - ERROR - Workers don't have promised key: ['tcp://127.0.0.1:45339'], ('random_sample-75e2a3af25475b41ff19bdea11f02de1', 3, 1)
NoneType: None
2022-08-02 16:06:50,649 - distributed.scheduler - ERROR - Workers don't have promised key: ['tcp://127.0.0.1:45339'], ('random_sample-75e2a3af25475b41ff19bdea11f02de1', 2, 3)
NoneType: None
2022-08-02 16:06:50,651 - distributed.nanny - WARNING - Restarting worker
2022-08-02 16:06:50,650 - distributed.client - WARNING - Couldn't gather 16 keys, rescheduling {"('random_sample-75e2a3af25475b41ff19bdea11f02de1', 2, 0)": ('tcp://127.0.0.1:45339',), "('random_sample-75e2a3af25475b41ff19bdea11f02de1', 1, 3)": ('tcp://127.0.0.1:45339',), "('random_sample-75e2a3af25475b41ff19bdea11f02de1', 0, 1)": ('tcp://127.0.0.1:45339',), "('random_sample-75e2a3af25475b41ff19bdea11f02de1', 3, 0)": ('tcp://127.0.0.1:45339',), "('random_sample-75e2a3af25475b41ff19bdea11f02de1', 0, 0)": ('tcp://127.0.0.1:45339',), "('random_sample-75e2a3af25475b41ff19bdea11f02de1', 1, 2)": ('tcp://127.0.0.1:45339',), "('random_sample-75e2a3af25475b41ff19bdea11f02de1', 2, 2)": ('tcp://127.0.0.1:45339',), "('random_sample-75e2a3af25475b41ff19bdea11f02de1', 0, 2)": ('tcp://127.0.0.1:45339',), "('random_sample-75e2a3af25475b41ff19bdea11f02de1', 2, 1)": ('tcp://127.0.0.1:45339',), "('random_sample-75e2a3af25475b41ff19bdea11f02de1', 3, 3)": ('tcp://127.0.0.1:45339',), "('random_sample-75e2a3af25475b41ff19bdea11f02de1', 1, 1)": ('tcp://127.0.0.1:45339',), "('random_sample-75e2a3af25475b41ff19bdea11f02de1', 1, 0)": ('tcp://127.0.0.1:45339',), "('random_sample-75e2a3af25475b41ff19bdea11f02de1', 3, 2)": ('tcp://127.0.0.1:45339',), "('random_sample-75e2a3af25475b41ff19bdea11f02de1', 0, 3)": ('tcp://127.0.0.1:45339',), "('random_sample-75e2a3af25475b41ff19bdea11f02de1', 3, 1)": ('tcp://127.0.0.1:45339',), "('random_sample-75e2a3af25475b41ff19bdea11f02de1', 2, 3)": ('tcp://127.0.0.1:45339',)}
2022-08-02 16:08:34,821 - distributed.worker_memory - WARNING - Worker exceeded 95% memory budget. Restarting
2022-08-02 16:08:35,005 - distributed.scheduler - ERROR - Couldn't gather keys {"('random_sample-ebb4cf4fc81866ce0245a5c80bb691a1', 1, 0)": [], "('random_sample-ebb4cf4fc81866ce0245a5c80bb691a1', 2, 3)": [], "('random_sample-ebb4cf4fc81866ce0245a5c80bb691a1', 1, 2)": [], "('random_sample-ebb4cf4fc81866ce0245a5c80bb691a1', 0, 2)": [], "('random_sample-ebb4cf4fc81866ce0245a5c80bb691a1', 1, 3)": [], "('random_sample-ebb4cf4fc81866ce0245a5c80bb691a1', 2, 0)": [], "('random_sample-ebb4cf4fc81866ce0245a5c80bb691a1', 3, 1)": [], "('random_sample-ebb4cf4fc81866ce0245a5c80bb691a1', 2, 2)": [], "('random_sample-ebb4cf4fc81866ce0245a5c80bb691a1', 1, 1)": [], "('random_sample-ebb4cf4fc81866ce0245a5c80bb691a1', 2, 1)": [], "('random_sample-ebb4cf4fc81866ce0245a5c80bb691a1', 3, 2)": [], "('random_sample-ebb4cf4fc81866ce0245a5c80bb691a1', 0, 3)": [], "('random_sample-ebb4cf4fc81866ce0245a5c80bb691a1', 0, 1)": [], "('random_sample-ebb4cf4fc81866ce0245a5c80bb691a1', 3, 0)": [], "('random_sample-ebb4cf4fc81866ce0245a5c80bb691a1', 0, 0)": []} state: ['processing', 'processing', 'processing', 'processing', 'processing', 'processing', 'processing', 'processing', 'processing', 'processing', 'processing', 'processing', 'processing', 'processing', 'processing'] workers: []
NoneType: None
2022-08-02 16:08:35,006 - distributed.scheduler - ERROR - Workers don't have promised key: [], ('random_sample-ebb4cf4fc81866ce0245a5c80bb691a1', 1, 0)
NoneType: None
2022-08-02 16:08:35,007 - distributed.scheduler - ERROR - Workers don't have promised key: [], ('random_sample-ebb4cf4fc81866ce0245a5c80bb691a1', 2, 3)
NoneType: None
2022-08-02 16:08:35,007 - distributed.scheduler - ERROR - Workers don't have promised key: [], ('random_sample-ebb4cf4fc81866ce0245a5c80bb691a1', 1, 2)
NoneType: None
2022-08-02 16:08:35,014 - distributed.scheduler - ERROR - Workers don't have promised key: [], ('random_sample-ebb4cf4fc81866ce0245a5c80bb691a1', 0, 2)
NoneType: None
2022-08-02 16:08:35,015 - distributed.scheduler - ERROR - Workers don't have promised key: [], ('random_sample-ebb4cf4fc81866ce0245a5c80bb691a1', 1, 3)
NoneType: None
2022-08-02 16:08:35,015 - distributed.scheduler - ERROR - Workers don't have promised key: [], ('random_sample-ebb4cf4fc81866ce0245a5c80bb691a1', 2, 0)
NoneType: None
2022-08-02 16:08:35,016 - distributed.scheduler - ERROR - Workers don't have promised key: [], ('random_sample-ebb4cf4fc81866ce0245a5c80bb691a1', 3, 1)
NoneType: None
2022-08-02 16:08:35,049 - distributed.scheduler - ERROR - Workers don't have promised key: [], ('random_sample-ebb4cf4fc81866ce0245a5c80bb691a1', 2, 2)
NoneType: None
2022-08-02 16:08:35,050 - distributed.scheduler - ERROR - Workers don't have promised key: [], ('random_sample-ebb4cf4fc81866ce0245a5c80bb691a1', 1, 1)
NoneType: None
2022-08-02 16:08:35,053 - distributed.scheduler - ERROR - Workers don't have promised key: [], ('random_sample-ebb4cf4fc81866ce0245a5c80bb691a1', 2, 1)
NoneType: None
2022-08-02 16:08:35,054 - distributed.scheduler - ERROR - Workers don't have promised key: [], ('random_sample-ebb4cf4fc81866ce0245a5c80bb691a1', 3, 2)
NoneType: None
2022-08-02 16:08:35,054 - distributed.scheduler - ERROR - Workers don't have promised key: [], ('random_sample-ebb4cf4fc81866ce0245a5c80bb691a1', 0, 3)
NoneType: None
2022-08-02 16:08:35,056 - distributed.scheduler - ERROR - Workers don't have promised key: [], ('random_sample-ebb4cf4fc81866ce0245a5c80bb691a1', 0, 1)
NoneType: None
2022-08-02 16:08:35,056 - distributed.scheduler - ERROR - Workers don't have promised key: [], ('random_sample-ebb4cf4fc81866ce0245a5c80bb691a1', 3, 0)
NoneType: None
2022-08-02 16:08:35,057 - distributed.scheduler - ERROR - Workers don't have promised key: [], ('random_sample-ebb4cf4fc81866ce0245a5c80bb691a1', 0, 0)
NoneType: None
2022-08-02 16:08:35,058 - distributed.client - WARNING - Couldn't gather 15 keys, rescheduling {"('random_sample-ebb4cf4fc81866ce0245a5c80bb691a1', 1, 0)": (), "('random_sample-ebb4cf4fc81866ce0245a5c80bb691a1', 2, 3)": (), "('random_sample-ebb4cf4fc81866ce0245a5c80bb691a1', 1, 2)": (), "('random_sample-ebb4cf4fc81866ce0245a5c80bb691a1', 0, 2)": (), "('random_sample-ebb4cf4fc81866ce0245a5c80bb691a1', 1, 3)": (), "('random_sample-ebb4cf4fc81866ce0245a5c80bb691a1', 2, 0)": (), "('random_sample-ebb4cf4fc81866ce0245a5c80bb691a1', 3, 1)": (), "('random_sample-ebb4cf4fc81866ce0245a5c80bb691a1', 2, 2)": (), "('random_sample-ebb4cf4fc81866ce0245a5c80bb691a1', 1, 1)": (), "('random_sample-ebb4cf4fc81866ce0245a5c80bb691a1', 2, 1)": (), "('random_sample-ebb4cf4fc81866ce0245a5c80bb691a1', 3, 2)": (), "('random_sample-ebb4cf4fc81866ce0245a5c80bb691a1', 0, 3)": (), "('random_sample-ebb4cf4fc81866ce0245a5c80bb691a1', 0, 1)": (), "('random_sample-ebb4cf4fc81866ce0245a5c80bb691a1', 3, 0)": (), "('random_sample-ebb4cf4fc81866ce0245a5c80bb691a1', 0, 0)": ()}
2022-08-02 16:08:35,063 - distributed.nanny - WARNING - Restarting worker
  1. You should not use Dask collections (arrays in this case) inside delayed functions. Why not just call dask.array from the main code? It already has a lazy API for all operations.
  2. Your function is not pure: each iteration gives a different result, and this is where Dask is getting confused. You could pass pure=False to dask.delayed.
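The two suggestions above can be sketched roughly as follows. This is a minimal illustration, not the original poster's code: `repeat_lazy` and `draw` are hypothetical names, and the `mean()` reduction is only an assumed stand-in for whatever per-iteration work is actually needed. The first part builds every array lazily in the main process and computes them in a single call; the second part shows that `pure=False` gives each delayed call its own key.

```python
import random

import dask
import dask.array as da


def repeat_lazy(nb_iters, arr_sz):
    """Build all arrays lazily in the main process and compute them together."""
    # No .compute() inside a task: the scheduler sees every chunk directly.
    arrays = [da.random.random((arr_sz, arr_sz)) for _ in range(nb_iters)]
    # Reduce each array so only one scalar per iteration is returned
    # (assumed placeholder for the real per-iteration work).
    means = [arr.mean() for arr in arrays]
    return dask.compute(*means)


def draw():
    """Hypothetical impure function: returns a different value on each call."""
    return random.random()


# With pure=False, every call to the delayed function gets a fresh key,
# so two calls are never treated as the same task.
first = dask.delayed(draw, pure=False)()
second = dask.delayed(draw, pure=False)()

if __name__ == "__main__":
    print(repeat_lazy(3, 1_000))
    print(first.key != second.key)
```

If a `Client` has been created beforehand, the same `dask.compute` call should run on the distributed cluster without changes to `repeat_lazy`.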
  1. Yes, it is rather odd to operate on a Dask collection inside a delayed function. I should have provided more context about my ultimate goal, which is to convert an xarray dataset to a Dask dataframe and then save it to Parquet. I tried doing this in the most straightforward way possible, but ran into a problem that looks like a bug to me. See Notebook crashes after calling .to_dask_dataframe · Issue #6811 · pydata/xarray · GitHub. As a workaround, I am trying to implement a parallel for loop (using Dask delayed) where each iteration grabs a small segment of the xarray dataset, converts it to a dataframe, and then saves it to Parquet. This results in a KeyError, even though the delayed function is pure. I tried to find the most minimal example where this occurred, which resulted in the code in my original post above.

  2. Thanks for the tip about using pure=False. I still get a KeyError, but it now seems to be for a different reason, related to running out of memory (see output below). I don’t understand why I would run out of memory when the same code works with a NumPy array in func. Is there some way to better understand and/or estimate the extra memory required to use a Dask array (as opposed to a NumPy array)?
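As a rough starting point for that estimate, here is a back-of-the-envelope sketch in plain Python. It is not how Dask itself accounts for memory. The helper names `array_nbytes` and `tasks_that_fit` are hypothetical, the 8-byte item size assumes float64, and the factor of 2 is an assumption that the individual chunks and the assembled result may both be held for a moment when `.compute()` gathers the array. The 3.82 GiB limit is the worker memory limit reported in the log output.

```python
import math

GIB = 2**30


def array_nbytes(shape, itemsize=8):
    """Bytes needed for one dense array (itemsize=8 assumes float64)."""
    return math.prod(shape) * itemsize


def tasks_that_fit(shape, memory_limit_bytes, copies_per_task=2, itemsize=8):
    """How many concurrent tasks fit under the limit.

    copies_per_task=2 is an assumption: the chunks plus the concatenated
    result may coexist briefly while the array is being assembled.
    """
    per_task = array_nbytes(shape, itemsize) * copies_per_task
    return memory_limit_bytes // per_task


shape = (10_000, 10_000)
worker_limit = int(3.82 * GIB)  # "Worker memory limit: 3.82 GiB" from the logs

print(array_nbytes(shape))                  # 800000000
print(tasks_that_fit(shape, worker_limit))  # concurrent tasks under the assumed 2x overhead
```

Under these assumptions only a few such tasks fit on one worker at a time, so several threads running func concurrently on the same worker could plausibly reach the pause and restart thresholds seen in the logs.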

Output
2022-08-04 19:57:28,453 - distributed.worker_memory - WARNING - Unmanaged memory use is high. This may indicate a memory leak or the memory may not be released to the OS; see https://distributed.dask.org/en/latest/worker.html#memtrim for more information. -- Unmanaged memory: 2.55 GiB -- Worker memory limit: 3.82 GiB
2022-08-04 19:57:34,630 - distributed.worker_memory - WARNING - Unmanaged memory use is high. This may indicate a memory leak or the memory may not be released to the OS; see https://distributed.dask.org/en/latest/worker.html#memtrim for more information. -- Unmanaged memory: 2.54 GiB -- Worker memory limit: 3.82 GiB
2022-08-04 19:57:39,470 - distributed.worker_memory - WARNING - Unmanaged memory use is high. This may indicate a memory leak or the memory may not be released to the OS; see https://distributed.dask.org/en/latest/worker.html#memtrim for more information. -- Unmanaged memory: 2.55 GiB -- Worker memory limit: 3.82 GiB
2022-08-04 19:57:43,118 - distributed.worker_memory - WARNING - Unmanaged memory use is high. This may indicate a memory leak or the memory may not be released to the OS; see https://distributed.dask.org/en/latest/worker.html#memtrim for more information. -- Unmanaged memory: 2.50 GiB -- Worker memory limit: 3.82 GiB
2022-08-04 19:57:48,498 - distributed.worker_memory - WARNING - Unmanaged memory use is high. This may indicate a memory leak or the memory may not be released to the OS; see https://distributed.dask.org/en/latest/worker.html#memtrim for more information. -- Unmanaged memory: 2.55 GiB -- Worker memory limit: 3.82 GiB
2022-08-04 19:57:53,109 - distributed.worker - ERROR - failed during get data with tcp://127.0.0.1:37159 -> None
Traceback (most recent call last):
  File "/srv/conda/envs/notebook/lib/python3.9/site-packages/tornado/iostream.py", line 867, in _read_to_buffer
    bytes_read = self.read_from_fd(buf)
  File "/srv/conda/envs/notebook/lib/python3.9/site-packages/tornado/iostream.py", line 1140, in read_from_fd
    return self.socket.recv_into(buf, len(buf))
TimeoutError: [Errno 110] Connection timed out

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/srv/conda/envs/notebook/lib/python3.9/site-packages/distributed/worker.py", line 1728, in get_data
    response = await comm.read(deserializers=serializers)
  File "/srv/conda/envs/notebook/lib/python3.9/site-packages/distributed/comm/tcp.py", line 242, in read
    convert_stream_closed_error(self, e)
  File "/srv/conda/envs/notebook/lib/python3.9/site-packages/distributed/comm/tcp.py", line 148, in convert_stream_closed_error
    raise CommClosedError(f"in {obj}: {exc.__class__.__name__}: {exc}") from exc
distributed.comm.core.CommClosedError: in <TCP (closed)  local=tcp://127.0.0.1:37159 remote=tcp://127.0.0.1:60162>: TimeoutError: [Errno 110] Connection timed out
2022-08-04 19:57:53,114 - distributed.worker_memory - WARNING - Unmanaged memory use is high. This may indicate a memory leak or the memory may not be released to the OS; see https://distributed.dask.org/en/latest/worker.html#memtrim for more information. -- Unmanaged memory: 2.55 GiB -- Worker memory limit: 3.82 GiB
2022-08-04 19:57:53,601 - distributed.worker_memory - WARNING - Worker is at 89% memory usage. Pausing worker.  Process memory: 3.40 GiB -- Worker memory limit: 3.82 GiB
2022-08-04 19:57:54,161 - distributed.worker_memory - WARNING - Worker is at 80% memory usage. Pausing worker.  Process memory: 3.09 GiB -- Worker memory limit: 3.82 GiB
2022-08-04 19:57:55,022 - distributed.worker_memory - WARNING - Unmanaged memory use is high. This may indicate a memory leak or the memory may not be released to the OS; see https://distributed.dask.org/en/latest/worker.html#memtrim for more information. -- Unmanaged memory: 3.18 GiB -- Worker memory limit: 3.82 GiB
2022-08-04 19:57:55,023 - distributed.worker_memory - WARNING - Unmanaged memory use is high. This may indicate a memory leak or the memory may not be released to the OS; see https://distributed.dask.org/en/latest/worker.html#memtrim for more information. -- Unmanaged memory: 3.18 GiB -- Worker memory limit: 3.82 GiB
2022-08-04 19:57:55,171 - distributed.worker_memory - WARNING - Unmanaged memory use is high. This may indicate a memory leak or the memory may not be released to the OS; see https://distributed.dask.org/en/latest/worker.html#memtrim for more information. -- Unmanaged memory: 3.21 GiB -- Worker memory limit: 3.82 GiB
2022-08-04 19:57:55,510 - distributed.worker_memory - WARNING - Worker is at 46% memory usage. Resuming worker. Process memory: 1.77 GiB -- Worker memory limit: 3.82 GiB
2022-08-04 19:57:55,510 - distributed.worker_memory - WARNING - Worker is at 74% memory usage. Resuming worker. Process memory: 2.85 GiB -- Worker memory limit: 3.82 GiB
2022-08-04 19:58:08,227 - distributed.worker_memory - WARNING - Worker is at 88% memory usage. Pausing worker.  Process memory: 3.37 GiB -- Worker memory limit: 3.82 GiB
2022-08-04 19:58:08,237 - distributed.worker - WARNING - Compute Failed
Key:       func-1db4f75b-6eee-4df3-b2f9-fb4fb94ed973
Function:  func
args:      ()
kwargs:    {}
Exception: "KeyError('data')"

2022-08-04 19:58:08,237 - distributed.worker - WARNING - Compute Failed
Key:       func-2869213f-cd77-4c4a-b58b-4f64bcc402c8
Function:  func
args:      ()
kwargs:    {}
Exception: "KeyError('data')"

2022-08-04 19:58:08,237 - distributed.worker - WARNING - Compute Failed
Key:       func-2223c59b-1ceb-42bc-a357-0c39b6242d31
Function:  func
args:      ()
kwargs:    {}
Exception: "KeyError('data')"

2022-08-04 19:58:09,718 - distributed.nanny - WARNING - Restarting worker
2022-08-04 19:58:11,007 - distributed.worker - ERROR - failed during get data with tcp://127.0.0.1:39017 -> None
Traceback (most recent call last):
  File "/srv/conda/envs/notebook/lib/python3.9/site-packages/tornado/iostream.py", line 867, in _read_to_buffer
    bytes_read = self.read_from_fd(buf)
  File "/srv/conda/envs/notebook/lib/python3.9/site-packages/tornado/iostream.py", line 1140, in read_from_fd
    return self.socket.recv_into(buf, len(buf))
ConnectionResetError: [Errno 104] Connection reset by peer

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/srv/conda/envs/notebook/lib/python3.9/site-packages/distributed/worker.py", line 1728, in get_data
    response = await comm.read(deserializers=serializers)
  File "/srv/conda/envs/notebook/lib/python3.9/site-packages/distributed/comm/tcp.py", line 242, in read
    convert_stream_closed_error(self, e)
  File "/srv/conda/envs/notebook/lib/python3.9/site-packages/distributed/comm/tcp.py", line 148, in convert_stream_closed_error
    raise CommClosedError(f"in {obj}: {exc.__class__.__name__}: {exc}") from exc
distributed.comm.core.CommClosedError: in <TCP (closed)  local=tcp://127.0.0.1:39017 remote=tcp://127.0.0.1:55296>: ConnectionResetError: [Errno 104] Connection reset by peer
2022-08-04 19:58:11,009 - distributed.worker - ERROR - failed during get data with tcp://127.0.0.1:39017 -> None
Traceback (most recent call last):
  File "/srv/conda/envs/notebook/lib/python3.9/site-packages/tornado/iostream.py", line 867, in _read_to_buffer
    bytes_read = self.read_from_fd(buf)
  File "/srv/conda/envs/notebook/lib/python3.9/site-packages/tornado/iostream.py", line 1140, in read_from_fd
    return self.socket.recv_into(buf, len(buf))
ConnectionResetError: [Errno 104] Connection reset by peer

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/srv/conda/envs/notebook/lib/python3.9/site-packages/distributed/worker.py", line 1728, in get_data
    response = await comm.read(deserializers=serializers)
  File "/srv/conda/envs/notebook/lib/python3.9/site-packages/distributed/comm/tcp.py", line 242, in read
    convert_stream_closed_error(self, e)
  File "/srv/conda/envs/notebook/lib/python3.9/site-packages/distributed/comm/tcp.py", line 148, in convert_stream_closed_error
    raise CommClosedError(f"in {obj}: {exc.__class__.__name__}: {exc}") from exc
distributed.comm.core.CommClosedError: in <TCP (closed)  local=tcp://127.0.0.1:39017 remote=tcp://127.0.0.1:55284>: ConnectionResetError: [Errno 104] Connection reset by peer
2022-08-04 19:58:52,405 - distributed.worker_memory - WARNING - Worker is at 89% memory usage. Pausing worker.  Process memory: 3.43 GiB -- Worker memory limit: 3.82 GiB
2022-08-04 19:58:52,661 - distributed.worker_memory - WARNING - Worker is at 62% memory usage. Resuming worker. Process memory: 2.40 GiB -- Worker memory limit: 3.82 GiB
2022-08-04 19:59:10,038 - distributed.worker_memory - WARNING - Worker is at 93% memory usage. Pausing worker.  Process memory: 3.56 GiB -- Worker memory limit: 3.82 GiB
2022-08-04 19:59:10,672 - distributed.worker_memory - WARNING - Worker exceeded 95% memory budget. Restarting
2022-08-04 19:59:11,135 - distributed.scheduler - ERROR - Couldn't gather keys {"('random_sample-baed8db15ef3e3d845b76bab90807593', 1, 3)": [], "('random_sample-baed8db15ef3e3d845b76bab90807593', 0, 2)": [], "('random_sample-baed8db15ef3e3d845b76bab90807593', 1, 2)": [], "('random_sample-baed8db15ef3e3d845b76bab90807593', 2, 3)": [], "('random_sample-baed8db15ef3e3d845b76bab90807593', 2, 1)": [], "('random_sample-baed8db15ef3e3d845b76bab90807593', 0, 1)": [], "('random_sample-baed8db15ef3e3d845b76bab90807593', 2, 0)": [], "('random_sample-baed8db15ef3e3d845b76bab90807593', 3, 0)": [], "('random_sample-baed8db15ef3e3d845b76bab90807593', 0, 0)": [], "('random_sample-baed8db15ef3e3d845b76bab90807593', 2, 2)": [], "('random_sample-baed8db15ef3e3d845b76bab90807593', 1, 0)": [], "('random_sample-baed8db15ef3e3d845b76bab90807593', 3, 1)": [], "('random_sample-baed8db15ef3e3d845b76bab90807593', 3, 2)": [], "('random_sample-baed8db15ef3e3d845b76bab90807593', 0, 3)": [], "('random_sample-baed8db15ef3e3d845b76bab90807593', 3, 3)": [], "('random_sample-baed8db15ef3e3d845b76bab90807593', 1, 1)": []} state: ['processing', 'processing', 'processing', 'processing', 'processing', 'processing', 'processing', 'processing', 'processing', 'processing', 'processing', 'processing', 'processing', 'processing', 'processing', 'processing'] workers: []
NoneType: None
2022-08-04 19:59:11,136 - distributed.scheduler - ERROR - Workers don't have promised key: [], ('random_sample-baed8db15ef3e3d845b76bab90807593', 1, 3)
NoneType: None
2022-08-04 19:59:11,136 - distributed.scheduler - ERROR - Workers don't have promised key: [], ('random_sample-baed8db15ef3e3d845b76bab90807593', 0, 2)
NoneType: None
2022-08-04 19:59:11,138 - distributed.scheduler - ERROR - Workers don't have promised key: [], ('random_sample-baed8db15ef3e3d845b76bab90807593', 1, 2)
NoneType: None
2022-08-04 19:59:11,139 - distributed.scheduler - ERROR - Workers don't have promised key: [], ('random_sample-baed8db15ef3e3d845b76bab90807593', 2, 3)
NoneType: None
2022-08-04 19:59:11,141 - distributed.scheduler - ERROR - Workers don't have promised key: [], ('random_sample-baed8db15ef3e3d845b76bab90807593', 2, 1)
NoneType: None
2022-08-04 19:59:11,142 - distributed.scheduler - ERROR - Workers don't have promised key: [], ('random_sample-baed8db15ef3e3d845b76bab90807593', 0, 1)
NoneType: None
2022-08-04 19:59:11,142 - distributed.scheduler - ERROR - Workers don't have promised key: [], ('random_sample-baed8db15ef3e3d845b76bab90807593', 2, 0)
NoneType: None
2022-08-04 19:59:11,146 - distributed.scheduler - ERROR - Workers don't have promised key: [], ('random_sample-baed8db15ef3e3d845b76bab90807593', 3, 0)
NoneType: None
2022-08-04 19:59:11,146 - distributed.scheduler - ERROR - Workers don't have promised key: [], ('random_sample-baed8db15ef3e3d845b76bab90807593', 0, 0)
NoneType: None
2022-08-04 19:59:11,147 - distributed.scheduler - ERROR - Workers don't have promised key: [], ('random_sample-baed8db15ef3e3d845b76bab90807593', 2, 2)
NoneType: None
2022-08-04 19:59:11,147 - distributed.scheduler - ERROR - Workers don't have promised key: [], ('random_sample-baed8db15ef3e3d845b76bab90807593', 1, 0)
NoneType: None
2022-08-04 19:59:11,148 - distributed.scheduler - ERROR - Workers don't have promised key: [], ('random_sample-baed8db15ef3e3d845b76bab90807593', 3, 1)
NoneType: None
2022-08-04 19:59:11,149 - distributed.scheduler - ERROR - Workers don't have promised key: [], ('random_sample-baed8db15ef3e3d845b76bab90807593', 3, 2)
NoneType: None
2022-08-04 19:59:11,159 - distributed.scheduler - ERROR - Workers don't have promised key: [], ('random_sample-baed8db15ef3e3d845b76bab90807593', 0, 3)
NoneType: None
2022-08-04 19:59:11,160 - distributed.scheduler - ERROR - Workers don't have promised key: [], ('random_sample-baed8db15ef3e3d845b76bab90807593', 3, 3)
NoneType: None
2022-08-04 19:59:11,160 - distributed.scheduler - ERROR - Workers don't have promised key: [], ('random_sample-baed8db15ef3e3d845b76bab90807593', 1, 1)
NoneType: None
2022-08-04 19:59:11,161 - distributed.client - WARNING - Couldn't gather 16 keys, rescheduling {"('random_sample-baed8db15ef3e3d845b76bab90807593', 1, 3)": (), "('random_sample-baed8db15ef3e3d845b76bab90807593', 0, 2)": (), "('random_sample-baed8db15ef3e3d845b76bab90807593', 1, 2)": (), "('random_sample-baed8db15ef3e3d845b76bab90807593', 2, 3)": (), "('random_sample-baed8db15ef3e3d845b76bab90807593', 2, 1)": (), "('random_sample-baed8db15ef3e3d845b76bab90807593', 0, 1)": (), "('random_sample-baed8db15ef3e3d845b76bab90807593', 2, 0)": (), "('random_sample-baed8db15ef3e3d845b76bab90807593', 3, 0)": (), "('random_sample-baed8db15ef3e3d845b76bab90807593', 0, 0)": (), "('random_sample-baed8db15ef3e3d845b76bab90807593', 2, 2)": (), "('random_sample-baed8db15ef3e3d845b76bab90807593', 1, 0)": (), "('random_sample-baed8db15ef3e3d845b76bab90807593', 3, 1)": (), "('random_sample-baed8db15ef3e3d845b76bab90807593', 3, 2)": (), "('random_sample-baed8db15ef3e3d845b76bab90807593', 0, 3)": (), "('random_sample-baed8db15ef3e3d845b76bab90807593', 3, 3)": (), "('random_sample-baed8db15ef3e3d845b76bab90807593', 1, 1)": ()}
2022-08-04 19:59:11,164 - distributed.nanny - WARNING - Restarting worker
2022-08-04 20:01:32,222 - distributed.worker_memory - WARNING - Worker exceeded 95% memory budget. Restarting
2022-08-04 20:01:32,322 - distributed.worker_memory - WARNING - Worker exceeded 95% memory budget. Restarting
2022-08-04 20:01:33,673 - distributed.nanny - WARNING - Restarting worker
2022-08-04 20:01:34,328 - distributed.scheduler - ERROR - Couldn't gather keys {"('random_sample-f005a374b4d3468e2a4010e9645ec404', 3, 1)": [], "('random_sample-f005a374b4d3468e2a4010e9645ec404', 2, 3)": [], "('random_sample-f005a374b4d3468e2a4010e9645ec404', 2, 0)": [], "('random_sample-f005a374b4d3468e2a4010e9645ec404', 3, 2)": [], "('random_sample-f005a374b4d3468e2a4010e9645ec404', 1, 3)": [], "('random_sample-f005a374b4d3468e2a4010e9645ec404', 0, 3)": [], "('random_sample-f005a374b4d3468e2a4010e9645ec404', 1, 0)": [], "('random_sample-f005a374b4d3468e2a4010e9645ec404', 1, 1)": [], "('random_sample-f005a374b4d3468e2a4010e9645ec404', 0, 2)": [], "('random_sample-f005a374b4d3468e2a4010e9645ec404', 3, 3)": [], "('random_sample-f005a374b4d3468e2a4010e9645ec404', 2, 1)": [], "('random_sample-f005a374b4d3468e2a4010e9645ec404', 1, 2)": [], "('random_sample-f005a374b4d3468e2a4010e9645ec404', 0, 1)": [], "('random_sample-f005a374b4d3468e2a4010e9645ec404', 2, 2)": []} state: ['processing', 'processing', 'processing', 'processing', 'processing', 'processing', 'processing', 'processing', 'processing', 'processing', 'processing', 'processing', 'processing', 'processing'] workers: []
NoneType: None
2022-08-04 20:01:34,331 - distributed.scheduler - ERROR - Workers don't have promised key: [], ('random_sample-f005a374b4d3468e2a4010e9645ec404', 3, 1)
NoneType: None
2022-08-04 20:01:34,333 - distributed.scheduler - ERROR - Workers don't have promised key: [], ('random_sample-f005a374b4d3468e2a4010e9645ec404', 2, 3)
NoneType: None
2022-08-04 20:01:34,334 - distributed.scheduler - ERROR - Workers don't have promised key: [], ('random_sample-f005a374b4d3468e2a4010e9645ec404', 2, 0)
NoneType: None
2022-08-04 20:01:34,334 - distributed.scheduler - ERROR - Workers don't have promised key: [], ('random_sample-f005a374b4d3468e2a4010e9645ec404', 3, 2)
NoneType: None
2022-08-04 20:01:34,335 - distributed.scheduler - ERROR - Workers don't have promised key: [], ('random_sample-f005a374b4d3468e2a4010e9645ec404', 1, 3)
NoneType: None
2022-08-04 20:01:34,335 - distributed.scheduler - ERROR - Workers don't have promised key: [], ('random_sample-f005a374b4d3468e2a4010e9645ec404', 0, 3)
NoneType: None
2022-08-04 20:01:34,336 - distributed.scheduler - ERROR - Workers don't have promised key: [], ('random_sample-f005a374b4d3468e2a4010e9645ec404', 1, 0)
NoneType: None
2022-08-04 20:01:34,337 - distributed.scheduler - ERROR - Workers don't have promised key: [], ('random_sample-f005a374b4d3468e2a4010e9645ec404', 1, 1)
NoneType: None
2022-08-04 20:01:34,339 - distributed.scheduler - ERROR - Workers don't have promised key: [], ('random_sample-f005a374b4d3468e2a4010e9645ec404', 0, 2)
NoneType: None
2022-08-04 20:01:34,339 - distributed.scheduler - ERROR - Workers don't have promised key: [], ('random_sample-f005a374b4d3468e2a4010e9645ec404', 3, 3)
NoneType: None
2022-08-04 20:01:34,340 - distributed.scheduler - ERROR - Workers don't have promised key: [], ('random_sample-f005a374b4d3468e2a4010e9645ec404', 2, 1)
NoneType: None
2022-08-04 20:01:34,340 - distributed.scheduler - ERROR - Workers don't have promised key: [], ('random_sample-f005a374b4d3468e2a4010e9645ec404', 1, 2)
NoneType: None
2022-08-04 20:01:34,340 - distributed.scheduler - ERROR - Workers don't have promised key: [], ('random_sample-f005a374b4d3468e2a4010e9645ec404', 0, 1)
NoneType: None
2022-08-04 20:01:34,341 - distributed.scheduler - ERROR - Workers don't have promised key: [], ('random_sample-f005a374b4d3468e2a4010e9645ec404', 2, 2)
NoneType: None
2022-08-04 20:01:36,284 - distributed.client - WARNING - Couldn't gather 14 keys, rescheduling {"('random_sample-f005a374b4d3468e2a4010e9645ec404', 3, 1)": (), "('random_sample-f005a374b4d3468e2a4010e9645ec404', 2, 3)": (), "('random_sample-f005a374b4d3468e2a4010e9645ec404', 2, 0)": (), "('random_sample-f005a374b4d3468e2a4010e9645ec404', 3, 2)": (), "('random_sample-f005a374b4d3468e2a4010e9645ec404', 1, 3)": (), "('random_sample-f005a374b4d3468e2a4010e9645ec404', 0, 3)": (), "('random_sample-f005a374b4d3468e2a4010e9645ec404', 1, 0)": (), "('random_sample-f005a374b4d3468e2a4010e9645ec404', 1, 1)": (), "('random_sample-f005a374b4d3468e2a4010e9645ec404', 0, 2)": (), "('random_sample-f005a374b4d3468e2a4010e9645ec404', 3, 3)": (), "('random_sample-f005a374b4d3468e2a4010e9645ec404', 2, 1)": (), "('random_sample-f005a374b4d3468e2a4010e9645ec404', 1, 2)": (), "('random_sample-f005a374b4d3468e2a4010e9645ec404', 0, 1)": (), "('random_sample-f005a374b4d3468e2a4010e9645ec404', 2, 2)": ()}
2022-08-04 20:01:36,289 - distributed.scheduler - ERROR - Couldn't gather keys {"('random_sample-e27bb5c8a897ddad7edeafb5c7c6f173', 1, 2)": [], "('random_sample-e27bb5c8a897ddad7edeafb5c7c6f173', 2, 2)": [], "('random_sample-e27bb5c8a897ddad7edeafb5c7c6f173', 3, 0)": [], "('random_sample-e27bb5c8a897ddad7edeafb5c7c6f173', 0, 0)": [], "('random_sample-e27bb5c8a897ddad7edeafb5c7c6f173', 3, 2)": [], "('random_sample-e27bb5c8a897ddad7edeafb5c7c6f173', 2, 3)": [], "('random_sample-e27bb5c8a897ddad7edeafb5c7c6f173', 0, 3)": [], "('random_sample-e27bb5c8a897ddad7edeafb5c7c6f173', 1, 3)": [], "('random_sample-e27bb5c8a897ddad7edeafb5c7c6f173', 1, 0)": [], "('random_sample-e27bb5c8a897ddad7edeafb5c7c6f173', 3, 1)": []} state: ['processing', 'processing', 'processing', 'processing', 'processing', 'processing', 'processing', 'processing', 'processing', 'processing'] workers: []
NoneType: None
2022-08-04 20:01:36,292 - distributed.scheduler - ERROR - Workers don't have promised key: [], ('random_sample-e27bb5c8a897ddad7edeafb5c7c6f173', 1, 2)
NoneType: None
2022-08-04 20:01:36,294 - distributed.scheduler - ERROR - Workers don't have promised key: [], ('random_sample-e27bb5c8a897ddad7edeafb5c7c6f173', 2, 2)
NoneType: None
2022-08-04 20:01:36,295 - distributed.scheduler - ERROR - Workers don't have promised key: [], ('random_sample-e27bb5c8a897ddad7edeafb5c7c6f173', 3, 0)
NoneType: None
2022-08-04 20:01:36,296 - distributed.scheduler - ERROR - Workers don't have promised key: [], ('random_sample-e27bb5c8a897ddad7edeafb5c7c6f173', 0, 0)
NoneType: None
2022-08-04 20:01:36,296 - distributed.scheduler - ERROR - Workers don't have promised key: [], ('random_sample-e27bb5c8a897ddad7edeafb5c7c6f173', 3, 2)
NoneType: None
2022-08-04 20:01:36,297 - distributed.scheduler - ERROR - Workers don't have promised key: [], ('random_sample-e27bb5c8a897ddad7edeafb5c7c6f173', 2, 3)
NoneType: None
2022-08-04 20:01:36,299 - distributed.scheduler - ERROR - Workers don't have promised key: [], ('random_sample-e27bb5c8a897ddad7edeafb5c7c6f173', 0, 3)
NoneType: None
2022-08-04 20:01:36,299 - distributed.scheduler - ERROR - Workers don't have promised key: [], ('random_sample-e27bb5c8a897ddad7edeafb5c7c6f173', 1, 3)
NoneType: None
2022-08-04 20:01:36,300 - distributed.scheduler - ERROR - Workers don't have promised key: [], ('random_sample-e27bb5c8a897ddad7edeafb5c7c6f173', 1, 0)
NoneType: None
2022-08-04 20:01:36,300 - distributed.scheduler - ERROR - Workers don't have promised key: [], ('random_sample-e27bb5c8a897ddad7edeafb5c7c6f173', 3, 1)
NoneType: None
2022-08-04 20:01:39,768 - distributed.client - WARNING - Couldn't gather 10 keys, rescheduling {"('random_sample-e27bb5c8a897ddad7edeafb5c7c6f173', 1, 2)": (), "('random_sample-e27bb5c8a897ddad7edeafb5c7c6f173', 2, 2)": (), "('random_sample-e27bb5c8a897ddad7edeafb5c7c6f173', 3, 0)": (), "('random_sample-e27bb5c8a897ddad7edeafb5c7c6f173', 0, 0)": (), "('random_sample-e27bb5c8a897ddad7edeafb5c7c6f173', 3, 2)": (), "('random_sample-e27bb5c8a897ddad7edeafb5c7c6f173', 2, 3)": (), "('random_sample-e27bb5c8a897ddad7edeafb5c7c6f173', 0, 3)": (), "('random_sample-e27bb5c8a897ddad7edeafb5c7c6f173', 1, 3)": (), "('random_sample-e27bb5c8a897ddad7edeafb5c7c6f173', 1, 0)": (), "('random_sample-e27bb5c8a897ddad7edeafb5c7c6f173', 3, 1)": ()}
2022-08-04 20:01:39,770 - distributed.scheduler - ERROR - Couldn't gather keys {"('random_sample-e0b08c5ce24f259f14b893349a2cd2ce', 0, 0)": [], "('random_sample-e0b08c5ce24f259f14b893349a2cd2ce', 0, 1)": [], "('random_sample-e0b08c5ce24f259f14b893349a2cd2ce', 2, 2)": [], "('random_sample-e0b08c5ce24f259f14b893349a2cd2ce', 3, 1)": [], "('random_sample-e0b08c5ce24f259f14b893349a2cd2ce', 0, 3)": [], "('random_sample-e0b08c5ce24f259f14b893349a2cd2ce', 1, 3)": [], "('random_sample-e0b08c5ce24f259f14b893349a2cd2ce', 2, 1)": []} state: ['processing', 'processing', 'processing', 'processing', 'processing', 'processing', 'processing'] workers: []
NoneType: None
2022-08-04 20:01:39,771 - distributed.scheduler - ERROR - Workers don't have promised key: [], ('random_sample-e0b08c5ce24f259f14b893349a2cd2ce', 0, 0)
NoneType: None
2022-08-04 20:01:39,771 - distributed.scheduler - ERROR - Workers don't have promised key: [], ('random_sample-e0b08c5ce24f259f14b893349a2cd2ce', 0, 1)
NoneType: None
2022-08-04 20:01:39,772 - distributed.scheduler - ERROR - Workers don't have promised key: [], ('random_sample-e0b08c5ce24f259f14b893349a2cd2ce', 2, 2)
NoneType: None
2022-08-04 20:01:39,772 - distributed.scheduler - ERROR - Workers don't have promised key: [], ('random_sample-e0b08c5ce24f259f14b893349a2cd2ce', 3, 1)
NoneType: None
2022-08-04 20:01:39,773 - distributed.scheduler - ERROR - Workers don't have promised key: [], ('random_sample-e0b08c5ce24f259f14b893349a2cd2ce', 0, 3)
NoneType: None
2022-08-04 20:01:39,773 - distributed.scheduler - ERROR - Workers don't have promised key: [], ('random_sample-e0b08c5ce24f259f14b893349a2cd2ce', 1, 3)
NoneType: None
2022-08-04 20:01:39,773 - distributed.scheduler - ERROR - Workers don't have promised key: [], ('random_sample-e0b08c5ce24f259f14b893349a2cd2ce', 2, 1)
NoneType: None
2022-08-04 20:01:39,780 - distributed.client - WARNING - Couldn't gather 7 keys, rescheduling {"('random_sample-e0b08c5ce24f259f14b893349a2cd2ce', 0, 0)": (), "('random_sample-e0b08c5ce24f259f14b893349a2cd2ce', 0, 1)": (), "('random_sample-e0b08c5ce24f259f14b893349a2cd2ce', 2, 2)": (), "('random_sample-e0b08c5ce24f259f14b893349a2cd2ce', 3, 1)": (), "('random_sample-e0b08c5ce24f259f14b893349a2cd2ce', 0, 3)": (), "('random_sample-e0b08c5ce24f259f14b893349a2cd2ce', 1, 3)": (), "('random_sample-e0b08c5ce24f259f14b893349a2cd2ce', 2, 1)": ()}
2022-08-04 20:01:39,834 - distributed.scheduler - ERROR - Couldn't gather keys {"('random_sample-7a32eedf23a0fce4ff45284db801f879', 2, 0)": [], "('random_sample-7a32eedf23a0fce4ff45284db801f879', 0, 1)": [], "('random_sample-7a32eedf23a0fce4ff45284db801f879', 1, 2)": [], "('random_sample-7a32eedf23a0fce4ff45284db801f879', 3, 3)": [], "('random_sample-7a32eedf23a0fce4ff45284db801f879', 2, 1)": [], "('random_sample-7a32eedf23a0fce4ff45284db801f879', 1, 3)": [], "('random_sample-7a32eedf23a0fce4ff45284db801f879', 2, 3)": [], "('random_sample-7a32eedf23a0fce4ff45284db801f879', 1, 1)": [], "('random_sample-7a32eedf23a0fce4ff45284db801f879', 0, 2)": [], "('random_sample-7a32eedf23a0fce4ff45284db801f879', 1, 0)": [], "('random_sample-7a32eedf23a0fce4ff45284db801f879', 2, 2)": []} state: ['processing', 'processing', 'processing', 'processing', 'processing', 'processing', 'processing', 'processing', 'processing', 'processing', 'processing'] workers: []
NoneType: None
2022-08-04 20:01:39,835 - distributed.scheduler - ERROR - Workers don't have promised key: [], ('random_sample-7a32eedf23a0fce4ff45284db801f879', 2, 0)
NoneType: None
2022-08-04 20:01:39,835 - distributed.scheduler - ERROR - Workers don't have promised key: [], ('random_sample-7a32eedf23a0fce4ff45284db801f879', 0, 1)
NoneType: None
2022-08-04 20:01:39,836 - distributed.scheduler - ERROR - Workers don't have promised key: [], ('random_sample-7a32eedf23a0fce4ff45284db801f879', 1, 2)
NoneType: None
2022-08-04 20:01:39,836 - distributed.scheduler - ERROR - Workers don't have promised key: [], ('random_sample-7a32eedf23a0fce4ff45284db801f879', 3, 3)
NoneType: None
2022-08-04 20:01:39,837 - distributed.scheduler - ERROR - Workers don't have promised key: [], ('random_sample-7a32eedf23a0fce4ff45284db801f879', 2, 1)
NoneType: None
2022-08-04 20:01:39,837 - distributed.scheduler - ERROR - Workers don't have promised key: [], ('random_sample-7a32eedf23a0fce4ff45284db801f879', 1, 3)
NoneType: None
2022-08-04 20:01:39,838 - distributed.scheduler - ERROR - Workers don't have promised key: [], ('random_sample-7a32eedf23a0fce4ff45284db801f879', 2, 3)
NoneType: None
2022-08-04 20:01:39,838 - distributed.scheduler - ERROR - Workers don't have promised key: [], ('random_sample-7a32eedf23a0fce4ff45284db801f879', 1, 1)
NoneType: None
2022-08-04 20:01:39,839 - distributed.scheduler - ERROR - Workers don't have promised key: [], ('random_sample-7a32eedf23a0fce4ff45284db801f879', 0, 2)
NoneType: None
2022-08-04 20:01:39,840 - distributed.scheduler - ERROR - Workers don't have promised key: [], ('random_sample-7a32eedf23a0fce4ff45284db801f879', 1, 0)
NoneType: None
2022-08-04 20:01:39,843 - distributed.scheduler - ERROR - Workers don't have promised key: [], ('random_sample-7a32eedf23a0fce4ff45284db801f879', 2, 2)
NoneType: None
2022-08-04 20:01:39,846 - distributed.client - WARNING - Couldn't gather 11 keys, rescheduling {"('random_sample-7a32eedf23a0fce4ff45284db801f879', 2, 0)": (), "('random_sample-7a32eedf23a0fce4ff45284db801f879', 0, 1)": (), "('random_sample-7a32eedf23a0fce4ff45284db801f879', 1, 2)": (), "('random_sample-7a32eedf23a0fce4ff45284db801f879', 3, 3)": (), "('random_sample-7a32eedf23a0fce4ff45284db801f879', 2, 1)": (), "('random_sample-7a32eedf23a0fce4ff45284db801f879', 1, 3)": (), "('random_sample-7a32eedf23a0fce4ff45284db801f879', 2, 3)": (), "('random_sample-7a32eedf23a0fce4ff45284db801f879', 1, 1)": (), "('random_sample-7a32eedf23a0fce4ff45284db801f879', 0, 2)": (), "('random_sample-7a32eedf23a0fce4ff45284db801f879', 1, 0)": (), "('random_sample-7a32eedf23a0fce4ff45284db801f879', 2, 2)": ()}

Here is a minimal self-contained example that is closer to what I am ultimately trying to do: convert pieces of an xarray dataset to dataframes and save them to Parquet in parallel. Please let me know if I should make a new post or post to the xarray forum instead.

import numpy as np
import dask.array as da
import xarray as xr
from dask.distributed import Client
import dask

client = Client()
client

dim_sz = 100_000
slice_sz = 100
# The following list contains: [(0, 100), (100, 200), ..., (99_900, 100_000)]
# Each element is used to select a slice of the dataset along one of the dimensions.
slice_bounds = [
    (start_ind, min(start_ind + slice_sz, dim_sz))
    for start_ind in range(0, dim_sz, slice_sz)]

def convert_slice_to_dataframe(bounds):
    """Select a slice of an xarray dataset and convert it to a pandas DataFrame."""
    start, stop = bounds

    # Generate the dataset inside the callable to avoid transferring large objects
    # to the task.
    ds = xr.Dataset({
        'x': xr.DataArray(
            data   = da.random.random((dim_sz, dim_sz), chunks=(30000, 672)),
            dims   = ['dim1', 'dim2'],
            coords = {'dim1': np.arange(0, dim_sz), 'dim2': np.arange(0, dim_sz)})})
    
    _ds = ds.isel(dim2=slice(start, stop))
    df = _ds.to_dataframe()
    # print(df.memory_usage(deep=True).sum())
    # df is only ~130 MB
    # Here is where we would save df to Parquet on S3 if this wasn't a minimal example.
    del ds
    del _ds
    del df


# This works.
convert_slice_to_dataframe(slice_bounds[0])
    
# This fails. See output below.
results = [dask.delayed(convert_slice_to_dataframe)(bounds) for bounds in slice_bounds]
dask.compute(results)
Output

/srv/conda/envs/notebook/lib/python3.9/site-packages/distributed/node.py:177: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 43145 instead
warnings.warn(
2022-08-10 15:34:01,230 - distributed.scheduler - ERROR - Couldn't gather keys {"('getitem-08ee878c78af7bf9fe55216fa3ef50d9', 3, 0)": []} state: ['waiting'] workers: []
NoneType: None
2022-08-10 15:34:01,233 - distributed.scheduler - ERROR - Workers don't have promised key: [], ('getitem-08ee878c78af7bf9fe55216fa3ef50d9', 3, 0)
NoneType: None
2022-08-10 15:34:01,245 - distributed.nanny - WARNING - Restarting worker
2022-08-10 15:34:01,307 - distributed.client - WARNING - Couldn't gather 1 keys, rescheduling {"('getitem-08ee878c78af7bf9fe55216fa3ef50d9', 3, 0)": ()}
2022-08-10 15:36:13,335 - distributed.worker_memory - WARNING - Unmanaged memory use is high. This may indicate a memory leak or the memory may not be released to the OS; see Worker — Dask.distributed 2022.8.1+6.gc15a10e8 documentation for more information. -- Unmanaged memory: 2.32 GiB -- Worker memory limit: 3.82 GiB
2022-08-10 15:36:31,363 - distributed.worker_memory - WARNING - Unmanaged memory use is high. This may indicate a memory leak or the memory may not be released to the OS; see Worker — Dask.distributed 2022.8.1+6.gc15a10e8 documentation for more information. -- Unmanaged memory: 2.50 GiB -- Worker memory limit: 3.82 GiB
2022-08-10 15:36:39,371 - distributed.nanny - WARNING - Restarting worker
2022-08-10 15:36:45,022 - distributed.nanny - WARNING - Restarting worker
2022-08-10 15:38:24,974 - distributed.scheduler - ERROR - Couldn't gather keys {"('getitem-3dff7aac5137b8f4c50147e01d318c33', 1, 0)": ['tcp://127.0.0.1:33517'], "('getitem-cb848162868593bf247812a38da12cf9', 1, 0)": ['tcp://127.0.0.1:33517'], "('getitem-3dff7aac5137b8f4c50147e01d318c33', 3, 0)": ['tcp://127.0.0.1:33517'], "('getitem-cb848162868593bf247812a38da12cf9', 3, 0)": ['tcp://127.0.0.1:33517'], "('getitem-3dff7aac5137b8f4c50147e01d318c33', 0, 0)": ['tcp://127.0.0.1:33517'], "('getitem-3dff7aac5137b8f4c50147e01d318c33', 2, 0)": ['tcp://127.0.0.1:33517']} state: ['waiting', 'waiting', 'waiting', 'waiting', 'waiting', 'waiting'] workers: ['tcp://127.0.0.1:33517']
NoneType: None
2022-08-10 15:38:24,977 - distributed.scheduler - ERROR - Couldn't gather keys {"('getitem-ae6043c102ccb822f10a9f2f7ce917c6', 3, 0)": ['tcp://127.0.0.1:33517'], "('getitem-5130ac7e11126fb2612bda66ff499464', 1, 0)": ['tcp://127.0.0.1:33517'], "('getitem-5130ac7e11126fb2612bda66ff499464', 3, 0)": ['tcp://127.0.0.1:33517'], "('getitem-ae6043c102ccb822f10a9f2f7ce917c6', 2, 0)": ['tcp://127.0.0.1:33517']} state: ['waiting', 'waiting', 'waiting', 'waiting'] workers: ['tcp://127.0.0.1:33517']
NoneType: None
2022-08-10 15:38:25,015 - distributed.scheduler - ERROR - Workers don't have promised key: ['tcp://127.0.0.1:33517'], ('getitem-3dff7aac5137b8f4c50147e01d318c33', 1, 0)
NoneType: None
2022-08-10 15:38:25,018 - distributed.scheduler - ERROR - Workers don't have promised key: ['tcp://127.0.0.1:33517'], ('getitem-cb848162868593bf247812a38da12cf9', 1, 0)
NoneType: None
2022-08-10 15:38:25,021 - distributed.scheduler - ERROR - Workers don't have promised key: ['tcp://127.0.0.1:33517'], ('getitem-3dff7aac5137b8f4c50147e01d318c33', 3, 0)
NoneType: None
2022-08-10 15:38:25,024 - distributed.scheduler - ERROR - Workers don't have promised key: ['tcp://127.0.0.1:33517'], ('getitem-cb848162868593bf247812a38da12cf9', 3, 0)
NoneType: None
2022-08-10 15:38:25,024 - distributed.scheduler - ERROR - Workers don't have promised key: ['tcp://127.0.0.1:33517'], ('getitem-3dff7aac5137b8f4c50147e01d318c33', 0, 0)
NoneType: None
2022-08-10 15:38:25,025 - distributed.scheduler - ERROR - Workers don't have promised key: ['tcp://127.0.0.1:33517'], ('getitem-3dff7aac5137b8f4c50147e01d318c33', 2, 0)
NoneType: None
2022-08-10 15:38:25,028 - distributed.scheduler - ERROR - Workers don't have promised key: ['tcp://127.0.0.1:33517'], ('getitem-ae6043c102ccb822f10a9f2f7ce917c6', 3, 0)
NoneType: None
2022-08-10 15:38:25,029 - distributed.scheduler - ERROR - Workers don't have promised key: ['tcp://127.0.0.1:33517'], ('getitem-5130ac7e11126fb2612bda66ff499464', 1, 0)
NoneType: None
2022-08-10 15:38:25,029 - distributed.scheduler - ERROR - Workers don't have promised key: ['tcp://127.0.0.1:33517'], ('getitem-5130ac7e11126fb2612bda66ff499464', 3, 0)
NoneType: None
2022-08-10 15:38:25,031 - distributed.scheduler - ERROR - Workers don't have promised key: ['tcp://127.0.0.1:33517'], ('getitem-ae6043c102ccb822f10a9f2f7ce917c6', 2, 0)
NoneType: None
2022-08-10 15:38:25,055 - distributed.client - WARNING - Couldn't gather 6 keys, rescheduling {"('getitem-3dff7aac5137b8f4c50147e01d318c33', 1, 0)": ('tcp://127.0.0.1:33517',), "('getitem-cb848162868593bf247812a38da12cf9', 1, 0)": ('tcp://127.0.0.1:33517',), "('getitem-3dff7aac5137b8f4c50147e01d318c33', 3, 0)": ('tcp://127.0.0.1:33517',), "('getitem-cb848162868593bf247812a38da12cf9', 3, 0)": ('tcp://127.0.0.1:33517',), "('getitem-3dff7aac5137b8f4c50147e01d318c33', 0, 0)": ('tcp://127.0.0.1:33517',), "('getitem-3dff7aac5137b8f4c50147e01d318c33', 2, 0)": ('tcp://127.0.0.1:33517',)}
2022-08-10 15:38:25,055 - distributed.client - WARNING - Couldn't gather 6 keys, rescheduling {"('getitem-3dff7aac5137b8f4c50147e01d318c33', 1, 0)": ('tcp://127.0.0.1:33517',), "('getitem-cb848162868593bf247812a38da12cf9', 1, 0)": ('tcp://127.0.0.1:33517',), "('getitem-3dff7aac5137b8f4c50147e01d318c33', 3, 0)": ('tcp://127.0.0.1:33517',), "('getitem-cb848162868593bf247812a38da12cf9', 3, 0)": ('tcp://127.0.0.1:33517',), "('getitem-3dff7aac5137b8f4c50147e01d318c33', 0, 0)": ('tcp://127.0.0.1:33517',), "('getitem-3dff7aac5137b8f4c50147e01d318c33', 2, 0)": ('tcp://127.0.0.1:33517',)}
2022-08-10 15:38:25,064 - distributed.client - WARNING - Couldn't gather 4 keys, rescheduling {"('getitem-ae6043c102ccb822f10a9f2f7ce917c6', 3, 0)": ('tcp://127.0.0.1:33517',), "('getitem-5130ac7e11126fb2612bda66ff499464', 1, 0)": ('tcp://127.0.0.1:33517',), "('getitem-5130ac7e11126fb2612bda66ff499464', 3, 0)": ('tcp://127.0.0.1:33517',), "('getitem-ae6043c102ccb822f10a9f2f7ce917c6', 2, 0)": ('tcp://127.0.0.1:33517',)}
2022-08-10 15:38:25,065 - distributed.client - WARNING - Couldn't gather 4 keys, rescheduling {"('getitem-ae6043c102ccb822f10a9f2f7ce917c6', 3, 0)": ('tcp://127.0.0.1:33517',), "('getitem-5130ac7e11126fb2612bda66ff499464', 1, 0)": ('tcp://127.0.0.1:33517',), "('getitem-5130ac7e11126fb2612bda66ff499464', 3, 0)": ('tcp://127.0.0.1:33517',), "('getitem-ae6043c102ccb822f10a9f2f7ce917c6', 2, 0)": ('tcp://127.0.0.1:33517',)}
2022-08-10 15:38:25,078 - distributed.nanny - WARNING - Restarting worker

KilledWorker Traceback (most recent call last)
Input In [2], in <cell line: 44>()
42 # This fails with KeyError: 'data'
43 results = [dask.delayed(convert_slice_to_dataframe)(bounds) for bounds in slice_bounds]
---> 44 dask.compute(results)

File /srv/conda/envs/notebook/lib/python3.9/site-packages/dask/base.py:575, in compute(traverse, optimize_graph, scheduler, get, *args, **kwargs)
572 keys.append(x.__dask_keys__())
573 postcomputes.append(x.__dask_postcompute__())
--> 575 results = schedule(dsk, keys, **kwargs)
576 return repack([f(r, *a) for r, (f, a) in zip(results, postcomputes)])

File /srv/conda/envs/notebook/lib/python3.9/site-packages/distributed/client.py:3004, in Client.get(self, dsk, keys, workers, allow_other_workers, resources, sync, asynchronous, direct, retries, priority, fifo_timeout, actors, **kwargs)
3002 should_rejoin = False
3003 try:
-> 3004 results = self.gather(packed, asynchronous=asynchronous, direct=direct)
3005 finally:
3006 for f in futures.values():

File /srv/conda/envs/notebook/lib/python3.9/site-packages/distributed/client.py:2178, in Client.gather(self, futures, errors, direct, asynchronous)
2176 else:
2177 local_worker = None
-> 2178 return self.sync(
2179 self._gather,
2180 futures,
2181 errors=errors,
2182 direct=direct,
2183 local_worker=local_worker,
2184 asynchronous=asynchronous,
2185 )

File /srv/conda/envs/notebook/lib/python3.9/site-packages/distributed/utils.py:318, in SyncMethodMixin.sync(self, func, asynchronous, callback_timeout, *args, **kwargs)
316 return future
317 else:
--> 318 return sync(
319 self.loop, func, *args, callback_timeout=callback_timeout, **kwargs
320 )

File /srv/conda/envs/notebook/lib/python3.9/site-packages/distributed/utils.py:385, in sync(loop, func, callback_timeout, *args, **kwargs)
383 if error:
384 typ, exc, tb = error
--> 385 raise exc.with_traceback(tb)
386 else:
387 return result

File /srv/conda/envs/notebook/lib/python3.9/site-packages/distributed/utils.py:358, in sync.<locals>.f()
356 future = asyncio.wait_for(future, callback_timeout)
357 future = asyncio.ensure_future(future)
--> 358 result = yield future
359 except Exception:
360 error = sys.exc_info()

File /srv/conda/envs/notebook/lib/python3.9/site-packages/tornado/gen.py:762, in Runner.run(self)
759 exc_info = None
761 try:
--> 762 value = future.result()
763 except Exception:
764 exc_info = sys.exc_info()

File /srv/conda/envs/notebook/lib/python3.9/site-packages/distributed/client.py:2041, in Client._gather(self, futures, errors, direct, local_worker)
2039 exc = CancelledError(key)
2040 else:
-> 2041 raise exception.with_traceback(traceback)
2042 raise exc
2043 if errors == "skip":

KilledWorker: ('convert_slice_to_dataframe-deba1930-efb6-45a8-b304-4b9911a7e756', <WorkerState 'tcp://127.0.0.1:33517', name: 1, status: closed, memory: 0, processing: 303>)
2022-08-10 15:38:29,478 - distributed.scheduler - ERROR - Couldn't gather keys {"('getitem-9fb1b7559fb2e40e2fd1fb6bff34f52e', 3, 0)": [], "('getitem-a949ece5c8d71b224b3cf5a3b2da3e53', 3, 0)": []} state: ['waiting', 'waiting'] workers: []
NoneType: None
2022-08-10 15:38:29,479 - distributed.scheduler - ERROR - Workers don't have promised key: [], ('getitem-9fb1b7559fb2e40e2fd1fb6bff34f52e', 3, 0)
NoneType: None
2022-08-10 15:38:29,481 - distributed.scheduler - ERROR - Workers don't have promised key: [], ('getitem-a949ece5c8d71b224b3cf5a3b2da3e53', 3, 0)
NoneType: None
2022-08-10 15:38:29,487 - distributed.client - WARNING - Couldn't gather 2 keys, rescheduling {"('getitem-9fb1b7559fb2e40e2fd1fb6bff34f52e', 3, 0)": (), "('getitem-a949ece5c8d71b224b3cf5a3b2da3e53', 3, 0)": ()}
2022-08-10 15:38:29,487 - distributed.client - WARNING - Couldn't gather 2 keys, rescheduling {"('getitem-9fb1b7559fb2e40e2fd1fb6bff34f52e', 3, 0)": (), "('getitem-a949ece5c8d71b224b3cf5a3b2da3e53', 3, 0)": ()}
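
As a rough check on the memory warnings in the output above, here is a back-of-the-envelope sketch of the sizes involved in the xarray example. It only does arithmetic on the numbers already given (the chunk shape, the slice width, and the 3.82 GiB worker memory limit from the log). It assumes the array is float64, and it assumes, without having verified it, that a task may hold a full (30000, 672) chunk in memory before the 100-column slice is taken from it.

```python
import math

import numpy as np

# Values copied from the example and the log output above.
dim_sz = 100_000
slice_sz = 100
chunks = (30_000, 672)
worker_limit_bytes = 3.82 * 2**30  # "Worker memory limit: 3.82 GiB"

# Assumption: the random array is float64 (8 bytes per element).
itemsize = np.dtype("float64").itemsize

# Bytes held by one full chunk of the random array.
chunk_bytes = chunks[0] * chunks[1] * itemsize

# Bytes held by the values of one 100-column slice (index not included).
slice_bytes = dim_sz * slice_sz * itemsize

# Number of blocks along dim1 that every column slice has to touch.
row_blocks = math.ceil(dim_sz / chunks[0])

# How many full chunks fit under the worker memory limit.
chunks_per_worker = int(worker_limit_bytes // chunk_bytes)

print(f"one chunk:        {chunk_bytes / 2**20:.1f} MiB")
print(f"one slice:        {slice_bytes / 2**20:.1f} MiB")
print(f"row blocks/slice: {row_blocks}")
print(f"full chunks that fit in the worker limit: {chunks_per_worker}")
```

If the assumption about full chunks holds, each slice touches four blocks of roughly 154 MiB each, so a worker running several of these tasks at once could get close to its limit even though each resulting dataframe is only ~130 MB.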