Local dask.distributed | no shared memory with thread-based client?

Summary: I submit a task that modifies a list to a thread-based client → the concrete list is not modified. Why?

For process-based clients, I understand that concrete objects are serialized, sent to workers, and reconstructed on the worker side. The reconstructed objects are entirely new objects.

For thread-based clients, I thought that concrete objects were shared with workers without serialization, and therefore the concrete objects and the worker objects were the same objects.

The snippet below proves me wrong for thread-based clients with local distributed (single machine): appending to the list on the worker side does not append to the concrete list.

Could someone explain why or point to documentation I missed?

For comparison, appending to the list passed to another thread with a ThreadPoolExecutor does append to the concrete list, see snippet below.

(I also thought that these shared memory considerations only concerned process-based clients)

from concurrent.futures import ThreadPoolExecutor

from distributed import Client


def append(_list: list, value: str) -> None:
    _list.append(value)


if __name__ == "__main__":
    my_list = []

    with ThreadPoolExecutor() as thread_executor:
        thread_executor.submit(append, _list=my_list, value="thread_executor").result()

    with Client(processes=False) as client:
        client.submit(append, _list=my_list, value="thread_client").result()

    print(f"{my_list=}")

Output:

my_list=['thread_executor']

I was expecting:
my_list=['thread_executor', 'thread_client']

Hi @templiert,

You are using a Distributed cluster. Even with processes=False, where the Scheduler and the threaded Worker run inside your main Python process rather than in separate OS processes, the Client, the Scheduler, and the Worker remain separate components that exchange messages. You create your list on the Client side, but when you submit your function and its arguments, they are serialized on the way to the Worker, so the Worker appends to a reconstructed copy and not to your original list.

Sharing the object in memory could only work with Dask's local (non-distributed) scheduler, but that scheduler does not offer the Client interface, which is only available with a Distributed cluster.
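To illustrate the serialization point, here is a minimal sketch that reproduces the same behaviour without Dask. It assumes the arguments go through a pickle-like round trip before the function runs; submit_with_copy is a hypothetical helper written for this example, not part of the distributed API, and the exact serializer Dask uses may differ.

```python
import pickle


def append(_list: list, value: str) -> None:
    _list.append(value)


def submit_with_copy(func, *args, **kwargs):
    """Hypothetical stand-in for a submit call that serializes its arguments."""
    # Round-trip the arguments through pickle, which rebuilds them as new objects.
    args, kwargs = pickle.loads(pickle.dumps((args, kwargs)))
    return func(*args, **kwargs)


my_list = []

# Direct call: the function receives the very same list object.
append(my_list, "direct")

# Serialized call: the function receives a reconstructed copy of the list.
submit_with_copy(append, my_list, "via_pickle")

print(f"{my_list=}")  # my_list=['direct']
```

If the worker-side change is needed back on the client side, returning the modified list from the function and reading it from the future's result is usually safer than relying on in-place mutation.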
