This PR from 2016 changed the behavior of the local threaded scheduler to instantiate a new ThreadPoolExecutor per thread ID.
Now, when a Dask graph is executed from a function that is itself dispatched to a ThreadPoolExecutor, it may trigger the creation of up to N new ThreadPoolExecutors, where N is the worker count of the original executor, since each calling thread ID gets its own pool.
A common idiomatic pattern when using Dask in asyncio code is to offload the blocking compute to the event loop's thread pool executor via asyncio.to_thread, so as not to block the event loop. However, this puts us into exactly the situation described above.
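The pattern looks roughly like the following sketch. `blocking_compute` is a hypothetical stand-in for a real blocking call such as `dask_obj.compute()` with the threaded scheduler; each worker thread that runs it would then receive its own dask executor under the current per-thread-ID caching:

```python
import asyncio

def blocking_compute():
    # Hypothetical placeholder for a blocking dask call, e.g.
    # dask_obj.compute() using the threaded scheduler.
    return sum(range(10))

async def handler():
    # Run the blocking work on the event loop's default
    # ThreadPoolExecutor so the loop itself stays responsive.
    return await asyncio.to_thread(blocking_compute)

print(asyncio.run(handler()))  # → 45
```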
When I run pstree on a long-lived server, I see hundreds of threads due to Dask keeping this global map of ThreadPoolExecutors.
As far as I can tell, the ThreadPoolExecutor implementation is now thread-safe (patch). It acquires a lock during both submit() and shutdown(). Is there a reason to still instantiate a different ThreadPoolExecutor per thread ID?
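As a quick illustration of that thread-safety, a single shared ThreadPoolExecutor can be submitted to from many threads concurrently without issue (submit() serializes on an internal lock). The names here are only for this example:

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# One shared pool instead of one pool per calling thread.
shared = ThreadPoolExecutor(max_workers=4)

results = []
results_lock = threading.Lock()

def submitter(i):
    # Many caller threads submit to the same executor concurrently.
    fut = shared.submit(lambda x: x * x, i)
    with results_lock:
        results.append(fut.result())

threads = [threading.Thread(target=submitter, args=(i,)) for i in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
shared.shutdown()

print(sorted(results))  # → [0, 1, 4, 9, 16, 25, 36, 49]
```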
This multiplicative factor in running threads is problematic in some environments, such as Kubernetes, where cgroup throttling can occur due to thread oversubscription. It also changes the memory footprint of the application, since the total number of active workers can be a multiple of what dask.config.set(num_workers) says it is.