I recently added a library that uses gRPC under the hood to my pipeline, and since then workers have been dying at random. I suspect it happens at `compute()` calls, but it's hard to tell.
I use the library by creating a client in a preload step at worker startup, attaching it to the worker instance, and then retrieving it with `get_worker()`.
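For reference, the setup described above looks roughly like the following minimal sketch. It assumes the module is passed to workers with `dask worker <scheduler-address> --preload grpc_preload.py`. `dask_setup(worker)` is the hook that Dask's preload mechanism calls, while `FakeGrpcClient`, the `grpc_client` attribute and `use_client` are hypothetical placeholders for the real library and task code.

```python
"""grpc_preload.py: sketch of a Dask worker preload module (hypothetical names)."""


class FakeGrpcClient:
    """Hypothetical stand-in for the gRPC-backed library's client."""

    def call(self, x: int) -> int:
        return x * 2


def dask_setup(worker) -> None:
    """Preload hook: Dask calls this when the module is loaded at worker startup."""
    # Create one client per worker process and attach it to the worker object.
    worker.grpc_client = FakeGrpcClient()


def use_client(x: int) -> int:
    """Task function that reuses the client attached to the current worker."""
    # Imported here so the module itself loads without Dask installed.
    from distributed import get_worker

    return get_worker().grpc_client.call(x)
```

A task submitted through a Dask `Client`, e.g. `client.submit(use_client, 21)`, would then run on a worker and reuse that worker's gRPC client instead of creating a new one per task.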
Every crash looks like this:
```
F0000 00:00:1755796341.585627 236 forkable.cc:58] Check failed: !std::exchange(is_forking_, true)
*** Check failure stack trace: ***
Unspecified Application Error
2025-08-21 17:12:21,637 - distributed.nanny - INFO - Worker process 79 exited with status 1
2025-08-21 17:12:21,655 - distributed.nanny - WARNING - Restarting worker
```
I also get a lot of the following messages, although they seem harmless:
```
Other threads are currently calling into gRPC, skipping fork() handlers
```
I found a possible solution suggesting `multiprocessing.set_start_method("forkserver")`, but neither `forkserver` nor `spawn` worked, even after I replaced https://github.com/dask/dask/blob/main/dask/__main__.py with a version that calls `set_start_method` under `if __name__ == "__main__":`. Dask might be overriding the start method somewhere.
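One possible reason the change had no effect, stated as an assumption rather than a diagnosis: `multiprocessing.set_start_method()` only changes the *default* context, so code that asks for a context explicitly with `multiprocessing.get_context(method)` ignores it. As far as I know, `distributed`'s Nanny starts worker processes that way, reading the `distributed.worker.multiprocessing-method` configuration key (which I believe already defaults to `spawn` in recent releases). The stdlib-only sketch below shows the effect; `library_start` is a hypothetical stand-in for such library code.

```python
import multiprocessing as mp


def library_start(method: str) -> str:
    """Hypothetical library code that chooses a start method explicitly."""
    ctx = mp.get_context(method)  # an explicit context ignores the global default
    return ctx.get_start_method()


def main() -> tuple[str, str]:
    available = mp.get_all_start_methods()
    # What the post tried: forkserver where the platform offers it (POSIX), else spawn.
    wanted = "forkserver" if "forkserver" in available else "spawn"
    # force=True only so this sketch can be re-run in the same interpreter.
    mp.set_start_method(wanted, force=True)
    # The explicit context reports its own method regardless of the global default.
    return mp.get_start_method(), library_start("spawn")


if __name__ == "__main__":
    # Set the start method inside the __main__ guard, before creating any processes.
    print(main())  # on Linux: ('forkserver', 'spawn')
```

If that is the mechanism, the setting to change would be Dask's configuration rather than `set_start_method`, for example `dask.config.set({"distributed.worker.multiprocessing-method": "forkserver"})` in the process that starts the workers, or the environment variable `DASK_DISTRIBUTED__WORKER__MULTIPROCESSING_METHOD=forkserver`. It is worth checking the configuration reference for the installed version.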
Has anyone seen something similar when using gRPC? What else can I try to make it work in Dask's parallel multiprocessing/threading environment?