gRPC crashes dask worker

I recently added a library that uses gRPC under the hood to my pipeline, and workers started dying at random. The crashes presumably happen at compute() calls, but it is hard to tell.

I use this library by creating a client in a preload step at worker startup and attaching it to the worker instance; tasks then call get_worker() to access it.
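For readers unfamiliar with this pattern, here is a minimal sketch of what such a preload module might look like. The `dask_setup` and `dask_teardown` hooks are the ones Dask calls for preload scripts; everything else (the file name, `StubClient`, the `grpc_client` attribute, the target address) is hypothetical, since the actual library is not named in this thread.

```python
"""Hypothetical preload module, e.g. loaded with `dask worker --preload grpc_preload.py`."""


class StubClient:
    """Stand-in for the real gRPC-backed client (assumption: not the actual library)."""

    def __init__(self, target: str) -> None:
        self.target = target

    def close(self) -> None:
        self.target = None


def dask_setup(worker) -> None:
    # Dask calls this once per worker process when the preload is loaded.
    # The attribute name `grpc_client` is made up for this sketch.
    worker.grpc_client = StubClient("example-host:50051")


def dask_teardown(worker) -> None:
    # Called when the worker shuts down; release the client if it exists.
    client = getattr(worker, "grpc_client", None)
    if client is not None:
        client.close()


def current_client():
    # Intended to be called from inside a task. Imported lazily so the
    # module can be loaded without distributed being installed.
    from distributed import get_worker

    return get_worker().grpc_client
```

With this layout the client is created inside the worker process itself, and tasks reach it through `current_client()`.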

This is what every crash looks like:

F0000 00:00:1755796341.585627     236 forkable.cc:58] Check failed: !std::exchange(is_forking_, true)
*** Check failure stack trace: ***
Unspecified Application Error
2025-08-21 17:12:21,637 - distributed.nanny - INFO - Worker process 79 exited with status 1
2025-08-21 17:12:21,655 - distributed.nanny - WARNING - Restarting worker

I also get a lot of these messages, although they seem harmless:

Other threads are currently calling into gRPC, skipping fork() handlers

I found a possible solution suggesting multiprocessing.set_start_method("forkserver"), but neither forkserver nor spawn mode worked. Dask might be overriding this somewhere: I replaced https://github.com/dask/dask/blob/main/dask/__main__.py with my own copy and called set_start_method inside the if __name__ == "__main__": block.
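As a rough illustration of the attempt described above, the sketch below sets the default start method at the top of an entrypoint. The helper name `configure_start_method` is made up for this example. Note that this call only changes the default multiprocessing context of the current process; code that requests an explicit context through `multiprocessing.get_context()` is not affected by it, which may be one reason the setting appeared to have no effect.

```python
import multiprocessing


def configure_start_method(method: str = "forkserver") -> str:
    """Set the default start method and return the one now in effect."""
    # force=True replaces a start method that was already chosen in this process.
    multiprocessing.set_start_method(method, force=True)
    return multiprocessing.get_start_method()


if __name__ == "__main__":
    # "spawn" is available on every platform; "forkserver" is POSIX only.
    print(configure_start_method("spawn"))
```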

Has anyone seen something similar when using gRPC? What else is there to try to make it work in dask’s parallelized multiprocessing/threading environment?

Hi @Fogapod,

If you are using a distributed scheduler, you can set the process start method through the Dask configuration. The relevant key is distributed.worker.multiprocessing-method; see also the Configuration page of the Dask documentation.
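In a containerized deployment, one way to apply that key is through a `DASK_*` environment variable. The sketch below derives the variable name from the dotted key; `dask_env_var` is a hypothetical helper written for this example, not part of Dask. It relies on Dask's documented convention that a double underscore separates nesting levels and that underscores and hyphens in key names are treated alike. The assumption here is that the variable is set before the worker process imports dask, for instance in the pod spec.

```python
import os


def dask_env_var(key: str) -> str:
    """Translate a dotted Dask config key into its DASK_* environment variable name."""
    # "." becomes "__" (nesting); "-" becomes "_" (Dask treats them the same).
    return "DASK_" + key.upper().replace(".", "__").replace("-", "_")


name = dask_env_var("distributed.worker.multiprocessing-method")
# Must be in the environment before dask is imported in the worker process.
os.environ[name] = "forkserver"
print(name)
```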

Did you try a threaded scheduler to see whether the problem still occurs?

I will try this, thanks

Although it seems to default to spawn, which should work, at least according to the Python signature in the docs: 'multiprocessing-method': 'spawn'

This did not work. I made sure the setting is read correctly by entering invalid values too.

Hmm, I don’t have many other suggestions. Do you have a more detailed error stack trace? Could you come up with a reproducer?

No, this is all I have. It stops abruptly with this message.

Unfortunately no, at least not now. The workflow is quite complex and it fails randomly at different points.

I don't think I can do that either, because everything is configured for a distributed deployment.

I filed an issue for that library too; I'll post back if I find something. Maybe I will try with grpcio too later.

Which kind of cluster manager do you use (e.g. LocalCluster or something else)? Another thing to try would be to disable the Nanny and see if it changes anything.

I deploy Dask in Kubernetes. I have a custom image with a copy-pasted __main__.py entrypoint and custom preload files. Hope this answers the question.