This is not a good example for reproducing the segmentation fault itself, since it produces no useful log at all, but it does demonstrate the side effect on each worker that executes these tasks.
The following code demonstrates the side effect: a task mutates the worker's internal state so that warnings are raised as exceptions instead of being displayed. In my view, since there are no code dependencies between the different delayed tasks, they should not share any state.
In my segmentation fault case, I did the same thing, but the crash occurs right after the restart. I think it is possible that a side effect persisting across worker restarts causes the memory access violation. I attached the shareable logs at the end of the post. I cannot share the exact code, but I can avoid triggering the segmentation fault by removing the warnings code.
PoC code
import time
import random

import dask
from dask.distributed import LocalCluster, Client


def capture_exceptions(func):
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except Exception as e:
            print(e)
            return None
    return wrapper


@capture_exceptions
def get_delayed_side_effect(x: int) -> int:
    import warnings

    # The first run on a worker shows this warning; after the filter
    # below is installed, later runs on the same worker raise instead.
    warnings.warn("Side effect is being applied")
    warnings.filterwarnings("error")
    time.sleep(random.random())
    print(x)
    return x


def main():
    client = Client("tcp://localhost:8786")
    for outer in range(3):
        l_delayed = []
        for i in range(10):
            l_delayed.append(dask.delayed(get_delayed_side_effect)(i))
        futures = client.compute(l_delayed)
        for future in futures:
            print(future.result())
        client.restart()


if __name__ == "__main__":
    main()
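For context, the state being shared here is not Dask-specific: warnings.filterwarnings("error") mutates the interpreter-wide filter list of whichever worker process runs the task, so every later task on that worker sees it. A minimal sketch of the leak, with no Dask involved (the function name noisy is made up for illustration), and of scoping the filter with warnings.catch_warnings():

```python
import warnings


def noisy():
    """Stand-in for a delayed task that emits a warning."""
    warnings.warn("something changed")
    return 42


# filterwarnings mutates process-global state: the filter installed
# here outlives this call site and affects every later call.
warnings.filterwarnings("error")
try:
    noisy()
    leaked = False
except UserWarning:
    leaked = True  # the earlier filter turned the warning into an exception

# catch_warnings() snapshots the filter state and restores it on exit,
# so the "error" filter below does not escape the with-block.
warnings.resetwarnings()
with warnings.catch_warnings():
    warnings.simplefilter("error")
    try:
        noisy()
    except UserWarning:
        pass  # fatal only inside this block

result = noisy()  # outside the block the warning is non-fatal again
```

If each task wrapped its filter change in catch_warnings(), the state would stay task-local; whether that alone would also avoid the crash after restart, I cannot say.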
Actual worker output at the segmentation fault
2025-11-18 22:43:37,336 - distributed.nanny - INFO - Worker process 1257699 was killed by signal 11
2025-11-18 22:43:37,386 - distributed.nanny - WARNING - Restarting worker
2025-11-18 22:43:37,569 - distributed.nanny - INFO - Worker process 1257707 was killed by signal 11
2025-11-18 22:43:37,586 - distributed.nanny - WARNING - Restarting worker
2025-11-18 22:43:37,792 - distributed.nanny - INFO - Worker process 1257724 was killed by signal 11
2025-11-18 22:43:37,809 - distributed.nanny - WARNING - Restarting worker
2025-11-18 22:43:37,992 - distributed.diskutils - INFO - Found stale lock file and directory '/tmp/dask-scratch-space/worker-91grg2rk', purging
2025-11-18 22:43:37,994 - distributed.nanny - INFO - Worker process 1257680 was killed by signal 11
2025-11-18 22:43:38,006 - distributed.nanny - WARNING - Restarting worker
2025-11-18 22:43:38,194 - distributed.nanny - INFO - Worker process 1257738 was killed by signal 11
Client-side error at the segmentation fault
distributed.scheduler.KilledWorker: Attempted to run task 'a-function-name-2f968de0d0af6189cdef9f8af9309313' on 4 different workers, but all those workers died while running it. The last worker that attempt to run the task was tcp://192.168.1.1:39875. Inspecting worker logs is often a good next step to diagnose what went wrong. For more information see https://distributed.dask.org/en/stable/killed.html.