OSError('Timed out during handshake')

Hello there, sorry if this topic already exists, but I didn't have much luck finding a solution, so hopefully someone can shed some light. We have a Prefect + Dask workflow that processes user data in parallel, and the whole architecture runs on AWS Batch. Recently we started seeing the following error:

Unexpected error occured in FlowRunner: OSError(‘Timed out during handshake while connecting to tcp://{{ip_address_goes_here}} after 30 s’)

This happens after some time of processing and causes all jobs to fail. We have proper error handling, but this specific error never reaches our exception blocks, which makes it harder to debug. It seems to happen randomly, at any step. What causes this? How can we get better logging when it occurs? Any help or in-depth explanation is appreciated. Thanks!

Hi @FabioXN7, welcome! I think this question might be better answered by the Prefect community, but feel free to add a bit more context around your setup and we can see if anyone here can chime in!
