I’m running jobs through `dask_jobqueue.SLURMCluster` that can potentially run for many hours.
Is it possible to define a threshold so that, when a worker is close to its walltime, a new worker is started instead of using one that is going to be killed in the next n minutes?
Hi @PythonF,
This need has been discussed in "Soft time limit for workers?" (Issue #416, dask/dask-jobqueue, on GitHub). There is also an open issue in `distributed`: "Enhancement Request - Dask Workers lifetime option not waiting for job to finish" (Issue #3141, dask/distributed, on GitHub).
So the answer is: this is not possible at the moment. But several users have expressed this need, so it would be great if someone were willing to contribute code to `distributed` on this subject!
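
In the meantime, a partial workaround is the existing `--lifetime` and `--lifetime-stagger` options of `dask worker`, which shut a worker down a little before the walltime. This does not wait for running tasks to finish, which is exactly the limitation described in issue #3141. Below is a minimal sketch of how the flags could be derived from the walltime. The helper names (`parse_walltime`, `lifetime_args`) and the margin and stagger values are made up for illustration, and it assumes the walltime is given as `HH:MM:SS`.

```python
from typing import List


def parse_walltime(walltime: str) -> int:
    """Convert an 'HH:MM:SS' walltime string into seconds (hypothetical helper)."""
    hours, minutes, seconds = (int(part) for part in walltime.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def lifetime_args(walltime: str, margin_minutes: int = 10,
                  stagger_minutes: int = 4) -> List[str]:
    """Build `dask worker` flags so the worker stops before the walltime.

    The stagger randomly shifts each worker's lifetime, so it must stay
    smaller than the margin or a worker could still hit the walltime.
    """
    if stagger_minutes >= margin_minutes:
        raise ValueError("stagger must be smaller than the margin")
    lifetime = parse_walltime(walltime) - margin_minutes * 60
    if lifetime <= 0:
        raise ValueError("margin must be shorter than the walltime")
    return ["--lifetime", f"{lifetime}s",
            "--lifetime-stagger", f"{stagger_minutes}m"]


args = lifetime_args("01:00:00", margin_minutes=10)
print(args)

# Assumed usage, not executed here. The keyword name depends on the
# dask-jobqueue version (`worker_extra_args` in recent releases,
# `extra` in older ones):
#
#   cluster = SLURMCluster(walltime="01:00:00", worker_extra_args=args, ...)
#   cluster.adapt(minimum=0, maximum=10)  # lets replacement workers start
```

With adaptive scaling, new jobs can be submitted to replace workers that retire, but a task still running when the lifetime expires is interrupted and rescheduled, so this only suits workloads whose individual tasks are short compared to the walltime.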