Cleanup after job runs

I have a client that uses the upload_file method to push a bunch of custom code onto the distributed cluster. After the job runs, though, especially if it fails, the workers for any subsequent jobs eventually idle and hang without responding.

Is there a list of recommended best practices for cluster providers to keep the cluster maintained? Currently I'm using a Kubernetes HPA to remove unused pods, but it appears that the uploaded files are kept and reloaded from the scheduler.

Hi @myarbrou-rh,

AFAIK, there is no way to delete files uploaded to workers with the upload_file plugin.
However, you might be able to unregister the plugin to prevent new workers from downloading the files. That said, I don't see why uploading files would cause worker failures afterwards.
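One way to make the plugin unregisterable is to skip `client.upload_file` (which registers the plugin under an internally generated name) and instead register the `UploadFile` worker plugin yourself under a name you choose. A rough sketch, assuming a `distributed` install and an in-process cluster for illustration (the plugin name `"custom-code-upload"` and the module are made up):

```python
import os
import tempfile

from dask.distributed import Client
from distributed.diagnostics.plugin import UploadFile

# A throwaway module standing in for the custom code being uploaded.
tmpdir = tempfile.mkdtemp()
module_path = os.path.join(tmpdir, "custom_code.py")
with open(module_path, "w") as f:
    f.write("VALUE = 42\n")

client = Client(processes=False)  # in-process cluster for illustration

# Registering under an explicit, known name means the plugin can be
# removed later, so workers started afterwards no longer fetch the file.
client.register_worker_plugin(UploadFile(module_path), name="custom-code-upload")

def use_custom_code():
    # The uploaded file is importable on the worker once the plugin ran.
    import custom_code
    return custom_code.VALUE

print(client.submit(use_custom_code).result())

# After the job is done, unregister so new workers skip the upload.
client.unregister_worker_plugin("custom-code-upload")
client.close()
```

Note this only stops *new* workers from receiving the file; files already written into existing workers' local directories stay there until the pod is recycled.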

We would need more detail here to understand the problem.