High Availability and Resource Tracking

Hi,

I’m using a Dask cluster as part of a web service backend. We have some tasks that we want to be highly available, and others that are long running and should not be allowed to occupy all workers (which would prevent the first group from being highly available).

It looks like the way to do this is a combination of task priority and worker resources.

For example, if there are 4 workers, we can give 2 workers a HEAVY_LOAD=1 resource, preventing those very long running tasks from being assigned to the other two workers.
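As a rough illustration of combining the two mechanisms, the sketch below builds the keyword arguments for `Client.submit`, which accepts both `resources=` and `priority=`. The helper names (`submit_kwargs`, `submit_task`), the task kinds, and the priority values are hypothetical and chosen only for this example; `HEAVY_LOAD` is the resource name from the post.

```python
# Minimal sketch, not a definitive implementation.
# Assumption: the "heavy" workers were started advertising a HEAVY_LOAD resource.

def submit_kwargs(kind):
    """Return Client.submit keyword arguments for a task kind ("heavy" or "ha")."""
    if kind == "heavy":
        # Requires a worker that advertises HEAVY_LOAD, and yields to
        # higher-priority work (in Dask, higher priority values run first).
        return {"resources": {"HEAVY_LOAD": 1}, "priority": -10}
    if kind == "ha":
        # No resource restriction, so any worker may run it, and it is
        # scheduled ahead of lower-priority tasks.
        return {"priority": 10}
    raise ValueError(f"unknown task kind: {kind!r}")


def submit_task(client, func, *args, kind):
    """Submit func(*args) through a dask.distributed Client-like object."""
    return client.submit(func, *args, **submit_kwargs(kind))
```

The heavy workers would be started with something like `dask worker <scheduler-address> --resources "HEAVY_LOAD=1"`. Since each heavy task consumes one unit of the resource, such a worker would run at most one heavy task at a time, while the other workers never receive heavy tasks at all.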

I think this should work OK (open to other suggestions though).

What I’m currently trying to solve is that I want the HEAVY_LOAD resource assignment to be dynamic as workers join and leave the cluster, so I believe I need to be able to modify a worker’s resources at run time. My thought was a Scheduler plugin that re-designates every worker as either a high availability worker or a heavy load worker whenever a worker joins or leaves the cluster. This would ensure there is always at least one HEAVY_LOAD worker, even if a node happens to go offline.
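The reassignment logic such a plugin might run can be sketched as a pure function that is independent of Dask itself. The function name `assign_roles` and the parameters `heavy_fraction` and `min_heavy` are hypothetical. The idea is that a `SchedulerPlugin` would call it from its `add_worker` and `remove_worker` hooks with the current worker addresses and then push the resulting resources to each worker. How exactly to push them (for example via a worker-side `set_resources` call) is an assumption to verify against your installed version of distributed.

```python
import math


def assign_roles(worker_addresses, heavy_fraction=0.5, min_heavy=1):
    """Map each worker address to the resources it should advertise.

    Hypothetical policy: roughly `heavy_fraction` of the workers, but at
    least `min_heavy` when any worker exists, get HEAVY_LOAD=1. The rest
    get no resources, so they only run unrestricted tasks.
    """
    workers = sorted(worker_addresses)  # deterministic ordering
    if not workers:
        return {}
    n_heavy = max(min_heavy, math.floor(len(workers) * heavy_fraction))
    n_heavy = min(n_heavy, len(workers))  # cannot exceed the cluster size
    return {
        addr: ({"HEAVY_LOAD": 1} if i < n_heavy else {})
        for i, addr in enumerate(workers)
    }
```

One caveat with this simple policy: because roles are recomputed from the sorted address list, a worker can switch roles when membership changes, so the plugin would need to decide what to do with a heavy task that is already running on a worker that has just been re-designated.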

I just want to sanity check this idea, and make sure this could work, or there’s not a better approach.

Thanks.

@freebie I see this is your first question, welcome to Discourse!

I just want to sanity check this idea, and make sure this could work, or there’s not a better approach.

I think your idea is great and should work!