Worker resources are a great way to coordinate many different execution plans (beyond just access to local resources). It would be great to be able to do something similar at a global (i.e. cluster) level, e.g. to gate access to shared external resources, like a database. I know that Dask provides a semaphore that would allow for this, but controlling this through a global resources interface would provide several benefits over a semaphore.
Benefits:
- a task will not start executing unless it can acquire the resource.
- with a semaphore, the task needs to start processing first. This means a worker is occupied by a task that is blocked on the semaphore. I’m sure there are ways around this that let the worker continue doing other work, but they involve more complexity. Additionally, when monitoring your run you will see the task as processing when it is actually blocked (see the sketch after this list).
- easier configuration/coordination of global resources.
- these could be defined at execution time, after the graph is already built, so tasks don’t need to be aware of any coordination primitives.
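For reference, a minimal sketch of the current `distributed.Semaphore` pattern (the resource name, lease count, and the work inside the task are illustrative):

```python
from dask.distributed import Client, Semaphore

client = Client()  # connect to the cluster

# Allow at most 2 concurrent leases on the shared resource.
sem = Semaphore(max_leases=2, name="database")

def query(i, sem):
    # The task is already occupying a worker slot at this point; if no
    # lease is free it blocks here, and the dashboard still shows the
    # task as "processing".
    with sem:
        ...  # talk to the database (placeholder)
        return i

futures = client.map(query, range(10), sem=sem)
results = client.gather(futures)
```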
Hi @dask-user,
Could you elaborate a bit, with a code example of what you’d want to achieve? I’m not sure I understand what you mean by:
Hi, sure, I’d be happy to elaborate (sorry, not really a code sample, but hopefully this will clarify the point). Let’s consider the task graph from the “custom graph” page of the Dask docs: Custom Graphs — Dask documentation
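For context, the graph on that page looks roughly like this (function bodies elided; the filenames are placeholders):

```python
def load(filename): ...
def clean(data): ...
def analyze(sequence_of_data): ...
def store(result): ...

# A custom Dask graph: the load and store tasks touch the file system.
dsk = {
    "load-1": (load, "myfile.a.data"),
    "load-2": (load, "myfile.b.data"),
    "load-3": (load, "myfile.c.data"),
    "clean-1": (clean, "load-1"),
    "clean-2": (clean, "load-2"),
    "clean-3": (clean, "load-3"),
    "analyze": (analyze, ["clean-%d" % i for i in [1, 2, 3]]),
    "store": (store, "analyze"),
}

from dask.threaded import get
get(dsk, "store")  # execute the graph
```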
In this graph we have a set of N tasks that potentially read data from and write data to a file system (NFS). However, let’s say the NFS can only handle M concurrent connections, where M < N. There are a couple of different ways we can handle this currently:
- semaphore - this requires that the task be parameterized on the semaphore. It also requires the task to begin execution before we can determine whether or not it can access the resource (as in the semaphore sketch above).
- worker resources - we can start M workers with a worker resource such as NFS_ACCESS=1. Any tasks that require NFS access are then submitted with NFS_ACCESS=1, so you are limited to M NFS-accessing tasks at a time (see the sketch after this list). This is a decent option, but it is suboptimal because it limits you to a predetermined subset of workers on your cluster. If all of those workers are occupied with other tasks, then during that time no NFS-accessing tasks can run even though they are technically able to.
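A minimal sketch of the worker-resources approach (the scheduler address, resource name, and counts are illustrative):

```python
# Start M workers that advertise the resource (shell):
#   dask-worker scheduler-address:8786 --resources "NFS_ACCESS=1"

from dask.distributed import Client

client = Client("scheduler-address:8786")  # hypothetical address

def load(path):
    ...  # reads from the NFS (placeholder)

# Each task consumes one NFS_ACCESS unit, so these can only be
# scheduled on the workers that were started with that resource,
# at most M at a time.
futures = client.map(
    load,
    ["myfile.a.data", "myfile.b.data", "myfile.c.data"],
    resources={"NFS_ACCESS": 1},
)
```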
Global resources would give you the best of both worlds. Tasks don’t need to be parameterized on a coordination primitive (so they can be defined purely in terms of the logic they need to execute). They won’t start running until they are guaranteed access to the resource they need. They can run on any worker in the cluster. And they could be easily configured at execution time, through an interface that is similar to an existing one and familiar to many.
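To make that concrete, here is one way the interface could look. This is purely hypothetical (`set_global_resources` does not exist in Dask); it just mirrors the existing worker-resources submission API at the cluster level:

```python
from dask.distributed import Client

client = Client("scheduler-address:8786")  # hypothetical address

# Hypothetical: register a cluster-wide resource pool on the scheduler,
# not tied to any particular worker.
client.set_global_resources({"NFS_ACCESS": 4})

def load(path):
    ...  # reads from the NFS (placeholder)

# Same submission API as worker resources, but the scheduler would
# hold each task back until a global lease is free, and any worker
# in the cluster could run it.
futures = client.map(
    load,
    ["myfile.a.data", "myfile.b.data", "myfile.c.data"],
    resources={"NFS_ACCESS": 1},
)
```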
See also:
Note that there aren’t plans to implement such a feature in the short term.
Ah thanks, I had only searched this forum for related topics. GitHub issues slipped my mind. Will follow up on the conversation there.