Q1: yes-ish
You can control task names by setting the task key with the key parameter when submitting a task with client.submit(). If you want to group tasks in a meaningful way that reflects your workflow, you can use hierarchical task names like "ds=en::chunk=123". Dask doesn’t automatically group tasks by such keys, but this naming pattern helps to visually distinguish the tasks on the Dask dashboard.
client.submit(audio_processing, chunk, key=f"ds={lang}::chunk={chunk_id}")
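Unfortunately, as far as I know, Dask has no built-in way to define your own task groups for the dashboard. It only groups tasks by key prefix, which is roughly the part of the key before a trailing “-<id>”. With the approach above you can still make the task keys hierarchical and easier to track.
If you’re seeing too many task bars in the dashboard, consider using a less granular naming scheme or grouping the tasks at a higher level (e.g., “ds=en::audio_processing-123”, so that every chunk of a dataset shares the prefix “ds=en::audio_processing”). Keys still have to be unique per task, because Dask treats two submissions with the same key as the same task.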
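A minimal sketch of such a naming scheme, assuming the dashboard derives the group from the text before the final “-<id>” in the key. The helper names make_task_key and submit_chunks are made up for illustration, and client is expected to be a dask.distributed.Client (anything with a compatible submit method works):

```python
def make_task_key(lang: str, stage: str, chunk_id: int) -> str:
    """Build a unique key; the text before the final '-' is shared per dataset and stage."""
    return f"ds={lang}::{stage}-{chunk_id}"


def submit_chunks(client, func, chunks, lang: str, stage: str = "audio_processing"):
    """Submit one task per chunk, each with its own hierarchical key.

    `client` is assumed to be a dask.distributed.Client.
    """
    futures = []
    for chunk_id, chunk in enumerate(chunks):
        key = make_task_key(lang, stage, chunk_id)
        futures.append(client.submit(func, chunk, key=key))
    return futures
```

Usage would look like futures = submit_chunks(client, audio_processing, chunks, "en"). The chunk index is only one possible unique suffix; any id that never repeats within a dataset would do.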
Q2: yes
You can name the workers through the Worker API, though this will require some work.
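from dask.distributed import Client, Scheduler, SpecCluster, Worker
# LocalCluster numbers its workers itself, so SpecCluster is used here: each key of the workers dict becomes a worker name
workers = {f"worker_name_prefix-{i}": {"cls": Worker, "options": {"nthreads": 1}} for i in range(5)}
cluster = SpecCluster(scheduler={"cls": Scheduler, "options": {}}, workers=workers)
client = Client(cluster)
Each worker is named after its key in the workers dict, here worker_name_prefix-0 through worker_name_prefix-4. You could further extend this to give workers more specific names.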
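If you want more control over worker names (for example, naming them based on the dataset), you can set up your workers manually using the Worker API and pass the name directly:
from dask.distributed import Worker
# Worker has to be awaited inside a running event loop (e.g. in an async function); replace 'scheduler_address' with your scheduler's address
worker = await Worker('scheduler_address', name='ds_en_worker')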
Q3: also yes, BUT, and this is a big BUT, you will have to either create the keys yourself, manage them, and pass them to the workers, or, if you already have them, still manage them and pass them to the workers.
Again, this goes through the Worker API above. You would need the data parameter.
You can define worker resources and then submit tasks using the resources argument, ensuring they only run on workers that match those resource tags.
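As a rough sketch of the resources approach: the resource name "ds_<lang>" and the helper names below are invented for illustration, and client is again assumed to be a dask.distributed.Client. Each running task holds the amount it requests, so the number of slots a worker declares caps how many of these tasks it runs at once.

```python
def dataset_resource(lang: str) -> str:
    """Name of the (made-up) resource tag for one dataset, e.g. 'ds_en'."""
    return f"ds_{lang}"


def worker_resources(lang: str, slots: int = 4) -> dict:
    """Resources to declare when starting a worker dedicated to `lang`."""
    return {dataset_resource(lang): slots}


def submit_on_dataset_workers(client, func, chunk, lang: str, chunk_id: int):
    """Submit a task that may only run on workers offering the dataset's resource."""
    return client.submit(
        func,
        chunk,
        key=f"ds={lang}::chunk={chunk_id}",
        resources={dataset_resource(lang): 1},  # consumes one slot while running
    )
```

A worker for the English dataset would then be started with something like Worker('scheduler_address', name='ds_en_worker', resources=worker_resources('en')). Tasks that request a resource no worker offers will wait rather than fail, so a typo in the tag shows up as tasks that never start.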
To track which tasks are running on specific workers, you can still query the scheduler for assigned tasks:
processing = client.processing()  # {worker address: keys of the tasks assigned to it}
for worker, keys in processing.items():
    print(worker, keys)  # List of tasks on this worker
Something like this, I’d say.
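If the keys follow the "ds=en::chunk=123" pattern from Q1, the per-worker listing can be condensed into counts per dataset. This is only a sketch: it assumes a mapping of worker address to task keys, such as the one client.processing() returns, and the function names are made up.

```python
from collections import defaultdict


def dataset_of(key: str) -> str:
    """Extract 'en' from a key such as 'ds=en::chunk=123'; 'unknown' for other keys."""
    head = key.split("::", 1)[0]
    if head.startswith("ds="):
        return head[len("ds="):]
    return "unknown"


def tasks_by_worker_and_dataset(processing) -> dict:
    """Count the tasks on each worker, broken down by dataset."""
    summary = {}
    for worker, keys in processing.items():
        counts = defaultdict(int)
        for key in keys:
            # Dask keys are not always strings, so normalise before parsing.
            counts[dataset_of(str(key))] += 1
        summary[worker] = dict(counts)
    return summary
```

Calling tasks_by_worker_and_dataset(client.processing()) then gives one small dict per worker, which makes it easy to spot a worker that is picking up chunks from the wrong dataset.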