Debugging: is it possible to associate labels or other identifiers with Dask tasks?

Hello everybody again,

In my code I generate a huge task graph to parallelize a data stream. Unfortunately, the code has a lot of conditionals, which makes it hard to work out which part of the code produced which part of the generated task graph. So my question is really simple: is it possible to attach a label to a specific part of the task graph?

For example:

import dask.array

def my_mean(block):
    return block.mean()

if __name__ == '__main__':
    my_cond = True

    dask_array = dask.array.random.random(100000)

    mean = my_mean(dask_array)

    if my_cond:
        mean = my_mean(mean)

    mean.compute()

Notice that the shape of the task graph depends on the my_cond variable, not on the values in dask_array itself.
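A quick way to see this dependence is to compare the graph's layers with and without the conditional. The sketch below is only an illustration: build() is a hypothetical wrapper around the code above, and chunks=10000 is an arbitrary choice so the array has several blocks. It also shows why debugging is hard: the layer names are auto-generated from function names plus a hash, not from anything you choose.

```python
import dask.array as da


def my_mean(block):
    return block.mean()


def build(my_cond):
    # Hypothetical wrapper: same pipeline as above, with the condition as a parameter
    dask_array = da.random.random(100000, chunks=10000)
    mean = my_mean(dask_array)
    if my_cond:
        mean = my_mean(mean)
    return mean


for cond in (False, True):
    graph = build(cond).__dask_graph__()  # a HighLevelGraph
    # Layer names are generated by Dask (function name + hash), not user labels
    print(cond, len(graph.layers), list(graph.layers))
```

The second mean() adds extra layers, but nothing in their names says which branch of your code created them.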

What I would like to do (or something similar) is:

def my_mean(block, label):
    with DaskTaskLabel(label):
        return block.mean()

if __name__ == '__main__':
    my_cond = True

    dask_array = dask.array.random.random(100000)

    mean = my_mean(dask_array, "mean1")

    if my_cond:
        mean = my_mean(mean, "mean2")

    mean.compute()

That way, I could check exactly when that mean() function was called and by whom, and even filter tasks by label. This would be very useful for huge task streams like mine.

@jcfaracco Good question!

Dask’s low-level collections, Delayed and Futures, allow you to specify labels (task keys): pass dask_key_name= when calling a delayed function, or key= to Client.submit.
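For the Futures side, a minimal sketch, assuming dask.distributed is installed and using an in-process Client(processes=False) purely for convenience. The array and the two-step chain are illustrative, mirroring the mean-of-a-mean in the question:

```python
import numpy as np
from dask.distributed import Client


def my_mean(block):
    return block.mean()


arr = np.arange(10)

# processes=False keeps the scheduler and workers in this process (handy for a quick test)
with Client(processes=False) as client:
    # key= replaces the auto-generated "my_mean-<hash>" key with our own label
    first = client.submit(my_mean, arr, key="mean1")
    # Passing a future as an argument makes "mean2" depend on "mean1"
    second = client.submit(my_mean, first, key="mean2")
    print(first.key, second.key)  # the labels are now the task keys
    result = second.result()
```

With a Client, these keys are also what you see in the dashboard, so the labels show up there as well.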

This isn’t implemented yet for high-level collections (like the Dask Array in your example), though. Here’s the open issue: Mechanism for naming tasks generated by high level collections · Issue #9047 · dask/dask · GitHub

A workaround could be to rewrite your code using low-level collections:

import numpy as np
from dask import delayed

@delayed
def my_mean(block):
    return block.mean()

my_cond = True

arr = np.random.random(100000)

mean = my_mean(arr, dask_key_name="mean1")

if my_cond:
    mean = my_mean(mean, dask_key_name="mean2")

mean # Delayed('mean2')

In this case, please be careful not to mix collections (for example, Dask Arrays and Delayed objects in the same pipeline)!
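Going back to the "filter by label" part of the question, here is a minimal sketch on top of the Delayed workaround. labeled_call and keys_with_label are hypothetical helpers, not Dask API. One assumption worth stating: a key identifies a task, so reusing a plain "mean1" for two different calls would make them collide. The helper therefore appends a hash from dask.base.tokenize to keep each key unique while keeping the label as a readable prefix.

```python
import numpy as np
from dask import delayed
from dask.base import tokenize


@delayed
def my_mean(block):
    return block.mean()


def labeled_call(func, label, *args):
    """Hypothetical helper: call a delayed function under a readable label.

    A hash of the arguments is appended so that calling the same label twice
    with different inputs does not produce colliding keys.
    """
    return func(*args, dask_key_name=f"{label}-{tokenize(label, *args)}")


def keys_with_label(obj, label):
    """Hypothetical helper: return the graph keys of `obj` that carry `label`."""
    return [k for k in obj.__dask_graph__() if str(k).startswith(f"{label}-")]


my_cond = True
arr = np.random.random(100000)

mean = labeled_call(my_mean, "mean1", arr)
if my_cond:
    mean = labeled_call(my_mean, "mean2", mean)

print(keys_with_label(mean, "mean1"))
print(keys_with_label(mean, "mean2"))
print(mean.compute())
```

Matching on f"{label}-" rather than just the label keeps "mean1" from also matching a hypothetical "mean10".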