Hello everybody again,
In my code I generate a huge task graph to parallelize a data stream. Unfortunately, the code has a lot of conditionals, which makes it hard to tell which part of the code produced which part of the generated task graph. So my question is really simple: is it possible to attach a label to a specific part of the task graph?
For example:
import dask.array

def my_mean(block):
    return block.mean()

if __name__ == '__main__':
    my_cond = True
    dask_array = dask.array.random.random(100000)
    mean = my_mean(dask_array)
    # The second reduction is only added to the graph when my_cond is True
    if my_cond:
        mean = my_mean(mean)
    mean.compute()
Notice that the task graph depends on the my_cond variable, not on the values of dask_array itself.
What I would like to do (or something similar) is:
import dask.array

def my_mean(block, label):
    # DaskTaskLabel is hypothetical: it is the kind of API I am looking for
    with DaskTaskLabel(label):
        return block.mean()

if __name__ == '__main__':
    my_cond = True
    dask_array = dask.array.random.random(100000)
    mean = my_mean(dask_array, "mean1")
    if my_cond:
        mean = my_mean(mean, "mean2")
    mean.compute()
That way, I could check exactly when that mean() function was called and who called it. I could even filter the tasks by specific labels. This would be very useful for huge task streams like mine.
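For reference, the closest thing I have found so far is dask.annotate, which (as far as I understand) attaches arbitrary key/value pairs to the graph layers created inside its with block. Below is a minimal sketch of how it might stand in for the DaskTaskLabel idea above. The "label" key and the labels_by_layer helper are my own names, not part of Dask, and the sketch assumes the array is backed by a HighLevelGraph. I am also not sure the annotations always survive graph optimization, so I only inspect the unoptimized graph here.

```python
import dask
import dask.array as da


def my_mean(block, label):
    # dask.annotate tags every graph layer created inside the `with` block.
    # The key name "label" is my own choice, not something Dask defines.
    with dask.annotate(label=label):
        return block.mean()


def labels_by_layer(collection):
    # Hypothetical helper: map each annotated layer name to its label.
    # Assumes the collection's graph is a HighLevelGraph with `.layers`.
    graph = collection.__dask_graph__()
    result = {}
    for name, layer in graph.layers.items():
        annotations = layer.annotations or {}
        if "label" in annotations:
            result[name] = annotations["label"]
    return result


if __name__ == "__main__":
    my_cond = True
    dask_array = da.random.random(100000)
    mean = my_mean(dask_array, "mean1")
    if my_cond:
        mean = my_mean(mean, "mean2")
    # Show which label each annotated layer of the graph carries
    for layer_name, label in labels_by_layer(mean).items():
        print(label, layer_name)
    mean.compute()
```

This only labels whole layers of the graph, not individual tasks, and it does not record who called my_mean, so it covers just part of what I am asking for.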