Task Stream Understanding "Assign" time

@winddude Thanks for the code!

I think you can use the “groups” plot to get the total time spent on each task:

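```python
import dask.dataframe as dd
from dask.distributed import Client

# Start a local cluster; the dashboard link is available as client.dashboard_link
client = Client()

# Small example DataFrame split into 4 partitions
ddf = dd.DataFrame.from_dict(
    {"x": range(1_000_000), "y": range(1_000_000)}, npartitions=4
)

def func(x):
    # Placeholder for the real row-wise function
    return x

# persist() keeps the results on the cluster, so the plot stays populated
res = ddf.apply(func, axis=1).persist()
```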

(Note that you need to use persist() rather than compute(), because this plot is cleared after compute().)

Also, I see you have multiple apply statements. There’s currently no way to distinguish them in the dashboard, but as a workaround you can rewrite your code to use low-level collections. I wouldn’t recommend rewriting unless it’s an absolute deal-breaker, because the current dashboard plots can still give you a lot of useful information. 🙂
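If you do want to try the low-level route, here is a rough sketch of one way it could look with dask.delayed. The functions clean and score, the key names, and the tiny pandas partitions are all made up for illustration; they are not from your code. The idea is that each step gets its own explicit key via the dask_key_name keyword, so the two steps should show up under different names instead of both being labelled as apply:

```python
import dask
import pandas as pd

def clean(df):
    # Hypothetical first step (stands in for your first apply)
    return df.assign(x=df["x"] + 1)

def score(df):
    # Hypothetical second step (stands in for your second apply)
    return df.assign(z=df["x"] * df["y"])

# Four small pandas DataFrames standing in for the partitions
parts = [
    pd.DataFrame({"x": range(i * 5, i * 5 + 5), "y": range(5)})
    for i in range(4)
]

# dask_key_name sets the task key explicitly, one distinct prefix per step
cleaned = [
    dask.delayed(clean)(p, dask_key_name=f"clean-rows-{i}")
    for i, p in enumerate(parts)
]
scored = [
    dask.delayed(score)(c, dask_key_name=f"score-rows-{i}")
    for i, c in enumerate(cleaned)
]

# Runs on the default scheduler here; with a Client running you would
# persist the delayed objects instead so the dashboard plots stay populated
results = dask.compute(*scored)
```

With a real Dask DataFrame you could get the per-partition delayed objects from ddf.to_delayed() instead of building them by hand. I haven't tested this against your workload, so treat it as a starting point rather than a drop-in replacement.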
