@winddude Thanks for the code!
I think you can use the “groups” plot to get the total time spent on each task:
import dask.dataframe as dd
from dask.distributed import Client
client = Client()
ddf = dd.DataFrame.from_dict(
{"x": range(1_000_000), "y": range(1_000_000)}, npartitions=4
)
def func(x):
    return x
res = ddf.apply(func, axis=1).persist()
(Note that you need to use persist() here, because this plot is cleared after compute().)
Also, I see you have multiple apply statements. There's currently no way to distinguish them in the dashboard, but as a workaround you can rewrite your code to use low-level collections. I wouldn't recommend rewriting unless it's an absolute deal-breaker, because the current dashboard plots can still give you a lot of useful information.