Provide a custom task name for DataFrame tasks

In my workflow I’m making heavy use of DataFrame.apply in several different contexts. The problem is that the dashboard doesn’t show which of these calls is actually running, because every task is named “apply”.

For example, this is what the Workers page shows: [screenshot]

Is there any way to tell Dask that I want each apply call to have a custom name, for identifiability?

I was looking for something similar with the futures API. client.submit lets you specify a key for each future, but I have not seen that key used in the dashboard (at least not in the nice task charts :-))
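For anyone following along, here is a minimal sketch of what setting a key on a future looks like. The function `double` and the key string `"double-orders-0"` are made up for illustration, and the in-process client is only there so the snippet runs on its own. As far as I know the dashboard groups tasks by a prefix derived from the key, so whether the custom name shows up in the task charts is worth checking on your own setup rather than assuming.

```python
from dask.distributed import Client


def double(x):
    # Hypothetical stand-in for real work
    return 2 * x


# In-process cluster so the sketch is self-contained. The dashboard is
# disabled here only to avoid needing a free port; drop that argument
# if you want to inspect the dashboard.
client = Client(processes=False, dashboard_address=None)

# `key` sets the task's key explicitly instead of letting Dask generate one
future = client.submit(double, 21, key="double-orders-0")
result = future.result()

print(future.key)  # the key we supplied
print(result)

client.close()
```

Keys must be unique per task, so when submitting many tasks each one needs its own suffix.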


The trouble is I’m not sure how to manipulate the futures when using the DataFrame API because it kind of abstracts all that away.

I’m not even sure it is possible to manually set a key for an operation on a DataFrame (I couldn’t find a way either); I’m sort of piggy-backing on the thread :-)

> Is there any way to tell Dask that I want each apply call to have a custom name, for identifiability?

@multimeric I don’t think you can customize this in the high-level collections like DataFrame and Array. (Only the lower-level Delayed and Futures APIs allow this, as @tomercagan suggested.)
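As a rough illustration of the lower-level route, a delayed call accepts a `dask_key_name` keyword that sets the key of the resulting task. The function `summarize` and the name `"summarize-sales"` are hypothetical, and this is only a sketch of the Delayed API; it does not change how DataFrame.apply names its tasks.

```python
from dask import delayed


@delayed
def summarize(values):
    # Hypothetical stand-in for a per-partition computation
    return sum(values)


# `dask_key_name` is consumed by Dask and sets the task key;
# it is not passed through to `summarize` itself
task = summarize([1, 2, 3], dask_key_name="summarize-sales")

print(task.key)
# Synchronous scheduler keeps the example free of cluster setup
print(task.compute(scheduler="synchronous"))
```

One trade-off: with explicit names you are responsible for keeping keys unique, since two different tasks sharing a key would be treated as the same task.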

Okay, I might submit this as a feature request on GitHub in that case.


@multimeric Thanks for opening the issue!