Measuring the overall profile of long runs

I am using Dask distributed to run large numbers (thousands) of dynamically generated tasks. There are different types of tasks, each corresponding to a function in my code. The total runtime of the calculations is on the order of hours to days, spread across multiple nodes with Dask-jobqueue.

What I want is a breakdown of the total time spent in each function. The profile page on the dashboard does exactly what I want, but it seems to be limited to only one hour of data, which is far too little. The HTML files created with client.profile don’t contain more data either. Is there a simple way to record a global profile?


Hi @RaphaelRobidas, welcome back!

Did you try all the possibilities described here:


If even the performance report doesn’t contain more than one hour of data, then this is a problem. Did you also try using the start kwarg of Client.profile?
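For what it’s worth, here is a minimal sketch of how passing start might look. It assumes start accepts an epoch timestamp from time.time() (the docs only say time), and uses a small in-process client purely for illustration:

```python
from time import time
from dask.distributed import Client

client = Client(processes=False, dashboard_address=None)  # in-process, for illustration
t0 = time()

# Run some work whose profile we want to capture.
futures = client.map(lambda x: x ** 2, range(100))
client.gather(futures)

# Request the profile covering everything since t0; with the filename
# kwarg you would get the HTML output instead of the raw state.
state = client.profile(start=t0)  # assumed: an epoch timestamp is accepted
print(sorted(state))  # raw profile state keys, e.g. 'count', 'children'

client.close()
```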

Hello @guillaumeeb,

Thanks for the reply.

I am also saving the task streams, but they are not so convenient for what I’m trying to measure. They don’t seem to contain all the tasks since the start of the job. I write the diagnostics every “cycle” in my code, so it is possible that the task streams get flushed each time and would contain all the tasks once combined. In any case, I would need to add up all the task times to get the total runtime, and it is not quite as detailed as the profile.
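For reference, the kind of aggregation I have in mind would look roughly like this (a sketch on a throwaway in-process client; it assumes the record layout of recent distributed versions, where each startstops entry is a dict with action/start/stop keys):

```python
from collections import defaultdict
from dask.distributed import Client

client = Client(processes=False, dashboard_address=None)
client.gather(client.map(lambda x: x + 1, range(50)))

# Fetch the task-stream records currently buffered on the scheduler.
records = client.get_task_stream()

# Sum compute time per task prefix (roughly, per function).
totals = defaultdict(float)
for rec in records:
    prefix = rec["key"].rsplit("-", 1)[0]
    for ss in rec["startstops"]:
        if ss["action"] == "compute":
            totals[prefix] += ss["stop"] - ss["start"]

client.close()
```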

I added the start argument to Client.profile. The exact type is not specified; the documentation just says start: time. Since I couldn’t figure out the exact type, I tried a POSIX timestamp. The times in the profile files don’t add up to the reported compute time; a lot of time is missing. The total time in the profiles can either increase or decrease between cycles, so it clearly doesn’t contain a global profile.

The performance_report function also seems to write files with a very limited scope. The job duration shown in the report is 20 minutes, while the job actually ran for around 21 hours.

My code uses multiple clients in different threads. The actual jobs run completely fine this way, but could that be problematic for the diagnostics?

That might explain some problems. Did you try to profile every client, or to use the performance_report context manager in every thread?

Maybe we should try with a simpler use case, only one Client, and see if it works for more than one hour of data?

Every cycle, each Client saves its profile to a unique file. The performance_report is used as a context manager in the main thread, which launches all the other threads and waits for them to finish. I will try the simpler case, as you suggest. In theory, should these diagnostics be saved in each thread, or in only one?
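Concretely, the saving scheme looks roughly like this (a simplified single-client sketch; the filename pattern and the JSON dump are just for illustration, not my actual code):

```python
import json
from dask.distributed import Client

client = Client(processes=False, dashboard_address=None)

for cycle in range(3):
    # ... the real code would run one cycle of tasks here ...
    client.gather(client.map(lambda x, c=cycle: x * c, range(20)))

    # Dump the cumulative profile state to a unique file per cycle.
    state = client.profile()
    with open(f"profile-client0-cycle{cycle}.json", "w") as fh:
        json.dump(state, fh, default=str)

client.close()
```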

I would say that it has to be saved for every Client, but I’m really not sure about it.

Maybe we should try with a simpler use case, only one Client, and see if it works for more than one hour of data?

I tried it. The task stream in the performance report is fine and spans the whole runtime. However, the worker profile is missing a lot of compute time, and so are the Client.profile files when added together. It seems this issue is more general than we thought.