Hello,
Profiling tools like Dask’s performance_report, Profiler, and the Fine Performance Metrics dashboard fail to display certain operations in the “task” section, which makes it hard to identify bottlenecks in my workflow.
Below I try to mimic my real use case, which roughly looks like:
create Dask graphs for several variables => persist the largest computations =>
keep adding operations to the variables => write to disk (or call compute)
- Steps to reproduce: first steps
from dask.distributed import performance_report, LocalCluster, Client
import dask.array as da
from dask.diagnostics import Profiler
cluster = LocalCluster(
    n_workers=1,
    threads_per_worker=2,
    processes=False,
    memory_limit="20gb",  # can be changed
)
client = Client(cluster)
print(client.dashboard_link)
array = da.random.random((10, 10), chunks=(5, 5))
# simulate a deep graph
for _ in range(20):
    array = da.abs(da.sin(array + 100))
# visualize: the graph is fine
array.visualize("intermediate.png")
# Another array is created
array2 = array[::2]**3
# Intermediate persist
array = array.persist()
Expected behavior: I expected to see information about the time required to compute the da.abs, da.sin and + operations.
Obtained behavior: the Fine Performance Metrics show no sign of the da.sin or + operations. Similarly, they are absent from the “task” section.
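One way to check whether task fusion is collapsing these operations might be to inspect the key names left in the optimized graph. A minimal sketch (key_split comes from dask.utils; as far as I understand, dask.optimize applies the same graph optimizations that compute/persist would):

import dask
from dask.utils import key_split

# Optimize the graph as compute()/persist() would, then list the
# distinct task-name prefixes that survive fusion.
(optimized,) = dask.optimize(array)
print(sorted({key_split(k) for k in optimized.__dask_graph__()}))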
- Steps to reproduce: continue processing the data
# Other computations
for _ in range(20):
    array = da.mod(da.log(array / 20), 10)
# visualize: the graph is fine
array.visualize("final.png")
array = array.compute()
Expected behavior: I expected to see timing information for the da.log, da.mod and / operations.
Obtained behavior: the Fine Performance Metrics again only show the mod (“remainder”) operation, but no timing for the da.log and / operations.
- Final compute
array2 = array2.compute()
Expected behavior: I expected to see the pow and slice operations in the metrics.
Obtained behavior: the “pow” operation is there, but the “slice” operation is absent.
What I tried:
- Dask profiler
with Profiler() as prof:
    array = array.compute()
prof.visualize()
yields an empty result.
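If I understand the documentation correctly, dask.diagnostics.Profiler only instruments the local single-machine schedulers, so with a distributed Client registered it records nothing; could that explain the empty result? A minimal sketch that should produce output by forcing the threaded scheduler:

import dask.array as da
from dask.diagnostics import Profiler

x = da.random.random((10, 10), chunks=(5, 5))
for _ in range(5):
    x = da.abs(da.sin(x + 100))

with Profiler() as prof:
    # Bypass the distributed client and use the local threaded
    # scheduler, which is what dask.diagnostics instruments.
    x.compute(scheduler="threads")
prof.visualize()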
- Dask performance report
The profile sections of the performance report are empty with the code used above.
Code to reproduce the issue with the performance report:
from dask.distributed import performance_report, LocalCluster, Client
import dask.array as da
cluster = LocalCluster(
    n_workers=1,           # number of workers
    threads_per_worker=2,  # threads per worker
    processes=False,       # use threads instead of separate processes
    memory_limit="20gb",
)
# Link the cluster to a client
client = Client(cluster)
print(client.dashboard_link)
with performance_report(filename="forum_discussion.html"):
    array = da.random.random((10, 10), chunks=(5, 5))
    # simulate a deep graph
    for _ in range(20):
        array = da.abs(da.sin(array + 100))
    # visualize: the graph is fine
    array.visualize("intermediate.png")
    # Another array is created
    array2 = array[::2] ** 3
    # Intermediate persist
    array = array.persist()
    # Other computations
    for _ in range(20):
        array = da.mod(da.log(array / 20), 10)
    # visualize: the graph is fine
    array.visualize("final.png")
    array = array.compute()
    array2 = array2.compute()
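A related tool I have not fully explored is the task stream recorder from dask.distributed, which logs every task the scheduler actually ran together with its timing; it might at least confirm which (possibly fused) names execute. A minimal, self-contained sketch:

from dask.distributed import Client, get_task_stream
import dask.array as da

client = Client(processes=False, n_workers=1, threads_per_worker=2)

x = da.random.random((10, 10), chunks=(5, 5))
for _ in range(5):
    x = da.abs(da.sin(x + 100))

# Record every task the scheduler ran, with per-task timing.
with get_task_stream(plot="save", filename="task-stream.html") as ts:
    x.compute()

# Each record carries the executed task's key and its start/stop times.
for rec in ts.data[:5]:
    print(rec["key"], rec["startstops"])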
What could be tried:
- Classic Python profiler (cProfile & co.): not feasible due to multi-threading/multi-processing? An alternative is sketched below.
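- Statistical profiling on the workers: if I read the docs right, Client.profile aggregates sampled frames across all worker threads and can write them to an HTML file, which may sidestep the multi-threading issue. A minimal sketch (same cluster parameters as above):

from dask.distributed import Client
import dask.array as da

client = Client(processes=False, n_workers=1, threads_per_worker=2)

x = da.random.random((1000, 1000), chunks=(100, 100))
da.mod(da.log(x / 20), 10).sum().compute()

# Dump the profile sampled on the workers; frames are sampled per
# thread and merged, so multi-threading is not an obstacle here.
client.profile(filename="dask-profile.html")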
Questions
- Are these tasks absent because of task fusion/optimization?
- Are there alternative tools for profiling Dask computations?
- Is this the right way to profile code that uses intermediate .persist / .compute calls?
- Is this a known limitation or a potential bug?
In my real workflow (processing satellite data), the “N/A” tasks shown in the Fine Performance Metrics grow very large, and the profiler sections are not empty but severely reduced (covering far less than what is actually computed). This makes it hard to identify and optimize the slow sections of the code.
Thanks for any potential help (and for this nice forum)!