Everytime I generate performance report of my dask computations, irrespective of the cluster configuration (number of workers, memory per worker) and the hardware (local machine or HPC nodes), in the final report generated by dask’s performance report, I can see that under the “System” tab, the memory usage is almost always close to 1 GB (like in the screenshot).
The question is, does this tab only show the memory usage by Scheduler process?
Hi @maneesh29s,
Yes! it is briefly explained in this video from the Dask documentation.
In addition, this is a good sign if Scheduler memory does not vary in your workload
. It is a bit weird though, I just tried a simple example:
import dask.array as da
from dask.distributed import Client, LocalCluster
from dask.distributed import performance_report
client = Client(n_workers=2, memory_limit='4GiB')
size = (10000, 10000)
chunks = (5000,5000)
a = da.random.random(size, chunks=chunks)
b = da.random.random(size, chunks=chunks)
with performance_report(filename="dr.html"), get_task_stream(client, plot='save', filename="ts.html") as ts:
s = (a + b).compute()
which gave me:
s = (a + b).compute()
Did your scheduler memory increase because of .compute() ?
Assuming that client and scheduler are part of the same process, .compute() would return the entire result back to client.
Hm, you are right on this point!