I am looking for a way to track and store per-task memory usage, including the peak, min, and max memory used by each task. Similar to how one can use performance_report to store various metrics in an HTML summary file, I’d also like to keep track of the memory usage of my tasks.
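For context, the kind of HTML summary I mean is what the performance_report context manager produces; a minimal example (the filename is arbitrary):

from dask.distributed import Client, performance_report

client = Client()  # local cluster just for illustration

# everything computed inside the block gets summarized into one HTML file
with performance_report(filename="dask-report.html"):
    total = client.submit(sum, range(1_000_000)).result()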
I’ve come across the package dask-memusage, but it only works when the cluster’s workers each run a single thread.
Are there any tools you guys recommend or any internal APIs I could use to achieve this?
It is very hard to get per-task memory usage if your worker process has several threads, because at the system level you only get the memory usage of the whole process. I guess getting memory usage per task would mean computing the memory footprint of the Python structures used by that task, which sounds hard.
So I’m not sure this is feasible in a multi-threaded context.
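To illustrate what I mean by process-level: the usual measurement (sketched here with psutil, just as an example) covers the whole worker process, so tasks running concurrently on different threads all blend into one number:

import os
import psutil

# Resident set size of the whole worker process: allocations from every thread
# are mixed together here, so the number can't be attributed to a single task.
rss_bytes = psutil.Process(os.getpid()).memory_info().rss
print(f"process RSS: {rss_bytes / 1e9:.2f} GB")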
Thank you for your response! Will look into those docs right now.
What do you think about this approach? I wrote a quick SchedulerPlugin that periodically pulls data from scheduler.workers.values() and stores metrics like memory_percent in a file. My goal is to monitor the total worker memory usage for each of my experiments, which will later help me fine-tune performance. I used this code for inspiration:
import time
import numpy as np

# assumes `cluster` and `client` (e.g. a LocalCluster) are already created
install(cluster.scheduler, "experiment_memory.csv")

def test_function(i, size_in_gb=2):
    # allocate roughly size_in_gb of float64 data and hold it during the sleep,
    # so workers show measurable memory usage for a while
    elements = (size_in_gb * (1024 ** 3)) // np.dtype(np.float64).itemsize
    large_array = np.random.random(elements)
    time.sleep(10)
    return i * 2

f = [client.submit(test_function, i, size_in_gb=2, pure=False) for i in range(8)]
r = client.gather(f)
Creates a file like this:
Curious, is there a more efficient way to pull data from the WorkerTable and store it somewhere?
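In case it helps, here is a trimmed-down sketch of the kind of plugin I mean (not my exact code; the class name, install() helper, and CSV layout are illustrative, and I’m assuming ws.metrics["memory"] and ws.memory_limit as exposed to the dashboard’s worker table):

import csv
import time

from distributed.diagnostics.plugin import SchedulerPlugin
from tornado.ioloop import PeriodicCallback


class MemoryLoggerPlugin(SchedulerPlugin):
    """Periodically append per-worker memory metrics to a CSV file."""

    def __init__(self, csv_path, interval_ms=1000):
        self.csv_path = csv_path
        self.interval_ms = interval_ms
        self._pc = None

    def start(self, scheduler):
        self.scheduler = scheduler
        with open(self.csv_path, "w", newline="") as f:
            csv.writer(f).writerow(["timestamp", "worker", "memory_bytes", "memory_percent"])
        # run _dump on the scheduler's event loop at a fixed interval
        self._pc = PeriodicCallback(self._dump, self.interval_ms)
        self.scheduler.loop.add_callback(self._pc.start)

    def _dump(self):
        now = time.time()
        rows = []
        for ws in self.scheduler.workers.values():
            mem = ws.metrics.get("memory", 0)  # process RSS reported by the worker heartbeat
            limit = ws.memory_limit or 0
            pct = 100 * mem / limit if limit else float("nan")
            rows.append([now, ws.address, mem, pct])
        with open(self.csv_path, "a", newline="") as f:
            csv.writer(f).writerows(rows)

    async def close(self):
        if self._pc is not None:
            self._pc.stop()


def install(scheduler, csv_path):
    # hypothetical helper mirroring dask-memusage's install() entry point
    plugin = MemoryLoggerPlugin(csv_path)
    scheduler.add_plugin(plugin)
    plugin.start(scheduler)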
Hi @guillaumeeb, thank you! Are there any examples of pulling metrics from the Prometheus endpoint? And maybe as a separate topic (I can ask in another post): can I publish custom metrics to the Prometheus endpoint? I’ve always wanted to publish my own custom metrics to visualize later. I asked about doing something like this a while ago but got nowhere.
Prometheus endpoints are just HTTP endpoints with a specific text format. Of course the most common way to collect that data is with Prometheus itself, but you could also just periodically curl the endpoint and store the data however you like.
Make sure you have the prometheus_client package installed and then access <dask scheduler IP>:8787/metrics.
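If you just want to grab the numbers without running Prometheus, one option is to poll the endpoint with requests and parse it with prometheus_client’s text parser; a sketch (the URL and polling interval are placeholders for your setup):

import time

import requests
from prometheus_client.parser import text_string_to_metric_families

METRICS_URL = "http://localhost:8787/metrics"  # replace with <dask scheduler IP>:8787/metrics

def scrape_once():
    text = requests.get(METRICS_URL, timeout=5).text
    # yields (metric name, labels, value) for every sample the scheduler exposes
    for family in text_string_to_metric_families(text):
        for sample in family.samples:
            yield sample.name, sample.labels, sample.value

# poll every 5 seconds and store the samples however you like (CSV, database, ...)
while True:
    timestamp = time.time()
    for name, labels, value in scrape_once():
        print(timestamp, name, labels, value)
    time.sleep(5)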
You could add more metrics via a scheduler plugin. We don’t have any documentation on this so you will need to look at how the current metrics are implemented.
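If you only need to publish your own experiment-level numbers rather than extend Dask’s built-in endpoint, a generic prometheus_client sketch (separate port, metric name, and label chosen arbitrarily here) would be:

from prometheus_client import Gauge, start_http_server

# expose custom metrics on a separate port, independent of Dask's /metrics endpoint;
# add this port as an extra scrape target in your Prometheus config
experiment_memory = Gauge(
    "experiment_worker_memory_bytes",
    "Per-worker memory recorded by my experiment harness",
    ["worker"],
)

start_http_server(8000)  # serves /metrics on http://<this host>:8000/metrics

# somewhere in your monitoring loop:
experiment_memory.labels(worker="tcp://127.0.0.1:43210").set(1.5e9)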