Understanding the deserialization time in dask distributed

ikabadzhov · November 19, 2021, 8:17am

Hi, I am trying to benchmark the performance of an HPC Cluster using Dask’s SLURMCluster. Thanks to the help I previously received from you, I managed to make it scale well on my cluster.

There is one issue that remains unsolved. I noticed that if I initialize a scheduler and ask it to perform the same task multiple times, the first run is several seconds slower.

I am attaching the results from the 3 runs on the same cluster:
Run 1 → Dask Performance Report
Run 2 → Dask Performance Report
Run 3 → Dask Performance Report

I read in the report that the first run spent several seconds in “deserialize time”, and I wonder where this extra time comes from. In the Task Stream, I can confirm that only the first run contains a “deserialize-dask_mapper” entry.

What I tried was to import ROOT as an external module by uploading the files via relative paths:

client.upload_file('../../root/root_rdfenv-ucx-2/lib/DistRDF/Backends/Dask/Backend.py')
client.upload_file('../../root/root_rdfenv-ucx-2/lib/ROOT/__init__.py')

I again produced a report for each of the 3 runs:
Run 1 → Dask Performance Report
Run 2 → Dask Performance Report
Run 3 → Dask Performance Report

For runs 2 and 3 the reports are as before. But in report 1 there is no time labeled “deserialize time”, nor is there a “deserialize-dask_mapper” in the task stream. And again the first run took a few seconds longer than the other runs.

Question 1: I would like to ask you what might be the cause of the slower first runs.
Question 2: What is the transfer time in the summary of the report? I see that it is minimal for the first run, which seems unreasonable to me.

mrocklin · November 30, 2021, 2:55pm

Deserialization time is the time spent recreating the function that you want to run on a remote machine. A common cause of a long initial deserialization time followed by short (or non-existent) deserialization times is that the library you’re using takes a while to import the first time it is used. Subsequent imports are fast because Python keeps libraries in memory.
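To see the import caching effect in isolation, here is a minimal sketch that needs no Dask at all. It writes a throwaway module named slow_lib_demo (a made-up name; its 0.2 s sleep merely stands in for a heavy library import) and times two consecutive imports in the same process.

```python
import importlib
import sys
import tempfile
import time
from pathlib import Path


def timed_import(name):
    """Import a module by name and return how long the import took, in seconds."""
    start = time.perf_counter()
    importlib.import_module(name)
    return time.perf_counter() - start


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical stand-in for a heavy library: the sleep simulates slow import work.
    Path(tmp, "slow_lib_demo.py").write_text("import time\ntime.sleep(0.2)\n")
    sys.path.insert(0, tmp)
    importlib.invalidate_caches()
    try:
        first = timed_import("slow_lib_demo")   # executes the module body
        second = timed_import("slow_lib_demo")  # served from sys.modules
    finally:
        sys.path.remove(tmp)

print(f"first import:  {first:.4f} s")
print(f"second import: {second:.4f} s")
```

The second call only looks the module up in sys.modules, which is why it is so much cheaper. A Dask worker is a long-lived Python process, so the same thing would happen there: the first task that needs the library pays for the import, and later tasks do not.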

Import times can be particularly challenging on HPC systems when code is mounted on a network file system (NFS) because many workers all try to import the same code at once and the NFS thrashes a bit. Typically people just live with this. However, if you want to avoid it you could do a few things:

Install libraries onto local disk (if it exists)
Import the libraries when you start up the Dask workers (the imports will still be slow, but you’ll move the slowness to worker startup, if that’s helpful). One way to do this would be to use the --preload my_library flag when starting a dask-worker.
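As a rough sketch of the second option, a preload script could look something like the following. The file name warm_imports.py and the module list are assumptions for illustration; the list uses standard-library placeholders so the sketch runs anywhere, and on the cluster from this thread it would presumably name ROOT instead. The dask_setup(worker) function is the hook that Dask calls when a file is passed to --preload.

```python
# warm_imports.py (hypothetical file name)
import importlib
import time

# Placeholder modules so the sketch runs anywhere; replace with the slow
# library, e.g. ("ROOT",), on a real cluster.
MODULES_TO_PRELOAD = ("json", "decimal")


def warm_imports(names=MODULES_TO_PRELOAD):
    """Import each module and return the seconds each import took."""
    timings = {}
    for name in names:
        start = time.perf_counter()
        importlib.import_module(name)
        timings[name] = time.perf_counter() - start
    return timings


def dask_setup(worker):
    """Called by Dask at worker startup when this file is given to --preload."""
    warm_imports()


if __name__ == "__main__":
    # Running the file directly just prints the import timings.
    print(warm_imports())
```

The worker would then be started with something like dask-worker <scheduler-address> --preload warm_imports.py. If the workers are already running, calling client.run(warm_imports) from an existing Client should have a similar effect, since it executes the function once on every worker and returns the timings per worker. Either way the import cost is still paid once per worker process; it just no longer lands inside the first benchmarked task.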

Hopefully that helps?
