Delayed functions with Dask - Worse performance

Hey,

I just started with Dask and have been running some experiments, but so far I have failed to achieve any speed improvement with it.

My real problem is similar to the code below, and I would like to use Dask to parallelize the work inside the for loop on my local machine with multiple threads/cores.

My code is as follows:

import dask
import time


def double(x):
    return x * 2


def add(x, y):
    return x + y


if __name__ == "__main__":
    data = [i for i in range(100000)]

    start_time = time.time()
    output_ = []
    for x in data:
        a = double(x)
        b = add(a, a)
        output_.append(b)
    total = sum(output_)
    print("Non Dask: --- %s seconds ---" % (time.time() - start_time))

    start_time = time.time()
    output = []
    for x in data:
        # dask.delayed only builds the task graph here; nothing runs yet
        a = dask.delayed(double)(x)
        b = dask.delayed(add)(a, a)
        output.append(b)
    total = dask.delayed(sum)(output)
    print("Dask 1: --- %s seconds ---" % (time.time() - start_time))
    start_time = time.time()
    total.compute()  # execute the graph with the default (threaded) scheduler
    print("Dask 2: --- %s seconds ---" % (time.time() - start_time))
    start_time = time.time()
    total.compute(scheduler="processes")
    print("Dask 3: --- %s seconds ---" % (time.time() - start_time))
    start_time = time.time()
    total.compute(scheduler="multiprocessing")
    print("Dask 4: --- %s seconds ---" % (time.time() - start_time))

Output:

Non Dask: --- 0.021062135696411133 seconds ---
Dask 1: --- 8.071345567703247 seconds ---
Dask 2: --- 12.640542030334473 seconds ---
Dask 3: --- 23.266422748565674 seconds ---
Dask 4: --- 22.93493390083313 seconds ---

Process finished with exit code 0

Am I misunderstanding or misusing Dask? Is it possible to achieve faster execution by parallelizing this process instead of executing it sequentially?

Thank you in advance!

Hi @hashishoya, welcome to the Dask community!

Your example code uses computations that are extremely fast and cheap, while Dask introduces some overhead per task, even in multithreading mode; see the Efficiency page of the Dask.distributed documentation.
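
If you do want to parallelize a workload like this with Dask, a common pattern is to batch the work so that each task processes a chunk of elements rather than a single one, which amortizes the per-task overhead. Here is a minimal sketch of that idea (the chunk size is an arbitrary value chosen for illustration, not something I have tuned):

import dask


def double(x):
    return x * 2


def add(x, y):
    return x + y


def process_chunk(chunk):
    # Do a meaningful amount of work per task so the scheduling
    # overhead is amortized over many elements.
    out = []
    for x in chunk:
        a = double(x)
        out.append(add(a, a))
    return sum(out)


if __name__ == "__main__":
    data = list(range(100000))
    chunk_size = 10000  # arbitrary choice for illustration
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]

    # 10 tasks instead of roughly 200,000
    partials = [dask.delayed(process_chunk)(chunk) for chunk in chunks]
    total = dask.delayed(sum)(partials).compute()
    print(total)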

Still, in a toy example like this one the overhead introduced by Dask is far greater than the potential benefit, even with batching. If your real code is of the same kind, it might be better to look at Numba or other Python tools that optimize execution time by compiling part of the code.
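
For illustration, here is a minimal Numba sketch of the same computation (it assumes Numba and NumPy are installed; the function name is just an example, and the first call includes JIT compilation time, so time a second call to measure the compiled loop):

import numpy as np
from numba import njit


@njit
def doubled_sum(data):
    # Compiled loop: double each element, add it to itself, and sum the results.
    total = 0
    for x in data:
        a = x * 2
        total += a + a
    return total


if __name__ == "__main__":
    data = np.arange(100000, dtype=np.int64)
    print(doubled_sum(data))  # first call pays the compilation cost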