Delayed functions with Dask - Worse performance

Hey,

I just started with Dask and have been running some experiments, but so far I have failed to achieve any speed improvement with it.

My real problem is similar to the code below, and I would like to use Dask to parallelize the work inside the for loop on my local machine with multiple threads/cores.

My code is as follows:

import dask
import time


def double(x):
    return x * 2


def add(x, y):
    return x + y


if __name__ == "__main__":
    data = [i for i in range(100000)]

    start_time = time.time()
    output_ = []
    for x in data:
        a = double(x)
        b = add(a, a)
        output_.append(b)
    total = sum(output_)
    print("Non Dask: --- %s seconds ---" % (time.time() - start_time))

    start_time = time.time()
    output = []
    for x in data:
        # dask.delayed only builds the task graph here; nothing runs yet
        a = dask.delayed(double)(x)
        b = dask.delayed(add)(a, a)
        output.append(b)
    total = dask.delayed(sum)(output)
    print("Dask 1: --- %s seconds ---" % (time.time() - start_time))
    start_time = time.time()
    total.compute()  # execute the graph with the default (threaded) scheduler
    print("Dask 2: --- %s seconds ---" % (time.time() - start_time))
    start_time = time.time()
    total.compute(scheduler="processes")
    print("Dask 3: --- %s seconds ---" % (time.time() - start_time))
    start_time = time.time()
    total.compute(scheduler="multiprocessing")
    print("Dask 4: --- %s seconds ---" % (time.time() - start_time))

Output:

Non Dask: --- 0.021062135696411133 seconds ---
Dask 1: --- 8.071345567703247 seconds ---
Dask 2: --- 12.640542030334473 seconds ---
Dask 3: --- 23.266422748565674 seconds ---
Dask 4: --- 22.93493390083313 seconds ---

Process finished with exit code 0

Am I misunderstanding or misusing Dask? Is it possible to achieve faster execution by parallelizing this process instead of executing it sequentially?

Thank you in advance!

Hi @hashishoya, welcome to the Dask community!

Your example code uses computations that are extremely fast and cheap, while Dask introduces some overhead per task, even in multithreading mode; see the Efficiency page of the Dask.distributed documentation.
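
If you do want to parallelize a workload like this with Dask, a common pattern is to batch the work so that each task processes a chunk of elements rather than a single one, which amortizes the per-task overhead. Here is a minimal sketch of that idea (the chunk size is an arbitrary value chosen for illustration, not something I have tuned):

import dask


def double(x):
    return x * 2


def add(x, y):
    return x + y


def process_chunk(chunk):
    # Do a meaningful amount of work per task so the scheduling
    # overhead is amortized over many elements.
    out = []
    for x in chunk:
        a = double(x)
        out.append(add(a, a))
    return sum(out)


if __name__ == "__main__":
    data = list(range(100000))
    chunk_size = 10000  # arbitrary choice for illustration
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]

    # 10 tasks instead of roughly 200,000
    partials = [dask.delayed(process_chunk)(chunk) for chunk in chunks]
    total = dask.delayed(sum)(partials).compute()
    print(total)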

Still, in a toy example like this one the overhead introduced by Dask is far greater than the potential benefit, even with batching. If your real code is of the same kind, it might be better to look at Numba or other Python tools that optimize execution time by compiling part of the code.
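
For illustration, here is a minimal Numba sketch of the same computation (it assumes Numba and NumPy are installed; the function name is just an example, and the first call includes JIT compilation time, so time a second call to measure the compiled loop):

import numpy as np
from numba import njit


@njit
def doubled_sum(data):
    # Compiled loop: double each element, add it to itself, and sum the results.
    total = 0
    for x in data:
        a = x * 2
        total += a + a
    return total


if __name__ == "__main__":
    data = np.arange(100000, dtype=np.int64)
    print(doubled_sum(data))  # first call pays the compilation cost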