Hey,
I just started with Dask and have been running some experiments, but so far I have not managed to get any speedup from it.
My real problem is similar to the code below: I would like to use Dask to parallelize the work inside the for loop on my local machine, across multiple threads/cores.
My code is as follows:
import dask
import time


def double(x):
    return x * 2


def add(x, y):
    return x + y


if __name__ == "__main__":
    data = [i for i in range(100000)]

    start_time = time.time()
    output_ = []
    for x in data:
        a = double(x)
        b = add(a, a)
        output_.append(b)
    total = sum(output_)
    print("Non Dask: --- %s seconds ---" % (time.time() - start_time))

    start_time = time.time()
    output = []
    for x in data:
        a = dask.delayed(double)(x)
        b = dask.delayed(add)(a, a)
        output.append(b)
    total = dask.delayed(sum)(output)
    print("Dask 1: --- %s seconds ---" % (time.time() - start_time))

    start_time = time.time()
    total.compute()
    print("Dask 2: --- %s seconds ---" % (time.time() - start_time))

    start_time = time.time()
    total.compute(scheduler="processes")
    print("Dask 3: --- %s seconds ---" % (time.time() - start_time))

    start_time = time.time()
    total.compute(scheduler="multiprocessing")
    print("Dask 4: --- %s seconds ---" % (time.time() - start_time))
Output:
Non Dask: --- 0.021062135696411133 seconds ---
Dask 1: --- 8.071345567703247 seconds ---
Dask 2: --- 12.640542030334473 seconds ---
Dask 3: --- 23.266422748565674 seconds ---
Dask 4: --- 22.93493390083313 seconds ---
Process finished with exit code 0
Am I misunderstanding or misusing Dask? Is it possible to achieve faster execution through parallelizing this process instead of executing it sequentially?
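One variant I am considering, on the assumption (not verified) that the slowdown comes from creating roughly 200,000 tiny delayed tasks, is to batch the work so that each Dask task handles a whole slice of the data. Below is a minimal sketch of that idea; `process_batch`, `batched_total`, and the `n_batches` parameter are names I made up for illustration, and I have not measured whether this actually beats the plain loop for functions this cheap.

```python
import time

import dask


def double(x):
    return x * 2


def add(x, y):
    return x + y


def process_batch(batch):
    # Hypothetical helper: run the original loop body over one slice
    # in plain Python, so the whole slice is a single Dask task.
    total = 0
    for x in batch:
        a = double(x)
        total += add(a, a)
    return total


def batched_total(data, n_batches=8, scheduler="processes"):
    # Hypothetical helper: split data into about n_batches contiguous
    # slices, create one delayed task per slice, then sum the partials.
    size = max(1, -(-len(data) // n_batches))  # ceiling division
    batches = [data[i:i + size] for i in range(0, len(data), size)]
    partials = [dask.delayed(process_batch)(b) for b in batches]
    return dask.delayed(sum)(partials).compute(scheduler=scheduler)


if __name__ == "__main__":
    data = list(range(100000))
    start_time = time.time()
    total = batched_total(data, n_batches=8, scheduler="processes")
    print("Dask batched: --- %s seconds ---" % (time.time() - start_time))
```

This builds only `n_batches + 1` tasks instead of two per element, so the scheduling overhead should be much smaller, but the process scheduler still has to serialize each slice and start workers, so I would not expect a gain unless the per-element work is far heavier than `double` and `add`.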
Thank you in advance!