Dask slower than numpy

ahmed · August 19, 2022, 10:42am

I am a new Dask user and I’m trying to run the dot function inside my program. I noticed that Dask’s dot is slower than the NumPy version, even when I use a single chunk for the whole array. How can this behaviour be explained?

import dask.array as da 
import numpy as np
x = da.random.normal(10, 0.1, size=(20000 * 100000), chunks=(20000 * 100000))
z = x.dot(x)
%time z.compute()
'''
CPU times: user 1min 1s, sys: 17.3 s, total: 1min 18s
Wall time: 52 s
'''
y = x.compute()

%time w = y.dot(y)
'''
CPU times: user 19 s, sys: 8.24 s, total: 27.2 s
Wall time: 767 ms
'''

guillaumeeb · August 23, 2022, 4:54pm

Hi @ahmed,

You’re not comparing the same thing between Dask and Numpy.

Dask is lazy, so the first two lines:

x = da.random.normal(10, 0.1, size=(20000 * 100000), chunks=(20000 * 100000))
z = x.dot(x)

do nothing yet; they only build a task graph. When timing z.compute(), you’re timing the generation of the random array plus the dot operation.

In NumPy, you’re only timing the dot operation (the array has already been created in memory), so this is fast.
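To make the difference concrete, here is a minimal sketch of a like-for-like timing. It assumes a much smaller array than the ones in this thread so it runs quickly, and the variable names (n, x_mem, and the timing variables) are only illustrative. It times generation plus dot on both sides, then uses persist() to materialize the Dask array first so that only the dot is timed.

```python
import time

import dask.array as da
import numpy as np

n = 2_000_000  # assumption: far smaller than the sizes used in this thread

# Lazy: these two lines only build a task graph, no numbers are generated yet.
x = da.random.normal(10, 0.1, size=(n,), chunks=(n,))
z = x.dot(x)

# compute() runs random generation AND the dot product.
t0 = time.perf_counter()
dask_result = z.compute()
dask_total = time.perf_counter() - t0

# Fair NumPy counterpart: time generation and dot together as well.
t0 = time.perf_counter()
a = np.random.normal(10, 0.1, size=n)
numpy_result = a.dot(a)
numpy_total = time.perf_counter() - t0

# To time only the dot in Dask, materialize the array in memory first.
x_mem = x.persist()
t0 = time.perf_counter()
dot_only_result = x_mem.dot(x_mem).compute()
dask_dot_only = time.perf_counter() - t0

print(f"Dask  generation + dot: {dask_total:.3f} s")
print(f"NumPy generation + dot: {numpy_total:.3f} s")
print(f"Dask  dot only:         {dask_dot_only:.3f} s")
```

The absolute numbers depend on the machine, but the first two timings should be the ones to compare, since both include creating the random array.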

See what I get (with a smaller input):

Dask:

x = da.random.normal(10, 0.1, size=(20000 * 20000), chunks=(20000 * 20000))
z = x.dot(x)
%time z.compute()

CPU times: user 10 s, sys: 646 ms, total: 10.7 s
Wall time: 9.97 s

40003968048.35344

Numpy:

%%time
a = np.random.normal(10, 0.1, size=(20000 * 20000))
b = a.dot(a)

CPU times: user 10.6 s, sys: 355 ms, total: 10.9 s
Wall time: 10.5 s

And Dask with chunking (four chunks instead of one):

x = da.random.normal(10, 0.1, size=(20000 * 20000), chunks=(10000 * 10000))
z = x.dot(x)
%time z.compute()

CPU times: user 10.7 s, sys: 1.13 s, total: 11.8 s
Wall time: 2.98 s

40004001781.91481
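In that last run the CPU time is about the same but the wall time drops, which is what you would expect if the four chunks are processed in parallel. As a rough sketch of how the chunks argument splits the array (again with a smaller size chosen only for illustration, and hypothetical variable names), you can inspect numblocks and chunks before computing anything:

```python
import dask.array as da

n = 4_000_000  # assumption: smaller than in the thread, for a quick run

# One chunk covering the whole array: nothing for Dask to parallelize.
one_chunk = da.random.normal(10, 0.1, size=(n,), chunks=(n,))

# Four equal chunks: each one can be generated and reduced independently.
four_chunks = da.random.normal(10, 0.1, size=(n,), chunks=(n // 4,))

print(one_chunk.numblocks)    # (1,)
print(four_chunks.numblocks)  # (4,)
print(four_chunks.chunks)     # ((1000000, 1000000, 1000000, 1000000),)

# The dot product is computed per chunk and the partial results are combined.
result = four_chunks.dot(four_chunks).compute()
print(result)
```

How much faster the chunked version runs depends on the number of cores and on the scheduler in use, so the chunk size is worth tuning for your own workload.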
