Computations happen before .compute() - expected behavior?

velosipednikov · August 2, 2023, 3:12pm

I have a multi-step process that manipulates dask arrays. I’m surprised that a lot of time elapses, and that I see a lot of operations on the dashboard, before the code reaches .compute(). According to my log, when it finally does hit .compute(), it executes instantly. Is this expected behavior?

I was under the impression that because of lazy evaluation, the code should get to the .compute stage really quickly.
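For reference, a minimal sketch of what lazy evaluation normally looks like with dask arrays. The array shape, chunk sizes and variable names are arbitrary choices for illustration, not taken from the actual workload:

```python
import dask.array as da

# Building the expression only creates a task graph; no data is produced yet
a = da.ones((1000, 1000), chunks=(250, 250))
total = (a + 1).sum()

# The result is still a dask array at this point
is_lazy = isinstance(total, da.Array)

# The actual work happens only here
value = total.compute()
```

If every step of a pipeline behaves like `total` above, the code should indeed reach .compute() almost immediately.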

If this is abnormal, then it seems like the reason would be operations that inadvertently trigger compute prematurely?

Some other pieces of information - my task graph is large (28 MB) - I get a warning about that when I start the client.

velosipednikov · August 2, 2023, 7:53pm

I’m pretty sure it’s because subsetting a numpy array with a dask vector triggers a computation. So no, typically compute should only happen when it’s called, unless there is some sneaky code like the below:

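import dask.array as da
import numpy as np

# Plain NumPy column vector, shape (10, 1)
x = np.linspace(0, 1, 10)
x = x[:, None]

# Lazy dask index array, shape (1, 5)
ind = da.array([0, 3, 8, 2, 4])
ind = ind[None, :]

# NumPy converts the dask index into a concrete array, which computes it right here
res = x[ind]

print(type(res))
# <class 'numpy.ndarray'>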
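One possible way around this, sketched under the assumption that a 1-D integer index is enough for the use case, is to wrap the NumPy data in a dask array so that the indexing is handled by dask rather than NumPy. The names `dx`, `lazy_res` and `eager_res` are only illustrative:

```python
import dask.array as da
import numpy as np

x = np.linspace(0, 1, 10)
ind = da.from_array(np.array([0, 3, 8, 2, 4]), chunks=5)

# Wrap the NumPy data so that indexing goes through dask instead of NumPy
dx = da.from_array(x, chunks=5)

# Indexing a dask array with a 1-D dask integer array stays lazy
lazy_res = dx[ind]

# Explicit alternative: the compute is at least visible in the code
eager_res = x[ind.compute()]

# Materialize the lazy result only when it is really needed
result = lazy_res.compute()
```

Either variant makes the point of computation explicit instead of leaving it to an implicit conversion inside NumPy.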

guillaumeeb · August 3, 2023, 9:12am

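"Is this expected behavior?"

Nope!

"I was under the impression that because of lazy evaluation, the code should get to the .compute stage really quickly."

Absolutely.

"If this is abnormal, then it seems like the reason would be operations that inadvertently trigger compute prematurely?"

Yes, probably :).

"my task graph is large (28 MB) - I get a warning about that when I start the client."

This probably means that you triggered some computation and the results are embedded in the graph.

"I’m pretty sure it’s because subsetting a numpy array with a dask vector triggers a computation."

Yes, that could totally be because of that!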
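To track down where such an unintended computation comes from, one option is a small wrapper scheduler that records a stack trace every time dask is asked to compute something. This is only a sketch: `logging_scheduler` and `compute_calls` are hypothetical names, and it assumes the pipeline can be run on a small input with the local threaded scheduler (it bypasses a distributed client while the config is set):

```python
import traceback

import dask
import dask.array as da
import numpy as np
from dask.threaded import get as threaded_get

compute_calls = []  # one formatted stack trace per scheduler invocation


def logging_scheduler(dsk, keys, **kwargs):
    """Hypothetical helper: record who asked for a computation, then run it."""
    compute_calls.append("".join(traceback.format_stack(limit=8)))
    return threaded_get(dsk, keys, **kwargs)


x = np.linspace(0, 1, 10)
ind = da.from_array(np.array([0, 3, 8, 2, 4]), chunks=5)

with dask.config.set(scheduler=logging_scheduler):
    lazy = ind + 1  # only builds the graph, the scheduler is not called
    calls_after_lazy_step = len(compute_calls)

    res = x[ind]  # NumPy converts ind to an array, which calls the scheduler
    calls_after_indexing = len(compute_calls)
```

Printing the entries of `compute_calls` afterwards shows which lines of the pipeline triggered a computation before the explicit .compute() call.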
