Computations happen before .compute() - expected behavior?

I have a multi-step process that manipulates dask arrays. I’m surprised by the fact that lots of time elapses and I see a lot of operations on the dashboard prior to the code hitting .compute. According to my log, when it it finally does hit compute, it executes instantly. Is this expected behavior?

I was under the impression that because of lazy evaluation, the code should get to the .compute stage really quickly.

If this is abnormal, then it seems like the reason would be operations that inadvertently trigger compute prematurely?

Some other pieces of information - my task graph is large (28 MB) - I get a warning about that when I start the client.

I’m pretty sure it’s because subsetting a numpy array with a dask vector triggers a computation. So no, typically compute should only happen when it’s called, unless there is some sneaky code like the below:

import dask.array as da
import numpy as np

x = np.linspace(0,1,10)
x = x[:,None]

ind = da.array([0, 3, 8, 2, 4])
ind = ind[None,:]

res = x[bool]

type(res)
<class 'numpy.ndarray'>

Nope!

Absolutely.

Yes, probably :).

This probably means that you triggers some computation and results are embedded in the graph.

Yes, that could totally be because of that!

1 Like