Passing dask objects to delayed computations without triggering compute

m-albert · January 8, 2023, 11:20pm

I want to pass a dask array to a delayed function without triggering compute, i.e., so that the delayed function receives the dask array as is.

How do I best do this? And maybe more fundamentally, is this problematic or bad practice?

Consider the following:

import dask.array as da
from dask import delayed

x = da.ones(2)

# triggers compute:
delayed(lambda x: x)(x).compute()
# array([1., 1.])
delayed(lambda x: x)(delayed(x)).compute()
# array([1., 1.])

# a way I found that doesn't trigger compute
delayed(da.Array)(x.dask, x.name, x.chunks, x.dtype).compute()
# dask.array<ones_like, shape=(2,), dtype=float64, chunksize=(2,), chunktype=numpy.ndarray>

The aim would be to implement a computation that requires two consecutive computes, with the second one depending on the values obtained in the first one.

More specifically, the context is the implementation of a dask-image version of scipy.ndimage.map_coordinates. More details can be found here and here. Thanks a lot!

Genevieve · January 10, 2023, 5:04am

How do I best do this? And maybe more fundamentally, is this problematic or bad practice?

I don’t think there is currently an accepted way to do this.

Overall I like it, it seems like a clever workaround to the problem.

It solves a problem we’re having, which is good.
It looks like complicated syntax, which isn’t ideal (more difficult for reviewers/maintainers to understand) and is sometimes a sign that a simpler approach is needed - but in this case I don’t think a simpler and better option exists.
One “code smell” with Dask can be if the .visualize() task graph looks excessively complicated for a small example. So I had a look, but this doesn’t seem like it’s a problem. So I think it’s probably quite an elegant idea to sidestep the problem.
It might be worth trying this approach out on a medium-sized dataset, since that's another way to find any issues.

m-albert · January 20, 2023, 5:28pm

Interesting. Thanks for mentioning these points!

Yes, .visualize() shows the dask array as a regular dependency. I tried passing some larger datasets and didn’t find an obvious performance problem.

Indeed, the syntax does look a bit complicated. I guess in principle a function could perform this decomposition and delayed recomposition, so that one could pass something like nocompute(x) (working slightly differently for each dask collection). At the same time, I’m wondering whether one could expect the second example above to do the trick: delayed(lambda x: x)(delayed(x)).compute().
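
A minimal sketch of what such a helper might look like for dask arrays, reusing the decomposition from the first post. The names `nocompute` and `leading_values` are hypothetical and not part of dask, and the nested `.compute()` calls inside the delayed function assume the default local threaded scheduler:

```python
import dask.array as da
from dask import delayed


def nocompute(arr):
    """Hypothetical helper (not a dask API): wrap a dask array so that a
    delayed function receives it as a dask array, not as computed values."""
    # Decompose the array into graph, name, chunks and dtype, and rebuild it
    # lazily inside the task graph (the workaround from the first post).
    return delayed(da.Array)(arr.dask, arr.name, arr.chunks, arr.dtype)


def leading_values(arr):
    """Hypothetical two-stage function: the second compute depends on a
    value obtained from the first."""
    n = int((arr > 2).sum().compute())  # first compute: count values > 2
    return arr[:n].compute()            # second compute: the first n values


x = da.ones(2)
passed_through = delayed(lambda a: a)(nocompute(x)).compute()
print(type(passed_through))  # expected to be a dask array, as reported above

y = da.arange(6, chunks=3)
values = delayed(leading_values)(nocompute(y)).compute()
print(values)
```

Other collections (dataframes, bags) would need their own decomposition, and this has not been checked against a distributed cluster.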
