Arrays instantiated with da.zeros_like are not writeable when map_blocks is run over them
Summary
Dask arrays created with da.zeros_like are not writeable when used with map_blocks.
Writing in place to the block received inside the mapped function produces the following error:
ValueError('assignment destination is read-only')
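For reference, this is the message NumPy itself raises on any write to an array whose writeable flag is cleared. A minimal NumPy-only sketch (no Dask involved; the array is marked read-only by hand purely for illustration) that triggers the same error:

```python
import numpy as np

# Illustrative only: mark a plain NumPy array as read-only by hand.
arr = np.zeros((2, 2), dtype=np.int32)
arr.setflags(write=False)

try:
    arr[0, 0] = 777  # same kind of in-place write as in the mapped function
    message = None
except ValueError as error:
    message = str(error)

print(message)
```

This only shows where the message comes from; it says nothing about why the chunks of a da.zeros_like array end up read-only.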
I am raising this topic to understand how to overcome the problem or, if this is a misuse of Dask on my side, to find alternatives.
The main problem I aim to solve is: how do I properly write in place to the chunks received by a function mapped over a Dask array with map_blocks?
In other words: can we modify a chunk in place?
In this notebook, I show how to reproduce the error, following the same MCVE guidelines as on the xarray GitHub. For reference:
- Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
- Complete example — the example is self-contained, including all data and the text of any traceback.
- Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
- New issue — a search of GitHub Issues suggests this is not a duplicate.
- Recent environment — the issue occurs with the latest version of xarray and its dependencies.
Thank you for your help!
Initialization
import warnings
warnings.simplefilter("ignore")
from typing import Any
import dask
import dask.array as da
import numpy as np
import numpy.typing as npt
from dask.distributed import Client
dask.__version__
'2023.11.0'
client = Client(n_workers=4, threads_per_worker=4, memory_limit="16GiB")
print(client)
<Client: 'tcp://127.0.0.1:46231' processes=4 threads=16, memory=64.00 GiB>
Test preparation
shape = (4, 4)
chunks = (2, 2)
dtype = np.int32
Case A: np.zeros
The first array is created from a NumPy array, with np.zeros.
dask_array_a = da.from_array(np.zeros(np.prod(shape), dtype=dtype).reshape(shape), chunks=chunks)
print(dask_array_a)
print(dask_array_a.compute())
dask.array<array, shape=(4, 4), dtype=int32, chunksize=(2, 2), chunktype=numpy.ndarray>
[[0 0 0 0]
 [0 0 0 0]
 [0 0 0 0]
 [0 0 0 0]]
Case B: da.zeros_like
The second array is created with the Dask equivalent of zeros_like, using the first array as a template.
dask_array_b = da.zeros_like(dask_array_a)
print(dask_array_b)
print(dask_array_b.compute())
dask.array<zeros_like, shape=(4, 4), dtype=int32, chunksize=(2, 2), chunktype=numpy.ndarray>
[[0 0 0 0]
 [0 0 0 0]
 [0 0 0 0]
 [0 0 0 0]]
Function to be mapped
Create a dummy function, to be mapped over the chunks of the dask arrays.
It updates the top-left pixel of the given chunk to 777.
Note that this write does not cause the “pre-flight” call made by map_blocks to fail. From the documentation:
Note that map_blocks will attempt to automatically determine the output array type by calling func on 0-d versions of the inputs.
Source: dask.array.map_blocks
def block_write_in_place(
    block: npt.NDArray[Any],
):
    block[0, 0] = 777
    return block
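If the pre-flight call ever became a problem, the map_blocks documentation points to the meta keyword as a way to declare the output type up front. A hedged sketch, assuming a local synchronous scheduler rather than the distributed client used in this notebook; set_corner is a hypothetical copy-first variant, so the input array is never mutated:

```python
import dask.array as da
import numpy as np


def set_corner(block):
    # Hypothetical helper: work on a copy so the input chunk is left untouched.
    out = block.copy()
    out[0, 0] = 777
    return out


x = da.from_array(np.zeros((4, 4), dtype=np.int32), chunks=(2, 2))

# An empty array with the expected dtype describes the output type,
# so map_blocks does not need to infer it by calling set_corner.
meta = np.array((), dtype=np.int32)
lazy = x.map_blocks(set_corner, dtype=np.int32, meta=meta)
result = lazy.compute(scheduler="synchronous")
print(result)
```

This is unrelated to the read-only error itself; it only illustrates how the type inference step can be skipped.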
Test execution
Case A
lazy = dask_array_a.map_blocks(block_write_in_place)
result = lazy.compute()
print(result)
[[777   0 777   0]
 [  0   0   0   0]
 [777   0 777   0]
 [  0   0   0   0]]
The in-place write is done as expected, without any error.
Case B
lazy = dask_array_b.map_blocks(block_write_in_place)
try:
    result = lazy.compute()
except ValueError as error:
    print(error)
assignment destination is read-only
2024-04-22 11:03:18,768 - distributed.worker - WARNING - Compute Failed
Key: ('block_write_in_place-d50ef239ad34e97a5f2c6ce6add02a7a', 0, 1)
Function: subgraph_callable-259a4044-a3c0-449e-86bc-2bc986a4
args: ((2, 2))
kwargs: {}
Exception: "ValueError('assignment destination is read-only')"
2024-04-22 11:03:18,768 - distributed.worker - WARNING - Compute Failed
Key: ('block_write_in_place-d50ef239ad34e97a5f2c6ce6add02a7a', 0, 0)
Function: subgraph_callable-259a4044-a3c0-449e-86bc-2bc986a4
args: ((2, 2))
kwargs: {}
Exception: "ValueError('assignment destination is read-only')"
2024-04-22 11:03:18,769 - distributed.worker - WARNING - Compute Failed
Key: ('block_write_in_place-d50ef239ad34e97a5f2c6ce6add02a7a', 1, 0)
Function: subgraph_callable-259a4044-a3c0-449e-86bc-2bc986a4
args: ((2, 2))
kwargs: {}
Exception: "ValueError('assignment destination is read-only')"
2024-04-22 11:03:18,769 - distributed.worker - WARNING - Compute Failed
Key: ('block_write_in_place-d50ef239ad34e97a5f2c6ce6add02a7a', 1, 1)
Function: subgraph_callable-259a4044-a3c0-449e-86bc-2bc986a4
args: ((2, 2))
kwargs: {}
Exception: "ValueError('assignment destination is read-only')"
Here, we can see the error: the received chunk is not writeable.
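One possible workaround, offered as a sketch rather than a confirmed fix, is to copy the chunk before writing whenever it is not writeable. The helper name block_write_safely is hypothetical, and the demonstration below uses a plain NumPy array marked read-only by hand to stand in for the failing chunk:

```python
import numpy as np


def block_write_safely(block):
    # Hypothetical variant of block_write_in_place:
    # fall back to a writeable copy when the chunk is read-only.
    if not block.flags.writeable:
        block = block.copy()
    block[0, 0] = 777
    return block


# Stand-in for a read-only chunk (assumption: the failing chunks behave like this).
read_only = np.zeros((2, 2), dtype=np.int32)
read_only.setflags(write=False)

result = block_write_safely(read_only)
print(result)
```

With the arrays from this notebook, the corresponding call would be dask_array_b.map_blocks(block_write_safely).compute(). The copy costs one extra chunk of memory per task, and the result is no longer a true in-place modification, so this sidesteps the question rather than answering it.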