Hello,
Thanks for your answer!
This is quite a limitation when updating existing Numpy-based Python code to use Dask, if that code performs a lot of in-place mutations on its arrays: for example, a single array that goes through many successive updates in a processing chain.
A workaround would be to copy each chunk inside a map_blocks function before mutating it, but that means repeatedly instantiating new chunks for every map_blocks call, which does not seem to be the best solution? However, it is the only way I see to keep the inputs immutable while reusing complex Numpy code, using Dask more as a “wrapper” around Numpy code applied on chunks (allowing code reuse).
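For illustration, here is a minimal sketch of that copy-inside-map_blocks workaround (the `process` function and the clipping it performs are made up for the example):

```python
import dask.array as da

def process(block):
    # Copy the chunk first so the original input stays untouched,
    # then reuse existing Numpy code that mutates the array in place.
    block = block.copy()
    block[block < 0] = 0  # hypothetical in-place update from legacy code
    return block

x = da.random.normal(size=(1000, 1000), chunks=(250, 250))
y = x.map_blocks(process)
y.compute()
```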
Maybe some documentation exposing the common pitfalls of reusing existing Numpy-based Python code with Dask would be helpful! It could perhaps live in the documentation page about Dask-supported assignments on arrays (I found it useful, as it exposes a list of concrete examples): Assignment — Dask documentation
From my experience so far, running existing code “as is” with Dask arrays instead of Numpy ones works fine in 90% of the cases, but the remaining 10% can be hard to work around, e.g. indexing a 2-D Dask array with another 2-D Dask array, or more complex assignments (assigning a full 2-D array rather than a scalar). In some cases the Python code itself can be rewritten and simplified to make it Dask-friendly, but not always (I don’t have concrete examples at hand unfortunately; I will try to come up with better real-life examples in the future).
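To illustrate the indexing case, here is a small sketch (the arrays are made up; on the Dask versions I have tried, 2-D fancy indexing raises NotImplementedError, though this may change between versions):

```python
import numpy as np
import dask.array as da

a = np.arange(12).reshape(3, 4)
idx = np.array([[0, 1], [2, 0]])

print(a[idx])  # plain Numpy: indexing with a 2-D integer array works fine

d = da.from_array(a, chunks=(2, 2))
didx = da.from_array(idx, chunks=2)
# d[didx]  # on the Dask versions I tried, this raises NotImplementedError
```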
Finally, it seems that the cases where a Dask array can be mutated (like when it is built from a Numpy array) should not be relied on. Should this behaviour (described in this topic) be considered a bug, since Dask arrays are supposed to be immutable?
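For reference, here is a minimal sketch of the behaviour I mean, assuming the default threaded scheduler, where the chunks of an array built with da.from_array can be views into the backing Numpy array (whether this happens may depend on the scheduler and Dask version):

```python
import numpy as np
import dask.array as da

base = np.zeros((4, 4))
d = da.from_array(base, chunks=(2, 2))

def mutate(block):
    block[:] = 1  # in-place write on the received chunk
    return block

d.map_blocks(mutate).compute()
print(base)  # may now be full of 1s: the chunks were views into `base`
```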