Unexpected behaviour when saving dask.array after reshape

I have a 4.5 GB TIFF file that is stored in a flattened 2D format. I want to reshape it into its proper 3D shape and then resave it. Ideally, I’d like to then save one file for each z-slice in the 3D array, for example with to_npy_stack.

import os
import dask.array as da
im_stack = da.random.randint(low=0, high=1300, size=(1, 8213400, 900))
im_stack_rs = im_stack.reshape([9126, 900, 900])
save_dir = os.getcwd()
da.to_npy_stack(save_dir, im_stack_rs, axis=0)

However, I got the following error:

> OSError: -1197874592 requested and 0 written

I also tried to save it as a zarray:

from numcodecs import Blosc

save_path_zarr = os.path.join(save_dir, "test.zarr")
im_stack_rs.to_zarr(save_path_zarr, compressor=Blosc(cname='zstd', clevel=3, shuffle=Blosc.BITSHUFFLE))

> ValueError: Number of elements in chunk must be < 4gb

Finally, I tried to_hdf5:

save_path_hdf5 = os.path.join(save_dir, "test.hdf5")
im_stack.to_hdf5(save_path_hdf5, 'x')

> RuntimeError: Can't decrement id ref count (unable to extend file properly)

Any suggestions on a reasonable way to save my large image stack?

Hi @khyll, welcome to the Dask community!

Thanks for the reproducible example and the detailed post.

I just ran your code, and in my environment it worked. I only got a warning on the reshape line saying a large chunk was produced, but when looking at the resulting chunk shapes, all was okay (chunks a little above 128 MiB). Could you check on your side what the chunk shape of your resulting Dask Array is?
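For reference, a quick way to check is via the standard Dask Array attributes:

# inspect the chunk layout of the reshaped array
print(im_stack_rs.chunks)      # per-axis tuples of chunk sizes
print(im_stack_rs.chunksize)   # shape of the largest chunk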

On my side it looks like this:

[screenshot of the resulting Dask Array’s chunk structure]

In any case, once we find out what the problem is on your side, I would recommend Zarr as a chunked array file format.
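For instance, a minimal sketch (the chunk shape and output path are placeholders, not a tested recipe):

# rechunk so no single chunk comes near the 4 GB element limit enforced by
# the Blosc codec, then write to a Zarr store
im_stack_rs = im_stack_rs.rechunk((64, 900, 900))   # ~400 MB per chunk for int64
im_stack_rs.to_zarr("test.zarr", overwrite=True)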

Hi @khyll,
I don’t think your reproducer faithfully shows the problem; that’s why @guillaumeeb is failing to reproduce it.
You created your mock data with

im_stack = da.random.randint(low=0, high=1300, size=(1, 8213400, 900))

which, by default, will aim to create chunks worth ~128 MiB each.
However, your error message is very telling:

> OSError: -1197874592 requested and 0 written

The negative byte count looks like a signed 32-bit overflow: it’s saying that your data is in a single, gargantuan chunk.
How are you loading the TIFF file?
If you have no other option but to load it into a single monolithic chunk, you should call .rechunk(...) immediately after loading.
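For instance, a sketch only (the file name and chunk sizes are assumptions, not your actual setup):

from dask_image.imread import imread

# hypothetical file name; imread may return the whole image as one monolithic chunk
im_stack = imread("flattened_stack.tif")

# split the single chunk into ~100 MiB pieces aligned with the 900-row slices,
# so the subsequent reshape stays cheap (16200 = 18 * 900)
im_stack = im_stack.rechunk((1, 16200, 900))
im_stack_rs = im_stack.reshape(9126, 900, 900)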

Thanks for both of your replies!

I’m using dask_image.imread to load my file. Looking into it, it does indeed seem to load the file as one massive chunk. I’ve tried rechunking it to one chunk per image in the 3D stack:

im_stack_rs = im_stack_rs.rechunk(chunks=(1, 900, 900))

And then used to_npy_stack in the same way as before, with axis=0. Now it seems to do what I wanted. Sadly, each .npy file does not seem to correspond to an image slice, which is what I’m really after, but at least I understand a bit more.

Could you elaborate a bit on this part? What are you getting?

My hope was that I would be able to save each slice/chunk of my 3D Dask array into an individual 2D file. But Zarr arrays simply don’t seem to work like that. I might have to wait until dask-image implements a TIFF or BigTIFF imsave.

I just did the test with a smaller array, and after the rechunking I got one file per chunk, so each .npy file has shape (1, 900, 900). Isn’t that what you wanted?
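If the leading length-1 axis is the problem, here is a hedged sketch of two options (the file names are assumptions; to_npy_stack names files by block index):

import os
import numpy as np

# Option 1: keep the npy stack and drop the length-1 axis when loading a block
# (to_npy_stack writes one file per block: 0.npy, 1.npy, ..., plus an "info" file)
block = np.load(os.path.join(save_dir, "0.npy"))  # shape (1, 900, 900)
img = block.squeeze(axis=0)                       # shape (900, 900)

# Option 2: bypass to_npy_stack and save each z-slice as a true 2D file
for i in range(im_stack_rs.shape[0]):
    slice_2d = im_stack_rs[i].compute()           # shape (900, 900)
    np.save(os.path.join(save_dir, f"slice_{i:04d}.npy"), slice_2d)

Swapping np.save for tifffile.imwrite should give you per-slice TIFFs instead.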