Unexpected behaviour when saving dask.array after reshape

I have a 4.5 GB TIFF file that is stored in a flattened 2D format. I want to reshape it into its proper 3D shape and then resave it. Ideally, I’d like to then save one file for each z-slice in the 3D array, for example with to_npy_stack.

import os
import dask.array as da
im_stack = da.random.randint(low=0, high=1300, size=(1, 8213400, 900))
im_stack_rs = im_stack.reshape([9126, 900, 900])
save_dir = os.getcwd()
da.to_npy_stack(save_dir, im_stack_rs, axis=0)

However, I got the following error:

> OSError: -1197874592 requested and 0 written

I also tried to save it as a zarray:

from numcodecs import Blosc

save_path_zarr = os.path.join(save_dir, "test.zarr")
im_stack_rs.to_zarr(save_path_zarr, compressor=Blosc(cname='zstd', clevel=3, shuffle=Blosc.BITSHUFFLE))

> ValueError: Number of elements in chunk must be < 4gb

Finally, I tried to_hdf5:

save_path_hdf5 = os.path.join(save_dir, "test.hdf5")
im_stack.to_hdf5(save_path_hdf5, 'x')

> RuntimeError: Can't decrement id ref count (unable to extend file properly)

Any suggestions on a reasonable way to save my large image stack?

Hi @khyll, welcome to the Dask community!

Thanks for the reproducible example and the detailed post.

I just ran your code, and in my environment it worked. I only got a warning on the reshape line saying a large chunk was produced, but when looking at the resulting chunk shapes, all was okay (chunks a little above 128 MiB). Could you check on your side what the chunk shape of your resulting Dask Array is?
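For reference, a quick way to check is via the standard Dask Array attributes:

# inspect the chunk layout of the reshaped array
print(im_stack_rs.chunks)      # per-axis tuples of chunk sizes
print(im_stack_rs.chunksize)   # shape of the largest chunk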

On my side it looks like this:

[screenshot of the resulting Dask Array’s chunk structure]

In any case, once we find out what the problem is on your side, I would recommend Zarr as a chunked array file format.
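For instance, a minimal sketch (the chunk shape and output path are placeholders, not a tested recipe):

# rechunk so no single chunk comes near the 4 GB element limit enforced by
# the Blosc codec, then write to a Zarr store
im_stack_rs = im_stack_rs.rechunk((64, 900, 900))   # ~400 MB per chunk for int64
im_stack_rs.to_zarr("test.zarr", overwrite=True)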

Hi @khyll,
I don’t think your reproducer faithfully shows the problem; that’s why @guillaumeeb is failing to reproduce it.
You created your mock data with

im_stack = da.random.randint(low=0, high=1300, size=(1, 8213400, 900))

which, by default, will aim to create chunks worth ~128 MiB each.
However, your error message is very telling:

> OSError: -1197874592 requested and 0 written

The negative byte count looks like a signed 32-bit overflow: it’s saying that your data is in a single, gargantuan chunk.
How are you loading the TIFF file?
If you have no other option but to load it into a single monolithic chunk, you should call .rechunk(...) immediately after loading.
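For instance, a sketch only (the file name and chunk sizes are assumptions, not your actual setup):

from dask_image.imread import imread

# hypothetical file name; imread may return the whole image as one monolithic chunk
im_stack = imread("flattened_stack.tif")

# split the single chunk into ~100 MiB pieces aligned with the 900-row slices,
# so the subsequent reshape stays cheap (16200 = 18 * 900)
im_stack = im_stack.rechunk((1, 16200, 900))
im_stack_rs = im_stack.reshape(9126, 900, 900)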

Thanks for both of your replies!

I’m using dask_image.imread to load my file. Looking into it, it does indeed seem to load the file as one massive chunk. I’ve tried rechunking it to one chunk per image in the 3D stack:

im_stack_rs = im_stack_rs.rechunk(chunks=(1, 900, 900))

And then used to_npy_stack in the same way as before, with axis=0. Now it seems to do what I wanted. Sadly, each .npy file does not seem to correspond to an image slice, which is what I’m really after, but at least I understand a bit more.

Could you elaborate a bit on this part? What are you getting?

My hope was that I would be able to save each slice/chunk of my 3D Dask array into an individual 2D file. But Zarr arrays simply don’t seem to work like that. I might have to wait until dask-image implements a TIFF or BigTIFF imsave.

I just did the test with a smaller array, and after the rechunking I got one file per chunk, so each .npy file has shape (1, 900, 900). Isn’t that what you wanted?
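If the leading length-1 axis is the problem, here is a hedged sketch of two options (the file names are assumptions; to_npy_stack names files by block index):

import os
import numpy as np

# Option 1: keep the npy stack and drop the length-1 axis when loading a block
# (to_npy_stack writes one file per block: 0.npy, 1.npy, ..., plus an "info" file)
block = np.load(os.path.join(save_dir, "0.npy"))  # shape (1, 900, 900)
img = block.squeeze(axis=0)                       # shape (900, 900)

# Option 2: bypass to_npy_stack and save each z-slice as a true 2D file
for i in range(im_stack_rs.shape[0]):
    slice_2d = im_stack_rs[i].compute()           # shape (900, 900)
    np.save(os.path.join(save_dir, f"slice_{i:04d}.npy"), slice_2d)

Swapping np.save for tifffile.imwrite should give you per-slice TIFFs instead.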