Good question.
In my previous post the variable `background` was assumed (perhaps wrongly) to be a numpy array. But if it is instead an image file, then it is important that the background image is aligned block-wise with each of your image files.
This mainly means a few things:

- `background` needs to be a `dask` array too.
- Consequently, `background` must no longer be defined as a keyword parameter to the function `subtract()`, but rather as a positional parameter:

  ```python
  def subtract(image, background):
      ...
      return result
  ```

- This also applies to the `map_blocks()` call:

  ```python
  images = images.map_blocks(subtract, background)
  ```

- In order for `map_blocks()` to work, `background` must first have, along each axis, the same number of chunks as each of your images. If `background` has the same shape as any image, you may use `.rechunk()` to ensure that the numbers of chunks match:

  ```python
  background = background.rechunk(chunks=images.chunks[1:])
  ```

In this way, the chunks of `background` are "broadcastable" to the chunks of all your images, so that corresponding chunk-regions in `background` and each image can be subtracted, as in the sketch below.
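To make the chunk alignment concrete, here is a minimal, self-contained sketch with synthetic arrays (the shapes and the trivial subtraction are made up for illustration only):

```python
import numpy as np
import dask.array as da

# a synthetic stack of 10 images, chunked one image per block
images = da.from_array(np.ones((10, 4, 4)), chunks=(1, 4, 4))

# a background with the shape of a single image, but chunked differently
background = da.from_array(np.ones((4, 4)), chunks=(2, 2))

# align the background chunks with the per-image chunks
background = background.rechunk(chunks=images.chunks[1:])

# background is passed positionally; its chunks broadcast against
# the trailing axes of every image block
result = images.map_blocks(lambda image, bg: image - bg, background)
print(result.compute().max())  # 0.0
```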
I suppose that, like your other images, you may store the background image in any format (TIFF, Zarr, HDF5) that suits you best, as long as the above conditions are met during processing. Unfortunately, I do not have enough experience with these formats myself to give a meaningful recommendation for any particular one of them.
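For example, if you kept the background in a Zarr store or an HDF5 file instead of a TIFF, it could be opened lazily as a dask array like this (a sketch only; the file and dataset names are hypothetical):

```python
import dask.array as da
import h5py

# from a Zarr store:
background = da.from_zarr('background.zarr')

# or from an HDF5 dataset (keep the file open while computing):
f = h5py.File('background.h5', 'r')
background = da.from_array(f['background'], chunks='auto')
```

Putting it together with your TIFF workflow: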
```python
import tifffile
import dask.array as da
import dask.array.image  # not imported by dask.array by default
def subtract(image, background):
...
return result
def batch_subtract_tiff2hdf5(
hdf5file, tifffiles, background, dataset_name='2p', chunksize=600, compression=None
):
"""Subtract background from TIFF images and write them to chunked dataset in HDF5 file."""
# Perhaps the following line works for the tiff format (I do not know)
# otherwise you can use the method you were using before:
images = da.image.imread(tifffiles)
print(images)
    # make sure the background image is chunked the same way as each source image:
    background = background.rechunk(chunks=images.chunks[1:])
    # set up the block-wise subtraction; dask runs it in parallel
    # (e.g. in a compute cluster) when the result is computed,
    # and so it should be fast:
    images = images.map_blocks(subtract, background)
    # write processed images to disk (shape and dtype are taken from the array):
    images.to_hdf5(
        hdf5file,
        dataset_name,
        chunks=(chunksize, *images.shape[1:]),
        compression=compression,
    )
# dask provides several options to load an array from disk,
# see https://docs.dask.org/en/stable/array-api.html#create-and-store-arrays
# see https://docs.dask.org/en/stable/array-api.html#create-and-store-arrays
background = da.image.imread('I:/Scratch/averages/background1.tif')
print(background)
with tifffile.Timer():
batch_subtract_tiff2hdf5(
'test.hdf5', 'I:/Scratch/nih3t3-egfp_2/Pos0/*.tif', background, compression='lzf'
    )
```
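Afterwards, the dataset written by `to_hdf5()` can be opened lazily again to spot-check the result, e.g.:

```python
import h5py
import dask.array as da

# re-open the dataset written above and inspect a frame
with h5py.File('test.hdf5', 'r') as f:
    result = da.from_array(f['2p'])
    print(result)
    print(result[0].compute())  # first background-subtracted frame
```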