Hi everyone,
I have a large light-sheet imaging dataset in the form of a large number of .tiff files. In the initial stages of analysis, I want to take a maximum projection of the data along one axis. For this, I load the images as a Dask array using dask.delayed, perform median subtraction, and then take the maximum projection.
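For context, this is roughly how I build the array for one channel (the paths and the exact background-subtraction step are simplified here, so treat it as a sketch of my pipeline rather than the exact code):

```python
import glob
import dask
import dask.array as da
from tifffile import imread

# One 2D .tiff per z-slice; wrap each read lazily with dask.delayed
files = sorted(glob.glob("/path/to/rfp/*.tiff"))
sample = imread(files[0])  # read a single slice to get shape and dtype

lazy_slices = [dask.delayed(imread)(f) for f in files]
slices = [da.from_delayed(s, shape=sample.shape, dtype=sample.dtype) for s in lazy_slices]
rfp = da.stack(slices, axis=0)  # -> (n_slices, height, width), one slice per chunk

# Median (background) subtraction -- simplified here to a per-slice median
rfp_corr = rfp - da.median(rfp, axis=(1, 2), keepdims=True)
```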
I am currently using our institute's HPC, which is managed through Slurm. I first log in to a compute node and then set up the Dask cluster as follows:
```python
from dask_jobqueue import SLURMCluster
from dask.distributed import Client

cluster = SLURMCluster(queue='quick', cores=16, memory='32GB', walltime='01:05:00')
cluster.scale(jobs=10)
client = Client(cluster)
```
This is how one of the Dask arrays, corresponding to a single channel, looks:
```
rfp
dask.array<stack, shape=(4209, 13088, 7415), dtype=uint16, chunksize=(1, 13088, 7415), chunktype=numpy.ndarray>
```
Currently, it takes about 10 minutes to compute the max projection along axis=0. It also displays the following warning:
```
UserWarning: Sending large graph of size 22.27 MiB.
This may cause some slowdown.
Consider scattering data ahead of time and using futures.
```
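The projection itself is just the following (variable names as in the sketch above). From the warning I'm guessing that something like client.persist or client.scatter is being suggested, but I'm not sure whether that actually applies here:

```python
# Max projection along the stacking axis -- this is the step that takes ~10 minutes
max_proj = rfp_corr.max(axis=0).compute()

# Is this roughly what "scattering data ahead of time" means, i.e. keeping the
# intermediate array on the workers before reducing it?
rfp_on_workers = client.persist(rfp_corr)
max_proj = rfp_on_workers.max(axis=0).compute()
```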
I want to understand whether chunking the data in a specific way would help speed up this process. When I rechunk the data along axis=0 to have a chunk size of 500 and then perform the max projection, I get the following error:
> 2024-06-26 17:28:49,619 - distributed.scheduler - ERROR - Task ('rechunk-merge-chunk_max-9854bcb041246ed32ffe88d6efb54f51', 7, 0, 0) has 90.38 GiB worth of input dependencies, but worker tcp://10.56.1.79:44823 has memory_limit set to 14.90 GiB
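For reference, the rechunking I tried is roughly this:

```python
# Group 500 z-slices per chunk along axis 0, keep the full frame in axes 1 and 2
rfp_rechunked = rfp_corr.rechunk((500, -1, -1))
max_proj = rfp_rechunked.max(axis=0).compute()  # -> triggers the scheduler error above
```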
At a conceptual level, I am not able to understand how Dask distributes this computation across the requested nodes and how the computation is done on each node individually. Intuitively, I would imagine rechunking the data along axis=1 and axis=2 into smaller pieces, taking the maximum projection on each chunk, and then somehow running these computations in parallel (see the sketch below). Sorry for my terrible lack of understanding of how things work.
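In other words, I was imagining something along these lines, although I haven't verified whether this is a sensible way to do it (the 1024x1024 tile size is an arbitrary guess):

```python
# Keep the full z-extent in each chunk but only a small spatial tile, so that the
# max over axis 0 can be computed independently per tile on each worker
rfp_tiled = rfp_corr.rechunk((-1, 1024, 1024))
max_proj = rfp_tiled.max(axis=0).compute()
```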