High unmanaged memory usage when using Dask to run Cellpose predictions

Hi, I am trying to use Dask to tile up an image and run Cellpose segmentation predictions on each chunk. The problem is that unmanaged memory usage climbs until it exceeds the 250GB of RAM allocated to each worker, which is far more than Cellpose should need.

This is the script I am trying to run. I have it set up to run one task per GPU at a time and have been running it on 4 GPUs with a total of 1TB of RAM. The original image is around 392x1900x1800 and I am using chunk sizes of 98x194x132 with an overlap of 64. I was able to run a previous version of my code on a much smaller image of around 392x900x900 without much issue.
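In outline, the relevant part of the script looks something like the sketch below (simplified; the Cellpose wrapper, the Zarr loading, and the exact parameters are stand-ins for my actual code):

```python
import dask.array as da
from dask.distributed import Client
from dask_cuda import LocalCUDACluster
from cellpose import models


def segment_chunk(block):
    # Stand-in for my actual wrapper: run Cellpose 3D segmentation on one chunk.
    model = models.Cellpose(gpu=True, model_type="nuclei")
    masks, flows, styles, diams = model.eval(block, do_3D=True, channels=[0, 0])
    return masks.astype("int32")


if __name__ == "__main__":
    # One worker per GPU with one thread each, so only one task runs per GPU at a time.
    cluster = LocalCUDACluster(n_workers=4, threads_per_worker=1)
    client = Client(cluster)

    # ~392x1900x1800 image (loading from Zarr here is just for the sketch),
    # chunked at 98x194x132 with a 64-voxel overlap in every dimension.
    img = da.from_zarr("image.zarr").rechunk((98, 194, 132))
    labels = da.map_overlap(segment_chunk, img, depth=64, boundary="reflect",
                            dtype="int32")
    result = labels.compute()
```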

The memory usage on the GPU remains reasonable at around 32GB, but for some reason the workers’ memory usage is constantly spiking and I am not sure how to fix it. I’ve also noticed that GPU utilization stays at zero most of the time.

Here are some examples from the dashboard:

And here are my versions of Dask:
dask 2023.3.2 pypi_0 pypi
dask-cuda 23.4.0 pypi_0 pypi

This is the dashboard for the smaller image run. The image is of size 392x900x900 and the chunk size is set to 150x300x300 with 64 overlap.

Hi @Myrk, welcome to the Dask community!

Considering that your input image weighs about 10GiB, it’s a bit weird that you have some workers using 160GiB of memory!!

The smaller image is only about 4 times smaller, which is not a big difference, so that makes all this even stranger.

Some questions I have:

  • Is there a reason why you are using smaller chunks with the bigger image? With map_overlap, this will complicate things, generating a lot of exchanges between chunks to build the overlap.
  • Do you need an overlap in all dimensions?
  • There is a rechunking step triggered in the first graph; does that come from your code?
  • Is the output of your map_overlap bigger than the input? Since you’re calling compute(), all the results will remain in memory until the end (see the sketch just after this list for a way to write them to disk instead).
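For that last point, a common pattern is to stream the result to disk instead of gathering it with compute(). Here is a minimal sketch, assuming the labels can be stored as a Zarr array (the shape and file name are just illustrative):

```python
import dask.array as da

# Toy stand-in for the map_overlap result; in your script `labels` would be
# the dask array returned by map_overlap.
labels = da.zeros((392, 1900, 1800), dtype="int32", chunks=(98, 194, 132))

# Instead of labels.compute(), which gathers every chunk into memory on the
# client, write each chunk to disk as soon as it has been computed.
da.to_zarr(labels, "labels.zarr", overwrite=True)
```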

Hi, and thank you for the quick response. Also, apologies for any lack of knowledge; this is my first time using Dask.

  • To be completely honest, I had not tried larger chunks: in my mind, smaller chunks meant less memory usage, but I realized how silly that was when I woke up today. My images do have some features that make predictions on smaller chunks slightly more accurate, but larger chunks will probably still work as well.
  • The overlap helps with stitching together the segmentations and came as a recommendation from this discourse.
  • I do not believe it came from me unless there’s anything in my code that might cause it to automatically re-chunk.
  • I am unfamiliar with how exactly Dask handles the overlap in memory, but the output from this script is the correct size, apart from the mismatched labels across tiles.

Edit: I just ran it using chunks of size 200x800x1000 and it ran well all the way through. However, changing the chunks to 200x400x500 makes each worker exceed 200GB.

That’s not completely silly :slight_smile:, but you don’t want them to be too small, because the benefits won’t outweigh the drawbacks (scheduler burden, shorter tasks, etc.). map_overlap can generate a lot of overhead, especially with small chunks!
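To put a rough number on it: with depth=64 in every dimension, each map_overlap task receives its chunk grown by 64 voxels on each side, so small chunks get inflated much more (relatively) than large ones. A quick back-of-the-envelope check, using the chunk sizes from this thread:

```python
import numpy as np

depth = 64
small_chunk = np.array([98, 194, 132])    # chunks used on the big image
large_chunk = np.array([200, 800, 1000])  # chunks from the successful run

def inflation(shape, depth):
    # Each interior chunk is expanded by `depth` on both sides of every axis
    # before being passed to the mapped function (edge chunks are padded).
    expanded = shape + 2 * depth
    return np.prod(expanded) / np.prod(shape)

print(inflation(small_chunk, depth))  # ~7.5x more voxels per task
print(inflation(large_chunk, depth))  # ~2.1x more voxels per task
```

So with the small chunks, every task works on roughly 7.5 times the data of the chunk it ultimately produces, on top of whatever Cellpose allocates internally.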

Well, map_overlap can trigger a rechunk if needed; I’m not sure under which conditions.

It’s nice that it works, but I don’t think we have found the real source of the problem. I still cannot see how a simple map_overlap would use so much more memory than the input image size.