Hi,
I am working with the dask-image library, specifically ndmeasure.label, for a geospatial analysis package I’m developing. My raster has shape 14040 x 25200 with chunk size 5120 x 5120, and I am using a local distributed scheduler. The issue is that when I call compute/persist on the output of ‘label’, I get the warning “UserWarning: Sending large graph of size 337.43 MiB”. I checked the number of tasks in the task graph, but there are only 218.
The best practices page recommends avoiding large task graphs, but it seems to cover only the case where the size comes from having too many tasks.
What could be the issue here? Is it likely to be related to my code, or inherent to the label function?
Hi @martijnvandermarel, welcome to Dask community!
This message usually appears when you create objects on the client side and they then get serialized into the task graph. Could you give us a reproducer, or at least a code snippet of what you are doing? How do you read your input data?
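To illustrate the mechanism (array sizes here are made up for the example): wrapping an in-memory NumPy array created on the client embeds its bytes in the task graph that is sent to the scheduler, whereas a lazily constructed dask array keeps the graph tiny. A rough way to see the difference is to pickle the materialized graph of each:

```python
import pickle

import numpy as np
import dask.array as da

# An ~8 MB array created on the client side.
big_np = np.ones((1000, 1000))

# Wrapping it embeds the full array in the task graph:
from_client = da.from_array(big_np, chunks=(500, 500))

# Building it lazily keeps only lightweight task descriptions in the graph:
lazy = da.ones((1000, 1000), chunks=(500, 500))

# Compare serialized graph sizes (a rough proxy for what gets sent):
embedded_size = len(pickle.dumps(dict(from_client.__dask_graph__())))
lazy_size = len(pickle.dumps(dict(lazy.__dask_graph__())))
print(embedded_size, lazy_size)  # the first is orders of magnitude larger
```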
Thanks for the reply, @guillaumeeb . I believe I found the mistake I made. Instead of passing xarray.DataArray.data to the label function, I passed xarray.DataArray.values, which implicitly computes the array. I imagine the computed array is quite large when serialized, causing the warning.
A beginner mistake. In any case, I hope this helps future Dask users who make the same one.
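For anyone hitting the same warning, here is a minimal sketch of the distinction (the tiny array is a stand-in for the real 14040 x 25200 raster):

```python
import numpy as np
import dask.array as da
import xarray as xr

# Small stand-in for a large chunked raster.
raster = xr.DataArray(da.zeros((8, 8), chunks=(4, 4)), dims=("y", "x"))

# .data returns the underlying lazy dask array -- the task graph stays small.
print(type(raster.data))    # a lazy dask array

# .values implicitly computes the array and returns an in-memory NumPy copy;
# passing this to dask_image.ndmeasure.label embeds the whole raster in the
# task graph, which is what triggers the "Sending large graph" warning.
print(type(raster.values))  # a plain numpy.ndarray

# Correct call (dask_image import omitted):
# labels, num_features = dask_image.ndmeasure.label(raster.data)
```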
Can I ask how large your array was? I am getting OOM errors when using arrays larger than roughly 5k^3 (see Memory error using dask-image ndmeasure.label), so it would be nice to have a comparison.