Minimizing wall time in Dask Delayed chunking

I have a general question about the best way to troubleshoot a situation where the total CPU time is fairly short but the wall time is much longer. In the code example below, the total CPU time was about 1.5 hours, while the wall time was over 6 hours.

I wonder whether this is caused by how I implemented the chunking with Dask Delayed, or by something in the delayed function itself. For context, the process_tile function uses the Rasterio library to fetch image metadata from GeoTIFFs stored in Azure Blob Storage.

%%time
import dask

chunk_size = 20

for aoi in AOIs:
    aoi_s1_tiles = dataset_tree['s1'][aoi]
    
    # create chunks of tiles
    for i in range(0, len(aoi_s1_tiles), chunk_size):
        future_pool = []
        tile_chunk = aoi_s1_tiles[i:i+chunk_size]
        
        # loop over each sentinel-1 chip in chunk
        for tile in tile_chunk:
            future = dask.delayed(process_tile)(tile, aoi)
            future_pool.append(future)
        future_pool = dask.persist(*future_pool)
        dask.compute(*future_pool)

CPU times: user 1h 19min 18s, sys: 21min 49s, total: 1h 41min 8s
Wall time: 6h 23min 33s
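
For reference, process_tile boils down to something like this (simplified; the container path and returned fields here are illustrative, not my exact code):

import rasterio

def process_tile(tile, aoi):
    # Open the GeoTIFF on Azure Blob via GDAL's /vsiaz/ virtual
    # filesystem and read its metadata (illustrative path).
    path = f"/vsiaz/s1-tiles/{aoi}/{tile}.tif"
    with rasterio.open(path) as src:
        return {
            "tile": tile,
            "aoi": aoi,
            "crs": str(src.crs),
            "bounds": src.bounds,
            "shape": (src.height, src.width),
        }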

@KennSmith Thanks for the question! Could you please share some sample/synthetic data to help us reproduce this?

Some general notes about your code:

  • Calling .compute() right after .persist() isn’t really helpful; you can call .compute() directly
  • It’s generally a good idea to minimize for-loops; calling persist and compute inside a loop is also expensive, since each call blocks until that batch finishes
  • It looks like each chunk is executed sequentially: because compute is called inside the inner loop, each batch of 20 tasks has to finish before the next batch starts, so the outer loops run one batch at a time (see the sketch below)
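
Restructuring along these lines lets the scheduler see the whole graph at once and keep all workers busy (a minimal sketch, reusing AOIs, dataset_tree, and process_tile from your snippet):

import dask

# Build the full task graph first: one delayed call per tile across
# every AOI, with no persist or compute inside the loops.
tasks = []
for aoi in AOIs:
    for tile in dataset_tree['s1'][aoi]:
        tasks.append(dask.delayed(process_tile)(tile, aoi))

# A single compute at the end lets Dask schedule everything
# concurrently instead of blocking after every batch of 20 tiles.
results = dask.compute(*tasks)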

I’d also encourage you to look at the diagnostic dashboard; it might give some insight into where the time is going.
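
If you’re using the distributed scheduler, the dashboard URL is available on the client object (assuming a local cluster here; point Client at your own cluster address if you have one):

from dask.distributed import Client

client = Client()             # starts a local cluster by default
print(client.dashboard_link)  # open this URL to watch tasks in real time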