I have a general question about the best way to troubleshoot a situation where the total CPU time is fairly short but the wall time is much longer. In the code example below, the total CPU time was about 1.7 hours (1 h 41 min), while the wall time was over 6 hours.
I wonder whether this is caused by how I implemented the chunking with Dask Delayed or by the delayed function itself. For context, the `process_tile` function uses the Rasterio library to fetch image metadata from GeoTIFFs stored in Azure Blob Storage.
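One generic way to check whether a task is I/O-bound is to record wall-clock time and CPU time for each call. The sketch below uses only the standard library; `timed` and `fake_process_tile` are hypothetical names, and `time.sleep` stands in for a network read such as fetching GeoTIFF metadata. A large gap between `wall_s` and `cpu_s` would suggest the time is spent waiting rather than computing.

```python
import functools
import time


def timed(func):
    """Hypothetical helper: record wall-clock and CPU time for each call of func."""
    records = []

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        wall_start = time.perf_counter()
        cpu_start = time.thread_time()  # CPU time of the current thread only
        try:
            return func(*args, **kwargs)
        finally:
            records.append({
                "name": func.__name__,
                "wall_s": time.perf_counter() - wall_start,
                "cpu_s": time.thread_time() - cpu_start,
            })

    wrapper.records = records
    return wrapper


@timed
def fake_process_tile(tile, aoi):
    # Hypothetical stand-in for a network-bound call:
    # sleeping consumes wall time but almost no CPU time.
    time.sleep(0.2)
    return (aoi, tile)


fake_process_tile("tile_0", "aoi_0")
rec = fake_process_tile.records[0]
print(f"wall={rec['wall_s']:.2f}s cpu={rec['cpu_s']:.3f}s")
```

The in-memory `records` list only collects results when tasks run in the same process (for example, Dask's threaded scheduler); with separate worker processes each worker would hold its own copy.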
```python
%%time
import dask

# AOIs, dataset_tree and process_tile are defined elsewhere in the notebook
chunk_size = 20

for aoi in AOIs:
    aoi_s1_tiles = dataset_tree['s1'][aoi]
    # create chunks of tiles
    for i in range(0, len(aoi_s1_tiles), chunk_size):
        future_pool = []
        tile_chunk = aoi_s1_tiles[i:i+chunk_size]
        # loop over each Sentinel-1 chip in the chunk
        for tile in tile_chunk:
            future = dask.delayed(process_tile)(tile, aoi)
            future_pool.append(future)
        # persist the chunk, then block until every tile in it has finished
        future_pool = dask.persist(*future_pool)
        dask.compute(*future_pool)
```

Output:

```
CPU times: user 1h 19min 18s, sys: 21min 49s, total: 1h 41min 8s
Wall time: 6h 23min 33s
```
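In the loop above, each chunk is computed before the next one is built, so every chunk acts as a barrier that waits for its slowest tile. The standard-library sketch below illustrates that effect with simulated I/O waits; it does not use Dask, the delays are made up, and whether this explains the gap in the real run is an assumption. The all-at-once variant is roughly analogous to building every delayed object first and calling `dask.compute` once.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_io_task(delay):
    # Simulates a network-bound call (e.g. reading remote metadata): mostly waiting.
    time.sleep(delay)
    return delay


def run_chunked(delays, chunk_size, workers):
    """Submit one chunk, wait for all of it, then move on (a barrier per chunk)."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for i in range(0, len(delays), chunk_size):
            list(pool.map(fake_io_task, delays[i:i + chunk_size]))
    return time.perf_counter() - start


def run_all_at_once(delays, workers):
    """Submit every task up front so idle workers can pick up the next task."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(fake_io_task, delays))
    return time.perf_counter() - start


# Hypothetical workload: one slow task per chunk of four.
delays = ([0.3] + [0.05] * 3) * 3
chunked = run_chunked(delays, chunk_size=4, workers=4)
unchunked = run_all_at_once(delays, workers=4)
print(f"chunked: {chunked:.2f}s, all at once: {unchunked:.2f}s")
```

With barriers, the three chunks take roughly 0.3 s each; without them, the short tasks fill in around the slow ones.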