I have a Jupyter notebook where I want to show that I am using Dask to compute a demanding function.
However, it seems that my data is kept in a cache: after the first time I run the cell, subsequent runs show no activity in the dashboard.
Here is my code:
import dask
....
# Define a computationally expensive function used in the Plot API (GDAL based)
def extract_data_at_polygon(ncfile, polygon):
    # set CRS to cut over shapefile
    ds = ncfile.rio.write_crs("EPSG:4326")
    # create the geometry
    coords = polygon.split("((")[-1].split("))")[0].split(",")
    coords_array = []
    for cc in coords:
        point = cc.split(" ")
        point_to_float = [float(i) for i in point]
        coords_array.append(point_to_float)
    # Extract the envelope coordinates
    envelope_coords = [
        [min(p[0] for p in coords_array), min(p[1] for p in coords_array)],
        [max(p[0] for p in coords_array), min(p[1] for p in coords_array)],
        [max(p[0] for p in coords_array), max(p[1] for p in coords_array)],
        [min(p[0] for p in coords_array), max(p[1] for p in coords_array)],
    ]
    envelope = {
        'type': 'Polygon',
        'coordinates': [envelope_coords]
    }
    # Clip using the envelope
    data = ds.rio.clip([envelope], ds.rio.crs, all_touched=True, from_disk=False)
    return data
client = Client(dashboard_address=8088)
client.dashboard_link
ds = xr.open_dataset(era5_file, engine='netcdf4', chunks={'lat': 'auto', 'lon': 'auto', 'time': 'auto', 'level': 1})
era5_polygon = extract_data_at_polygon(ds, polygon)
Even though I never call compute(), I still get a result back every time.
To see the execution in the dashboard, I have to add era5_polygon.compute(), but then the cell takes 29 seconds instead of the 0.04 seconds it takes without it.
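To make the timing difference concrete, here is a minimal sketch of the behaviour I am describing. It uses a small dask array as a stand-in for my ERA5 dataset (the array shape, chunk sizes and the arithmetic are made up for illustration, and it does not use my clipping function). If I understand correctly, the first step only builds a task graph, and the work only runs when compute() is called:

```python
import time

import dask.array as da

# Hypothetical stand-in for the chunked dataset: a lazy, chunked dask array.
lazy = da.ones((2000, 2000), chunks=(500, 500))

# Step 1: define the operation. This only builds a task graph.
t0 = time.perf_counter()
result = (lazy * 2).sum()
build_time = time.perf_counter() - t0

# Step 2: execute the graph. This is where the actual work happens.
t0 = time.perf_counter()
value = result.compute()
compute_time = time.perf_counter() - t0

print(type(result).__name__, float(value))
print(f"graph build: {build_time:.4f}s, compute: {compute_time:.4f}s")
```

In this sketch `result` is still a dask array after step 1, which would match getting something back almost instantly without any activity in the dashboard.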
Any explanation? Am I doing something wrong?
Thanks in advance