How to get associated netcdf names from dask object

I want to save a job to write out a netcdf file at a later time in the program.
As in:
```python
job = ds.to_netcdf(filename, compute=False)  # ds is an xarray Dataset
job.compute()
```

Is there a way to find out the filename used in the to_netcdf method from the “job” object?
Thanks

Hi @axelschweiger and welcome to discourse!

That’s an interesting question. There is some information you can get from the delayed object, but none of its methods or attributes include the filename. I think the closest you can get is the key for the task graph. Here’s a small example:

```python
import xarray as xr
import numpy as np
import pandas as pd

data = np.random.rand(4, 3)
locs = ["IA", "IL", "IN"]
times = pd.date_range("2000-01-01", periods=4)
foo = xr.DataArray(data, coords=[times, locs], dims=["time", "space"])

delayed_obj = foo.to_netcdf("example.nc", compute=False)
# The key is a tokenized task name, not the output filename
print(delayed_obj.key)
```

You can explicitly set the key name with `dask.delayed` (see the `dask_key_name` argument in the delayed API docs), but I don’t think this is an option from `to_netcdf`.
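If the filename has to travel with the job, one workaround is to do the write inside your own `dask.delayed` function and set `dask_key_name` yourself, since that keyword is handled by delayed function calls rather than by `to_netcdf`. This is a sketch under assumptions: `write_netcdf` is a hypothetical helper and the `"write-<path>"` key scheme is made up. Note that the whole write then runs as a single task when computed, instead of through the graph that `to_netcdf(..., compute=False)` builds.

```python
import os
import tempfile

import dask
import numpy as np
import pandas as pd
import xarray as xr


@dask.delayed
def write_netcdf(obj, path):
    # Hypothetical helper: performs an ordinary (eager) write when the task runs
    obj.to_netcdf(path)
    return path


data = np.random.rand(4, 3)
locs = ["IA", "IL", "IN"]
times = pd.date_range("2000-01-01", periods=4)
foo = xr.DataArray(data, coords=[times, locs], dims=["time", "space"])

outdir = tempfile.mkdtemp()
filename = os.path.join(outdir, "example.nc")

# dask_key_name sets the task key explicitly, so the filename can be
# read back from the delayed object later via job.key
job = write_netcdf(foo, filename, dask_key_name=f"write-{filename}")
print(job.key)

job.compute()
```

Keys must be unique within a graph, so a scheme like this only works if each output path is written once.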

Would you mind sharing a bit more about what you’re trying to do? Perhaps there is another solution we can come up with!


Thanks. I had looked at the object's keys but didn’t see any obvious way to get the file name. The reason I was trying to do this is that I wanted to collect the delayed jobs and process them differently depending on the output file name, which determines how each one is treated (in this case, depending on the type of output generated, I need to allocate some other resources). I can handle this differently, but I thought this would be straightforward to obtain. I guess it is not.

Best
Axel

Thanks for providing more details. I see why you’d want to get the filename directly from the delayed object. I think it is hard to access through the high-level to_netcdf function because the task name is tokenized, which ensures the keys are unique.
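Since the key is tokenized, a simpler alternative is to keep the filename next to each delayed write yourself, for example in a dict, and decide how to treat each job from that. This is a minimal sketch: the `needs_extra_resources` rule (a filename prefix) and the two output names are hypothetical stand-ins for whatever distinguishes your output types.

```python
import os
import tempfile

import dask
import numpy as np
import pandas as pd
import xarray as xr

data = np.random.rand(4, 3)
locs = ["IA", "IL", "IN"]
times = pd.date_range("2000-01-01", periods=4)
foo = xr.DataArray(data, coords=[times, locs], dims=["time", "space"])

outdir = tempfile.mkdtemp()
names = ["daily_example.nc", "monthly_example.nc"]

# Map each output path to its delayed write, so the name is never lost
jobs = {
    os.path.join(outdir, name): foo.to_netcdf(os.path.join(outdir, name), compute=False)
    for name in names
}


def needs_extra_resources(path):
    # Hypothetical rule: treat "monthly" outputs as the expensive kind
    return os.path.basename(path).startswith("monthly")


heavy = [job for path, job in jobs.items() if needs_extra_resources(path)]
light = [job for path, job in jobs.items() if not needs_extra_resources(path)]

# Compute each group separately; each call could run on different resources
dask.compute(*light)
dask.compute(*heavy)
```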

With `dask.graph_manipulation.bind`, you can make one delayed task explicitly depend on others. Here is an example from another discourse question. In your case you could do something like:

```python
import dask
import numpy as np
import pandas as pd
import xarray as xr
from dask.graph_manipulation import bind


@dask.delayed
def special_func():
    # Placeholder for the step that should run only after the files are written
    pass


# create a small example DataArray
data = np.random.rand(4, 3)
locs = ["IA", "IL", "IN"]
times = pd.date_range("2000-01-01", periods=4)
foo = xr.DataArray(data, coords=[times, locs], dims=["time", "space"])

# write each file lazily, then make special_func depend on all of the writes
list_of_files = ['example.nc', 'example1.nc']
delayeds = [
    foo.to_netcdf(filename, compute=False) for filename in list_of_files
]
new_func = bind(special_func(), delayeds)
```
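Nothing runs until the bound object is computed. As a sketch of that last step, here `special_func` is a hypothetical variant that takes the paths and reads each file back; because `bind` makes its task depend on the delayed writes, those writes have completed by the time it runs. The temporary directory is only there to keep the example self-contained.

```python
import os
import tempfile

import dask
import numpy as np
import pandas as pd
import xarray as xr
from dask.graph_manipulation import bind

outdir = tempfile.mkdtemp()
list_of_files = [os.path.join(outdir, name) for name in ["example.nc", "example1.nc"]]


@dask.delayed
def special_func(paths):
    # Hypothetical post-processing step: reads each finished file back
    shapes = []
    for p in paths:
        with xr.open_dataarray(p) as da:
            shapes.append(da.shape)
    return shapes


data = np.random.rand(4, 3)
locs = ["IA", "IL", "IN"]
times = pd.date_range("2000-01-01", periods=4)
foo = xr.DataArray(data, coords=[times, locs], dims=["time", "space"])

delayeds = [foo.to_netcdf(filename, compute=False) for filename in list_of_files]

# bind() returns a new delayed object whose task waits for every write
new_func = bind(special_func(list_of_files), delayeds)
result = new_func.compute()
print(result)
```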