Add a DerivedCatalog object to deal with derived variables #357
I took a stab at this. My current approach is similar to Matt's in that I'm keeping track of each derived variable's info in a registry attached to the catalog object. Initially, this `derivedcat` registry is empty:

```python
In [1]: import intake, intake_esm

In [2]: cat = intake.open_esm_datastore("./tests/sample-collections/catalog-dict-records.json")

In [4]: cat.unique()
Out[4]:
component           [atm]
frequency           [daily]
experiment          [20C]
variable            [FLNS, FLNSC, FLUT, FSNS, FSNSC]
path                [s3://ncar-cesm-lens/atm/daily/cesmLE-20C-FLNS...
derived_variable    []
dtype: object
```

The user can register their derivation function via a decorator:

```python
In [5]: @intake_esm.register_derived_variable(varname="FOO", required=[{'variable': "TEMP", "component": "ocn"}])
   ...: def func(ds):
   ...:     return ds.TEMP + 1
```
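For anyone unfamiliar with the pattern, here is a minimal, self-contained sketch of how a decorator can fill such a registry. The names `DerivedVariable`, `DERIVED_REGISTRY`, and `register_derived_variable` below are hypothetical stand-ins modeled on the session output, not intake-esm's actual implementation; in particular, the registry here is a plain module-level dict, and how the proposal wires it to `cat.derivedcat` is not shown.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class DerivedVariable:
    """Hypothetical record of how to compute one derived variable."""
    func: Callable[[Any], Any]
    required: List[Dict[str, str]] = field(default_factory=list)


# Hypothetical registry: derived variable name -> DerivedVariable
DERIVED_REGISTRY: Dict[str, DerivedVariable] = {}


def register_derived_variable(varname: str, required: List[Dict[str, str]]):
    """Decorator factory that records the decorated function under `varname`."""
    def decorator(func: Callable[[Any], Any]) -> Callable[[Any], Any]:
        DERIVED_REGISTRY[varname] = DerivedVariable(func=func, required=required)
        return func  # the function itself is returned unchanged
    return decorator


@register_derived_variable(varname="FOO", required=[{"variable": "TEMP", "component": "ocn"}])
def func(ds):
    # ds["TEMP"] works for an xarray Dataset as well as a plain dict
    return ds["TEMP"] + 1


print(sorted(DERIVED_REGISTRY))                      # ['FOO']
print(DERIVED_REGISTRY["FOO"].func({"TEMP": 272}))   # 273
```

Because the decorator returns `func` unchanged, the function stays directly callable; registration only records it alongside its requirements.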
The user should be able to validate the derived catalog whenever they want via:

```python
In [9]: cat.validate_derivedcat()
Looks good!
```

This validation method looks like:

```python
for key, entry in self.derivedcat.items():
    for req in entry.required:
        # Every key in a requirement dict must be a column of the ESM catalog
        for col in req:
            if col not in self.esmcat.df.columns:
                raise ValueError(
                    f"{key} requires {col} to be in the ESM catalog columns: {self.esmcat.df.columns.tolist()}"
                )
        # Every requirement must include the catalog's variable column
        if self.esmcat.aggregation_control.variable_column_name not in req.keys():
            raise ValueError(
                f"Variable derivation requires *{self.esmcat.aggregation_control.variable_column_name}* to be in the dictionary of requirements: {req}"
            )
else:
    # for/else: reached once all entries pass (the loop has no `break`)
    print('Looks good!')
```

Operations like `cat.unique()` now report the registered derived variables, and the registry itself is exposed as `cat.derivedcat`:

```python
In [6]: cat.unique()
Out[6]:
component           [atm]
frequency           [daily]
experiment          [20C]
variable            [FLNS, FLNSC, FLUT, FSNS, FSNSC]
path                [s3://ncar-cesm-lens/atm/daily/cesmLE-20C-FLNS...
derived_variable    [FOO]
dtype: object

In [8]: cat.derivedcat
Out[8]: {'FOO': DerivedVariable(func=<function func at 0x1072dc310>, required=[{'variable': 'TEMP', 'component': 'ocn'}])}
```
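As a rough illustration of how `unique()` could pick up a `derived_variable` entry, here is a sketch that works on plain catalog records instead of a pandas DataFrame. The function `unique_with_derived` and the sample records are hypothetical; this is not how intake-esm computes `unique()`, only the idea that derived names come from the registry rather than from catalog rows.

```python
from typing import Dict, Iterable, List, Mapping


def unique_with_derived(records: Iterable[Mapping[str, str]],
                        derived_names: Iterable[str]) -> Dict[str, List[str]]:
    """Collect unique values per catalog column, plus a 'derived_variable' entry."""
    result: Dict[str, List[str]] = {}
    for rec in records:
        for col, value in rec.items():
            values = result.setdefault(col, [])
            if value not in values:  # keep first-seen order
                values.append(value)
    # Derived variables are not rows in the catalog, so they are listed separately
    result["derived_variable"] = sorted(derived_names)
    return result


records = [
    {"component": "atm", "frequency": "daily", "variable": "FLNS"},
    {"component": "atm", "frequency": "daily", "variable": "FLUT"},
]
print(unique_with_derived(records, []))       # derived_variable: [] (empty registry)
print(unique_with_derived(records, ["FOO"]))  # derived_variable: ['FOO']
```

Keeping derived names separate from the catalog's own columns means registering a derivation never mutates the underlying catalog table.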
Cc @matt-long, @kmpaul, @mgrover1
Similar to the development in esds-funnel, we think it would be useful to be able to add "derived variables" to a catalog, accessible via an API similar to this: