Add functionality for derived variables #379

Merged 10 commits on Oct 15, 2021

andersy005 (Member) commented Oct 14, 2021

Change Summary

  • Adds a derived.py module that houses the data classes used for derived variables
  • Adds a derivedcat attribute to the main catalog object
  • Adapts the search() method to work with both the base/main and derived catalogs
  • Adapts to_dataset_dict() to handle derived variables
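
To make the idea in the list above concrete, here is a minimal sketch of a registry built on data classes. It is not the code in derived.py: `SketchRegistry`, `DerivedVariable`, `entries`, and `update_dataset` are hypothetical names chosen for illustration, and a plain dict stands in for an xarray dataset so the example runs without extra dependencies.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

Dataset = Dict[str, Any]  # assumption: a dict stands in for xarray.Dataset


@dataclass
class DerivedVariable:
    """A derived variable: the function that computes it plus its inputs."""
    func: Callable[[Dataset], Dataset]
    variable: str
    dependent_variables: List[str]

    def __call__(self, ds: Dataset) -> Dataset:
        return self.func(ds)


@dataclass
class SketchRegistry:
    """Hypothetical registry; not intake-esm's DerivedVariableRegistry."""
    entries: Dict[str, DerivedVariable] = field(default_factory=dict)

    def register(self, *, variable: str, dependent_variables: List[str]):
        """Return a decorator that records the function under `variable`."""
        def decorator(func):
            self.entries[variable] = DerivedVariable(
                func, variable, list(dependent_variables)
            )
            return func
        return decorator

    def update_dataset(self, ds: Dataset, requested: List[str]) -> Dataset:
        """Apply each requested derived variable whose inputs are present."""
        for name in requested:
            entry = self.entries.get(name)
            if entry is not None and all(
                dep in ds for dep in entry.dependent_variables
            ):
                ds = entry(ds)
        return ds


registry = SketchRegistry()


@registry.register(variable="FOO", dependent_variables=["FLNS", "FLUT"])
def foo(ds):
    ds["FOO"] = ds["FLNS"] + ds["FLUT"]
    return ds


@registry.register(variable="BAR", dependent_variables=["FLUT"])
def bar(ds):
    ds["BAR"] = ds["FLUT"] * 1000
    return ds


result = registry.update_dataset({"FLNS": 1.0, "FLUT": 2.0}, ["FOO", "BAR"])
print(sorted(result))  # ['BAR', 'FLNS', 'FLUT', 'FOO']
```

Storing the dependency list next to the function is what lets a catalog work out which base variables to load before the function runs.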

Related issue number

Checklist

  • Unit tests for the changes exist
  • Tests pass on CI
  • Documentation reflects the changes where applicable

andersy005 added this to the Winter 2021 Release milestone Oct 14, 2021
andersy005 added the enhancement label Oct 14, 2021
andersy005 marked this pull request as ready for review Oct 14, 2021, 23:27
andersy005 (Member, Author) commented Oct 14, 2021

This seems to be working quite well:

  • Create a local registry
In [1]: import intake

In [2]: import intake_esm

In [3]: registry = intake_esm.DerivedVariableRegistry()

In [4]: @registry.register(variable='FOO', dependent_variables=['FLNS', 'FLUT'])
   ...: def func(ds):
   ...:     ds['FOO'] = ds.FLNS + ds.FLUT
   ...:     return ds
   ...: 
   ...: @registry.register(variable='BAR', dependent_variables=['FLUT'])
   ...: def funcs(ds):
   ...:     ds['BAR'] = ds.FLUT * 1000
   ...:     return ds
   ...: 
  • Instantiate a catalog object
In [5]: cat = intake.open_esm_datastore("./tests/sample-collections/catalog-dict-records.json", registry=registry)

In [11]: cat.df
Out[11]: 
  component frequency experiment variable                                               path
0       atm     daily        20C     FLNS  s3://ncar-cesm-lens/atm/daily/cesmLE-20C-FLNS....
1       atm     daily        20C    FLNSC  s3://ncar-cesm-lens/atm/daily/cesmLE-20C-FLNSC...
2       atm     daily        20C     FLUT  s3://ncar-cesm-lens/atm/daily/cesmLE-20C-FLUT....
3       atm     daily        20C     FSNS  s3://ncar-cesm-lens/atm/daily/cesmLE-20C-FSNS....
4       atm     daily        20C    FSNSC  s3://ncar-cesm-lens/atm/daily/cesmLE-20C-FSNSC...
  • For demo purposes, search for derived variables FOO and BAR only
In [6]: new_cat = cat.search(variable=['FOO', 'BAR'])

In [7]: new_cat
Out[7]: <aws-cesm1-le catalog with 1 dataset(s) from 2 asset(s)>
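
One plausible reading of the "2 asset(s)" count above: FOO depends on FLNS and FLUT, BAR depends on FLUT, and the union of those base variables matches two rows of cat.df. The sketch below illustrates that resolution with plain Python. The `resolve` and `search` helpers are hypothetical and are not intake-esm functions; the records copy only the non-path columns of the table, since the paths are truncated in the output.

```python
# Rows mirror the cat.df table shown earlier, with the path column omitted.
records = [
    {"component": "atm", "frequency": "daily", "experiment": "20C", "variable": v}
    for v in ["FLNS", "FLNSC", "FLUT", "FSNS", "FSNSC"]
]

# Derived variable -> base variables, as declared in the register() calls.
dependencies = {"FOO": ["FLNS", "FLUT"], "BAR": ["FLUT"]}


def resolve(requested, dependencies):
    """Expand derived names to base variables, keeping order, dropping duplicates."""
    resolved = []
    for name in requested:
        # A name with no registered dependencies is treated as a base variable.
        for base in dependencies.get(name, [name]):
            if base not in resolved:
                resolved.append(base)
    return resolved


def search(records, requested, dependencies):
    """Return the catalog rows needed to build the requested variables."""
    wanted = set(resolve(requested, dependencies))
    return [row for row in records if row["variable"] in wanted]


matches = search(records, ["FOO", "BAR"], dependencies)
print([row["variable"] for row in matches])  # ['FLNS', 'FLUT']
```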
  • Load data into xarray
In [8]: ds = new_cat.to_dataset_dict(xarray_open_kwargs={'backend_kwargs': {'storage_options': {'anon': True}}})

--> The keys in the returned dictionary of datasets are constructed as follows:
        'component.experiment.frequency'
 |████████████████████████████████████████████████████████████████████████████████| 100.00% [1/1 00:00<00:00]
  • FOO and BAR are included in our datasets 🎉
In [9]: ds['atm.20C.daily']
Out[9]: 
<xarray.Dataset>
Dimensions:    (member_id: 40, time: 31390, lat: 192, lon: 288, nbnd: 2)
Coordinates:
  * lat        (lat) float64 -90.0 -89.06 -88.12 -87.17 ... 88.12 89.06 90.0
  * lon        (lon) float64 0.0 1.25 2.5 3.75 5.0 ... 355.0 356.2 357.5 358.8
  * member_id  (member_id) int64 1 2 3 4 5 6 7 8 ... 34 35 101 102 103 104 105
  * time       (time) object 1920-01-01 12:00:00 ... 2005-12-31 12:00:00
    time_bnds  (time, nbnd) object dask.array<chunksize=(15695, 2), meta=np.ndarray>
Dimensions without coordinates: nbnd
Data variables:
    FLNS       (member_id, time, lat, lon) float32 dask.array<chunksize=(1, 576, 192, 288), meta=np.ndarray>
    FLUT       (member_id, time, lat, lon) float32 dask.array<chunksize=(1, 576, 192, 288), meta=np.ndarray>
    FOO        (member_id, time, lat, lon) float32 dask.array<chunksize=(1, 576, 192, 288), meta=np.ndarray>
    BAR        (member_id, time, lat, lon) float32 dask.array<chunksize=(1, 576, 192, 288), meta=np.ndarray>
Attributes: (12/15)
    Conventions:                  CF-1.0
    NCO:                          4.4.2
    Version:                      $Name$
    important_note:               This data is part of the project 'Blind Eva...
    initial_file:                 b.e11.B20TRC5CNBDRD.f09_g16.001.cam.i.1920-...
    logname:                      mudryk
    ...                           ...
    title:                        UNSET
    topography_file:              /scratch/p/pjk/mudryk/cesm1_1_2_LENS/inputd...
    intake_esm_attrs/component:   atm
    intake_esm_attrs/frequency:   daily
    intake_esm_attrs/experiment:  20C
    intake_esm_dataset_key:       atm.20C.daily
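
The dataset key atm.20C.daily follows the 'component.experiment.frequency' pattern printed by to_dataset_dict(). A minimal sketch of that construction, assuming the key is the listed attribute values joined with a "." separator (`dataset_key` is a hypothetical helper, not an intake-esm function):

```python
def dataset_key(attrs, groupby=("component", "experiment", "frequency"), sep="."):
    """Join the grouping attribute values, in order, into a single key."""
    return sep.join(str(attrs[name]) for name in groupby)


# Attribute values taken from the intake_esm_attrs entries in the output above.
attrs = {"component": "atm", "frequency": "daily", "experiment": "20C"}
key = dataset_key(attrs)
print(key)  # atm.20C.daily
```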

Cc @kmpaul, @mgrover1, @matt-long... I'm going to merge this tomorrow unless there are any objections.
