Switch to a date-based catalog versioning system, and related updates #243
…st-catalog-version
…st-catalog-version
…linked to other places
…g.yaml (doesn't make much sense)
Running a build test right now - will comment if I have any issues.
@rbeucher It looks like you might have overwritten a default catalog location or similar on Friday? I'm currently getting the following error - are you able to take a look & let me know if the file in the error message is one you recognise?
>>> import intake
>>> intake.cat.access_nri
/g/data/hh5/public/apps/miniconda3/envs/analysis3-24.07/lib/python3.10/importlib/__init__.py:126: RuntimeWarning: Unable to access a default catalog location. Calling intake.cat.access_nri will not work.
  return _bootstrap._gcd_import(name[level:], package, level)
access_nri:
args: {}
description: ''
driver: intake.catalog.base.Catalog
metadata: {}
>>> from access_nri_intake.utils import get_catalog_fp
>>> get_catalog_fp()
'/g/data/xp65/public/apps/access-nri-intake-catalog/catalog.yaml'
>>> intake.open_catalog('/g/data/xp65/public/apps/access-nri-intake-catalog/catalog.yaml').access_nri
---------------------------------------------------------------------------
FileNotFoundError Traceback (most recent call last)
Cell In[9], line 1
----> 1 intake.open_catalog('/g/data/xp65/public/apps/access-nri-intake-catalog/catalog.yaml').access_nri
...
FileNotFoundError: [Errno 2] No such file or directory: '/g/data/tm70/rb5533/intake_tests/v0.1.4+13.g869f576.dirty/metacatalog.csv'
I've run a catalog build with just CMIP5 enabled, which built without any issues - I'm just running into this slightly weird error trying to import it.
Yes. Sorry, my bad. It looks like I messed up the file.
No worries. I'm looking into it more closely now - probably something we're going to want to have some guard rails against.
I've restored the file to default to v0.1.3 for now.
@charles-turner-1 I think you might have to hack the code to open your existing catalog (or, better yet, put it into We should probably look at adding a |
I've managed to build a catalog subset & I'm happy that this behaves as expected.
I think it might be nice to add a toggle that lets the user switch discovery of $HOME/.access_nri_intake_catalog/catalog.yaml on or off from the Python interpreter, rather than requiring them to rename the file/folder to stop it overriding the default. But perhaps that's for a separate PR?
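For illustration only, here is a minimal sketch of what such a toggle could look like. The `resolve_catalog` helper and its `use_user_catalog` flag are hypothetical names, not part of `access_nri_intake`; the live path is the one `get_catalog_fp()` returned earlier in this thread.

```python
from pathlib import Path

# Live catalog path, as reported by get_catalog_fp() earlier in this thread.
LIVE_CATALOG = Path("/g/data/xp65/public/apps/access-nri-intake-catalog/catalog.yaml")
# User override location discussed in this PR.
USER_CATALOG = Path.home() / ".access_nri_intake_catalog" / "catalog.yaml"


def resolve_catalog(use_user_catalog=True,
                    user_catalog=USER_CATALOG,
                    live_catalog=LIVE_CATALOG):
    """Hypothetical helper: decide which catalog.yaml should be opened.

    The user's catalog wins only if the toggle is on AND the file exists;
    otherwise fall back to the live catalog.
    """
    user_catalog = Path(user_catalog)
    if use_user_catalog and user_catalog.is_file():
        return user_catalog
    return Path(live_catalog)
```

With something like this, `resolve_catalog(use_user_catalog=False)` would skip the home-directory file without it having to be renamed.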
Sorry @marc-white, just saw your comment - the default catalog was pointing at a nonexistent file. My catalog in I've approved but I think before merging we want to update docs? |
Yes, a docs update is definitely required. I can get started on that today.
I've added the necessary documentation, take a peek...
Other than the two minor typos, documentation changes look correct to me.
I updated the cli.py::build if update: ... clause on my local machine to make it a bit more readable to check the docs - are you happy for me to push the changes to the branch?
docs/management/building.rst
:code:`access_nri_intake_catalog` only links a singular :code:`catalog.yaml` to the entry point :code:`intake.cat.access_nri`; either the
user's local version, or if that does not exist, the live version on Gadi (see :ref:`faq`). To load outdated catalogs from Gadi, we recommend
copying the :code:`catalog-<old min version>-<old max version>.yaml` to :code:`~/access_nri_intake_catalog/catalog.yaml`.
I think this should be ~/.access_nri_intake_catalog/catalog.yaml, not ~/access_nri_intake_catalog/catalog.yaml?
Correct
docs/management/schema.rst
:code:`SCHEMA_HASH`). The easiest way to update this is to first set :code:`SCHEMA_HASH` to :code:`None`. The
updated hash will then be printed to screen when the sub-package is imported and this can be copied and pasted
across.
As of version 0.14, the catalog schema is now a part of the :code:`access_nri_intake_catalog` package, rather |
I think this should be version 0.1.4, not 0.14
Correct, I think there might be a couple of those...
Yep, push the changes.
…hub.com/ACCESS-NRI/access-nri-intake-catalog into 191-default-to-a-latest-catalog-version
@marc-white Can you confirm the shorthand variable names I used in 5c80be3 aren't misleading? Specifically Other than that I think this is good to go. |
Looks reasonable to me!
I'm going to get onto Gadi now and create symlinks for the existing catalog versions, so they get picked up the first time we generate a 'new-style' catalog.
Closes #191.
This PR significantly alters the way the catalog generated by access-nri-intake-catalog is handled, stored, and versioned.

Catalog storage location
The live catalog will now be stored on gdata/xp65 (access_nri_intake.CATALOG_LOCATION), rather than being shipped with the package. This is because all of the catalog data lives on Gadi anyway, so there doesn't seem to be much point in being able to see the catalog definition YAML without being able to access any data.
However, to support power users and local developers, there is the option to place a catalog.yaml file at a defined location in the user's home directory (access_nri_intake.USER_CATALOG_LOCATION). This catalog will automatically take precedence over the 'live' catalog on Gadi, if it exists. I thought about adding utility functions that would allow users to create this directory, put a catalog.yaml in it, etc.; however, given that it's a power user move, I decided it's easier and safer to have the user do that themselves manually.

Versioning
Catalog versions will now be date-based, e.g., a catalog built today will be, by default, v2024-11-07. Attempts to set a version number that doesn't conform to this pattern will raise an exception.
catalog.yaml will now contain a min and max version. This is to cover the possibility that the catalog structure may change, and a particular version of catalog.yaml may not be compatible with certain catalog versions. I've confirmed that doing the usual < and > operations on our version strings has the expected output.
Under this versioning schema, there isn't really a need to have a symlinked latest version of the catalog (and also because we can update the live catalog on xp65 at will, without doing a code release).

Building the catalog
Because catalog.yaml will be placed on xp65 now, there is no need to make a new code release for every catalog build.
As mentioned above, a catalog built today will take today's date as the default version, although the user can override it.
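The date-based versioning can be sketched roughly as follows. This is an illustration under stated assumptions, not the package's actual code: the function names, the regular expression, and the choice of `ValueError` are all hypothetical; only the `vYYYY-MM-DD` shape comes from this PR.

```python
import re
from datetime import date

# Assumed pattern: "v" followed by a zero-padded ISO date (vYYYY-MM-DD).
VERSION_RE = re.compile(r"v\d{4}-\d{2}-\d{2}")


def default_version(today=None):
    """Hypothetical helper: the version a catalog built 'today' would get."""
    today = today or date.today()
    return f"v{today.isoformat()}"


def validate_version(version):
    """Hypothetical helper: reject versions that are not vYYYY-MM-DD."""
    if not VERSION_RE.fullmatch(version):
        raise ValueError(f"Version {version!r} does not match vYYYY-MM-DD")
    # Also reject strings that look right but are not real calendar dates;
    # date.fromisoformat raises ValueError for e.g. month 13.
    date.fromisoformat(version[1:])
    return version
```

Because the date fields are zero-padded, plain string comparison orders versions chronologically, which is why `<` and `>` behave as expected on these strings (for example, `"v2024-11-07" < "v2025-01-01"` is true).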
The build process is now aware of older versions of catalog.yaml, and of directories that look like older catalogs:
- If no catalog.yaml exists, one will be created. If the data directory contains folders that look like catalog versions (i.e. vYYYY-MM-DD), then the code will use those to construct the min and max version boundaries (i.e., it will assume that the new catalog.yaml is good for describing those existing sources).
- If catalog.yaml exists, and there is no structural change to the catalog, then the existing catalog.yaml will be updated with: … (… catalog.yaml).
- … catalog.yaml will be created, with min version = max version = current/new version. The old catalog.yaml will be moved aside to catalog-<old min version>-<old max version>.yaml. These catalogs are nominally not accessible, unless the user hacks the access_nri_intake.CATALOG_LOCATION variable. For now, given how infrequently we'll be making the sort of changes that will trigger this scenario, I'm fine with that.

Documentation
To be updated once we're happy with the core structure of this update. The updates required will be: … catalog.yaml.

Testing
All of the above should have at least one unit test.
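As one example of behaviour such tests could cover, here is a small sketch of deriving the min and max version boundaries from directories that look like catalog versions, as described under "Building the catalog". The `version_bounds` function is a hypothetical stand-in, not the function used in the package.

```python
import re
from pathlib import Path

# Assumed directory-name pattern for existing catalog versions (vYYYY-MM-DD).
VERSION_DIR_RE = re.compile(r"v\d{4}-\d{2}-\d{2}")


def version_bounds(data_dir, new_version):
    """Hypothetical helper: (min, max) version for a freshly created catalog.yaml.

    Only sub-directories whose names look like vYYYY-MM-DD are considered;
    the version being built is always included.
    """
    existing = [
        p.name
        for p in Path(data_dir).iterdir()
        if p.is_dir() and VERSION_DIR_RE.fullmatch(p.name)
    ]
    candidates = existing + [new_version]
    # Zero-padded date strings sort chronologically, so min/max on str works.
    return min(candidates), max(candidates)
```

A unit test could build a temporary directory containing a few version-like folders plus some unrelated ones, then check that only the version-like names affect the result.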