-
Very cool that you all are thinking about versioning! AE has a similar thread here which also mentions semver. Given that you all seem to be interested in including dates, you might be interested in CalVer? I came across it when I was researching versioning strategies. It seems to be a newer "standard" than semver but has some interesting advantages for folks that work on set release schedules.
-
There may be a useful Python library for parsing and comparing versions. Note that packaging is a third-party package, not part of the standard library:
>>> from packaging import version
>>> version.parse('2021.01.31') >= version.parse('2021.01.30.dev1')
True
>>> version.parse('2021.01.31.0012') >= version.parse('2021.01.31.1012')
False
-
In the summary, I noted that PLUTO could use a version scheme called ... We probably don't want to make such a disruptive change to PLUTO, so we could have a more specific version scheme called PLUTO.
-
Nice write-up. Excited about standardizing versions. Though I'm a bit unclear which products would change their version schemes as a result of this discussion...
-
just tried to diagram what the version logic during
flowchart LR
plan_build[Plan Build]
get_latest_vers["get_latest_vers()"]
bump_latest_vers["bump_latest_vers()"]
bump_patch["bump_patch()"]
load_data[Data Loading]
q_prev_declared{prev. version\ndeclared?}
q_vers_declared{version\ndeclared?}
q_is_patch{is a\npatch?}
plan_build --> q_prev_declared
q_prev_declared -->|Yes| q_vers_declared
q_prev_declared -->|No| get_latest_vers
get_latest_vers --> q_vers_declared
q_vers_declared -->|Yes| q_is_patch
q_vers_declared -->|No| bump_latest_vers
bump_latest_vers --> load_data
q_is_patch -->|Yes| bump_patch
bump_patch --> load_data
q_is_patch -->|No| load_data
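The branches in the flowchart above could be sketched in Python roughly as follows. This is only an illustration: the diagram names get_latest_vers(), bump_latest_vers() and bump_patch() but not their signatures, so the arguments passed to them here, the BuildPlan container, and the resolve_versions name are all assumptions.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class BuildPlan:
    """Hypothetical container for the two versions a planned build needs."""
    previous_version: str
    version: str


def resolve_versions(
    declared_previous: Optional[str],
    declared_version: Optional[str],
    is_patch: bool,
    get_latest_vers: Callable[[], str],
    bump_latest_vers: Callable[[str], str],
    bump_patch: Callable[[str], str],
) -> BuildPlan:
    # "prev. version declared?" -> No: look up the latest published version.
    if declared_previous is not None:
        previous = declared_previous
    else:
        previous = get_latest_vers()

    # "version declared?" -> No: derive the new version from the previous one.
    # (Passing `previous` as an argument is an assumption.)
    if declared_version is None:
        version = bump_latest_vers(previous)
    # "is a patch?" -> Yes: bump the patch segment of the declared version.
    # (Which version gets patched is an assumption.)
    elif is_patch:
        version = bump_patch(declared_version)
    else:
        version = declared_version

    # Both branches then continue to data loading.
    return BuildPlan(previous_version=previous, version=version)


# Usage with stand-in callables (not the real implementations):
plan = resolve_versions(
    declared_previous=None,
    declared_version=None,
    is_patch=False,
    get_latest_vers=lambda: "24Q1",
    bump_latest_vers=lambda prev: "24Q3",
    bump_patch=lambda v: v + ".1",
)
print(plan)
```

Passing the helpers in as callables keeps the branching testable without touching published data.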
this logic might lead to the following recipe fields:
-
While looking at the DCAT-US metadata schema here, I found standard values for release frequency called "ISO 8601 Repeating Duration" here.
-
Current state
DE code
In dcpy/utils/versions.py we declare the version types that a data product can have. I'd think about these as "release" or "publish" version types to make it clear that we're talking about how we enumerate the data we share.
The current version types are: MajorMinor, Quarter, Date, FirstOfMonth.
We also use a field in our product recipes called version_strategy to allow automated generation of the release version of a build.
The current version strategies are: first_of_month, bump_latest_release, bump_latest_release(int)
DE data products
The release schedule for our data products is in the DE Data Catalog Excel file in SharePoint. Some notable version schemes used by our data products:
20240617
20240501
22Q2
24v2.1
24prelim
Problems
PR #973 was a temporary fix to a problem related to the FirstOfMonth version and surfaced a few concerns we have about the current state:
FirstOfMonth and Date is unsustainable. If we publish a dataset on 5/1, does that mean we always publish on the first of a month?
In lifecycle.builds.plan, we determine the latest published version by parsing and sorting folder names.
(2023-09-01, 24v1, 2024-01-01, 2024-02-01)
Proposed changes
Our version types should describe how frequently we publish a data product, be fully distinguishable, and be sortable to reflect the order in which releases were published.
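A quick way to see why sortability matters is to sort the mixed folder names listed under Problems as plain strings. The sketch below assumes, purely for illustration, that those names were listed in publication order.

```python
# Folder names from the Problems section, assumed to be in publication order.
folder_names = ["2023-09-01", "24v1", "2024-01-01", "2024-02-01"]

# Plain string sorting compares character by character, so "24v1" lands
# after every "2024-..." name ("4" sorts after "0" in the second position).
by_name = sorted(folder_names)
print(by_name)
# → ['2023-09-01', '2024-01-01', '2024-02-01', '24v1']
```

Under that assumption, the name-sorted "latest" release would be 24v1 rather than 2024-02-01, which is the kind of ambiguity that mixed version types create.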
we use Calendar Versioning to construct version schemes
All dataset versions use a calendar versioning scheme inspired by the CalVer conventions. We use a combination of date segments and incremental segments to construct a version scheme for every data product.
These are the segments we use:
0Y - Zero-padded year - 06, 16, 24
FY0Y - Zero-padded fiscal year - FY06, FY24, FY25
Q - Quarter - Q1, Q2, Q3, Q4
0M - Zero-padded month - 01, 02 ... 11, 12
0D - Zero-padded day - 01, 02 ... 30, 31
0W - Zero-padded week - 01, 02, 33, 52
MAJOR - An increment
MINOR - An increment
PATCH - An increment
MODIFIER - An optional text tag, such as "dev", "alpha", "prelim", "exec"
We use these segments to construct version schemes. Each data product uses a version scheme. Multiple products may use the same scheme. For example:
Quarterly - 0YQ
Monthly - 0Y0M
Semantic - 0Y.MAJOR.MINOR.PATCH
PLUTO
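As a rough sketch of how segments might turn a scheme string into something parseable and sortable, the snippet below maps each segment to a regular expression and converts a version into a tuple of integers. The names SEGMENTS, compile_scheme and sort_key are hypothetical, and so are the exact regex patterns. MODIFIER and the PLUTO scheme are left out, and two-digit years are compared as plain numbers.

```python
import re

# One capturing group per segment; the patterns are assumptions for illustration.
SEGMENTS = {
    "FY0Y": r"FY(\d{2})",
    "0Y": r"(\d{2})",
    "0M": r"(0[1-9]|1[0-2])",
    "0D": r"(0[1-9]|[12]\d|3[01])",
    "0W": r"(0[1-9]|[1-4]\d|5[0-3])",
    "Q": r"Q([1-4])",
    "MAJOR": r"(\d+)",
    "MINOR": r"(\d+)",
    "PATCH": r"(\d+)",
}
# Try longer segment names first so "FY0Y" is not read as "F", "Y", "0Y".
TOKENS = sorted(SEGMENTS, key=len, reverse=True)


def compile_scheme(scheme: str) -> re.Pattern:
    """Translate a scheme such as '0YQ' into a compiled regex."""
    parts = []
    i = 0
    while i < len(scheme):
        for token in TOKENS:
            if scheme.startswith(token, i):
                parts.append(SEGMENTS[token])
                i += len(token)
                break
        else:
            # Anything that is not a segment (e.g. ".") is matched literally.
            parts.append(re.escape(scheme[i]))
            i += 1
    return re.compile("".join(parts))


def sort_key(version: str, scheme: str) -> tuple:
    """Return the version's segments as integers, in scheme order."""
    match = compile_scheme(scheme).fullmatch(version)
    if match is None:
        raise ValueError(f"{version!r} does not follow scheme {scheme!r}")
    return tuple(int(group) for group in match.groups())


print(sort_key("24Q1", "0YQ"))
print(sorted(["24Q3", "23Q4", "24Q1"], key=lambda v: sort_key(v, "0YQ")))
```

Because a version that does not fit its scheme raises an error, two products using different schemes stay distinguishable instead of being silently compared.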
Each data product recipe file has a default number of increments to bump a new version. This is a way to declare the release schedule of a data product. During a build, the release version is either determined automatically from the latest published version plus the default bump, or declared manually via a GitHub Actions input. For example:
DevDB and FacDB - Quarterly - 2 - 24Q1, 24Q3
PLUTO - 1, major in recipe.yml and 1, minor in recipe-minor.yml - 24v1, 24v1.1, 24v1.2, 24v2
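The default bumps in these examples could work along the following lines. This is a minimal sketch, not the dcpy implementation: bump_quarter and bump_pluto are hypothetical names, and keeping the year fixed when bumping a PLUTO version is an assumption.

```python
import re


def bump_quarter(version: str, increments: int) -> str:
    """Advance a 0YQ version (e.g. '24Q1') by a number of quarters."""
    match = re.fullmatch(r"(\d{2})Q([1-4])", version)
    if match is None:
        raise ValueError(f"{version!r} is not a quarterly version")
    year, quarter = int(match[1]), int(match[2])
    # Count quarters from year 00 so that bumps roll over into the next year.
    index = year * 4 + (quarter - 1) + increments
    new_year, new_quarter = divmod(index, 4)
    return f"{new_year % 100:02d}Q{new_quarter + 1}"


def bump_pluto(version: str, part: str) -> str:
    """Bump the major or minor increment of a PLUTO-style version ('24v1.2')."""
    match = re.fullmatch(r"(\d{2})v(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"{version!r} is not a PLUTO-style version")
    year, major = match[1], int(match[2])
    minor = int(match[3] or 0)  # a missing minor segment counts as 0
    if part == "major":
        return f"{year}v{major + 1}"  # assumption: the year is left unchanged
    if part == "minor":
        return f"{year}v{major}.{minor + 1}"
    raise ValueError(f"unknown part {part!r}")


print(bump_quarter("24Q1", 2))       # → 24Q3
print(bump_pluto("24v1.2", "major")) # → 24v2
```

With a default bump of 2, DevDB and FacDB would go 24Q1, 24Q3, 25Q1, while PLUTO's two recipe files would produce the 24v1, 24v1.1, 24v1.2, 24v2 sequence shown above.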
we don't change the versions of data products
We don't need to change the versions we currently use for data products to improve how we represent and handle them internally. For example, PLUTO can still have a v in it, CPDB doesn't have to change to use the fiscal year in its version, etc.
BUT these are changes we may decide are worth making later, and the changes proposed here have that in mind.