Revamp GitHub tag versions, debugging individual projects, and more! #237
Just catching up on this now. Very cool! I think the optional regex transform per project makes a lot of sense. Definitely miles better, no offense taken.
To your TODO questions: We're looking far beyond the Python ecosystem, and I'd expect PEP440 is probably too strict. The schema you have with match/replace/remove is fine, but instead of passing it to PEP440, we can say, if that string starts with 0, then the project is zerover. Versions that don't match the initial regex are ignored. We only log a failure if no releases match (the regex or URL is probably wrong).
This should definitely help with the huge increase in monorepos (architecturally good imo), but complicates tagging. Lots of
In terms of release count, I'm fine merging/ignoring suffixed releases (dev/pre/post) with their equivalent numeric release.
Thanks for the input!
I understand and agree. The trouble is that 99% of releases are PEP 440 compliant after regex parsing. The only one that is not is
Hmm, this is proving challenging since I based most of the logic around the use of the
Idea for the regex. especially given how many projects we're already tracking, the chances one of them adds a new suffix is quite high, and maintaining the list would get pretty involved.
from packaging.version import InvalidVersion, Version


class RegexSubstituionDict(TypedDict):
RegexSubstituionDict -> RegexSubstitutionDict
        return True

    def process_name(self, regex_subs: list[RegexSubstituionDict] | None = None):
        for sub in regex_subs or []:
To handle the issue of suffixes, the first thing that comes to mind for me is to just treat the search regex as a prefix that must match, and we can append our own suffix portion to the configured regex, something to the effect of [^\d].* and then we just snip off all suffixes after the last matched part. Do you think that would work?
Can you elaborate on this?
I'll try! Regex is one of those things that can be easier to do than say. :)
So if I recall the way version stuff currently works is that I have an ignore (or skip?) step and a strip step. One matched tags we don't want, and the other cleaned up versions we did.
Ideally we could just have one kind of regex that matched and extracted. We can start with a really general default, something like: ^\D*(\d+(?:\D\d+)+)\D*
This already works for simpler cases like Julia that require a tag_regex_match to remove suffixes now.
Of course, this runs into issues with, e.g., hashicorp vault, which has tagging for subcomponents. We could make every entry with a static prefix revert to regex, but I'd suggest:
- name: HashiCorp Vault
  gh_url: https://github.com/hashicorp/vault
  emeritus: true
  tag_match:
    - prefix: v
So we build the regex for the project: v(\d+(?:\D\d+)+)\D*
For cases like stellarium which have two version formats, I'm thinking something like:
- name: Stellarium
  gh_url: https://github.com/Stellarium/stellarium
  tag_match:
    - prefix: v
    - regex: stellarium-(\d+-\d+-\d+)
And then for the second pattern, we stick the \D* on at the end. So the second regex would be ^stellarium-(\d+-\d+-\d+)\D* and we always tack on the \D* and pull the first group from re.match, which has the effect of dropping suffixes. And if the first character of any match is 0, that's 0ver. If it doesn't match, we try other regexes. For the purposes of release counting and assessing whether the project is currently 0ver, we only look at releases that match a regex.
I think this gives us a pretty robust mechanism. Ideally one where we won't be in regexland every other day because some project decided to get cute with their tags :) lmk what you think!
Also for Cataclysm in particular, I say we just kick it over to being manual. :P
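To make the tag_match proposal above concrete, here is a rough Python sketch of how the per-project regexes could be built and applied. The function names (build_patterns, extract_version, is_zerover) are made up for illustration and are not part of the PR; the entry keys prefix and regex follow the YAML examples in the comment.

```python
import re

# General default from the comment: skip leading non-digits, capture the
# numeric core, then allow a trailing run of non-digits.
DEFAULT_PATTERN = r"^\D*(\d+(?:\D\d+)+)\D*"
NUMERIC_CORE = r"(\d+(?:\D\d+)+)"
SUFFIX = r"\D*"  # always tacked on after the configured part


def build_patterns(tag_match=None):
    """Compile one regex per tag_match entry (hypothetical helper)."""
    if not tag_match:
        return [re.compile(DEFAULT_PATTERN)]
    patterns = []
    for entry in tag_match:
        if "prefix" in entry:
            # A static prefix is escaped and followed by the numeric core.
            core = re.escape(entry["prefix"]) + NUMERIC_CORE
        else:
            # A custom regex must define its own first capture group.
            core = entry["regex"]
        patterns.append(re.compile(core + SUFFIX))
    return patterns


def extract_version(tag, patterns):
    """Return group 1 of the first pattern that matches, else None."""
    for pattern in patterns:
        match = pattern.match(tag)  # re.match anchors at the start of the tag
        if match:
            return match.group(1)
    return None  # unmatched tags are simply ignored by the caller


def is_zerover(version):
    """A version is 0ver if its first character is 0."""
    return version.startswith("0")


vault = build_patterns([{"prefix": "v"}])
stellarium = build_patterns(
    [{"prefix": "v"}, {"regex": r"stellarium-(\d+-\d+-\d+)"}]
)
print(extract_version("v1.15.2", vault))
print(extract_version("api/v1.15.2", vault))
print(extract_version("stellarium-0-12-4", stellarium))
```

Because only the first capture group is kept, a suffix such as -rc1 never reaches the stored version, and subcomponent tags like api/v1.15.2 fall through without a match.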
Closes #236
TODO
Cataclysm: Dark Days Ahead's releases, like 0.H, which are not PEP 440 compliant.
README.md.
projects.json.
Summary
This PR includes the following changes and features:
tools\gen_projects_json.py.
tools\gen_projects_json.py using the PEP 440 compliant standard and custom regex on a per-project basis.
tools\gen_projects_json.py for generate, info, and tags.
projects.yaml to not require nearly as much manual input.
Tag parsing
Tag parsing is much more generic now, with the project-specific implementations moved to entries in the projects.yaml file. The base version simply tries to cast the version string to a PEP 440 compliant Version object (from packaging.version). If that fails, the tag is ignored for all aspects of the data generation (first, latest, count, etc.). Many repositories (maybe a third) use non-compliant tag names. To solve this, each project can define custom regexes to apply to its tags. For example, the Vala project uses tags that look like this: VALA_0_0_0. The updated entry adds a custom regex to convert this to a compliant version:
Additionally, many projects use tag name prefixes. For example, the StreamEx project uses versions that look like streamex-0.8.3. To fix this, simply remove the prefix with this regex:
This system feels miles better (no offense intended). It also simplifies the code quite a bit and enables automatic parsing of many libraries that previously could not be parsed automatically (React, FreeCAD, Haskell bytestring, OpenSSL, MAME, Window Maker, ReactOS, three.js, google-api-client, rand, distlib, etc.).
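As a rough illustration of the substitution step for the Vala and StreamEx tags mentioned above, here is a stdlib-only Python sketch. The key names match and replace, the patterns, and the apply_subs helper are assumptions made for this example; they are not copied from the PR's projects.yaml entries or from RegexSubstituionDict.

```python
import re

# Hypothetical substitution entries; the real schema may use different keys.
VALA_SUBS = [
    # VALA_0_0_0 -> 0.0.0
    {"match": r"^VALA_(\d+)_(\d+)_(\d+)$", "replace": r"\1.\2.\3"},
]
STREAMEX_SUBS = [
    # streamex-0.8.3 -> 0.8.3 (drop the static prefix)
    {"match": r"^streamex-", "replace": ""},
]


def apply_subs(tag, subs=None):
    """Apply each regex substitution to the tag in order (hypothetical helper)."""
    for sub in subs or []:
        tag = re.sub(sub["match"], sub["replace"], tag)
    return tag


print(apply_subs("VALA_0_0_0", VALA_SUBS))
print(apply_subs("streamex-0.8.3", STREAMEX_SUBS))
```

The cleaned string would then be handed to packaging.version.Version, which raises InvalidVersion for anything that is still not PEP 440 compliant, so such tags can be skipped.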
gen_projects_json.py CLI
gen_projects_json.py --help
Generate
gen_projects_json.py generate --help
This command did not change.
Info
gen_projects_json.py info --help
The info command lets you view what would be written to projects.json for the specified project. You can pass either a GitHub link or the exact name of an entry in projects.yaml. It then prints the output that would be written to projects.json, for easier debugging.
Tags
gen_projects_json.py tags --help
This command is super helpful with the new tagging system for building and testing regexes. When adding a new library, simply pass the GitHub address to see whether any tags are not compliant (and so require a regex). From there you can see every parsed version, every duplicate version (usually due to improper regex patterns), and every failed version. Here is a demonstration output:
Here is an example output for a more complicated example:
In this second example we can see a duplicate tag, which is fine in this case since there are actually two tags with the same version.
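For readers who want a feel for the three buckets the tags command reports, here is a small Python sketch of that kind of classification. It is not the command's actual implementation: classify_tags and simple_parse are hypothetical names, and simple_parse is a deliberately simple stand-in for the real per-project parsing.

```python
import re


def simple_parse(tag):
    """Stand-in parser: accept an optional v prefix and dotted numbers only."""
    match = re.fullmatch(r"v?(\d+(?:\.\d+)*)", tag)
    return match.group(1) if match else None


def classify_tags(tags, parse):
    """Split tags into parsed versions, duplicate versions, and failures."""
    parsed = {}      # version -> first tag that produced it
    duplicates = []  # (tag, version) pairs whose version was already seen
    failed = []      # tags the parser rejected
    for tag in tags:
        version = parse(tag)
        if version is None:
            failed.append(tag)
        elif version in parsed:
            duplicates.append((tag, version))
        else:
            parsed[version] = tag
    return parsed, duplicates, failed


parsed, duplicates, failed = classify_tags(
    ["v1.0.0", "1.0.0", "v0.9", "nightly"], simple_parse
)
print(parsed, duplicates, failed)
```

A duplicate is only a warning sign: it can come from an overly greedy regex, or, as in the second example above, from two real tags that share a version.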