Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Define and document our "release" process #18

Closed
jdhoffa opened this issue Feb 5, 2024 · 28 comments · Fixed by #34
Closed

Define and document our "release" process #18

jdhoffa opened this issue Feb 5, 2024 · 28 comments · Fixed by #34
Labels
community lifecycle Practices pertaining to the lifecycle of code. quarterly

Comments

@jdhoffa
Copy link
Collaborator

jdhoffa commented Feb 5, 2024

          I have a general question to @AlexAxthelm  and @cjyetman: what is our "release" process? 
  • Initially, we would only bump r2dii.* version numbers once they were accepted to CRAN
  • We then entered a situation where we "released" the pacta.* packages when they were used to build a "release candidate" for Transition Monitor. This was more or less just to facilitate "easily" re-building the image based on the same package versions, to be able to simulate, e.g. re-running a COP portfolio.

As far as I can tell, this is now serving neither of the above, but instead the WebApp circumstance, so I guess this perhaps is a call to align on our definitions of a "release", otherwise I fear it ends up being a meaningless term.

NIT: I will extract this into an issue in the practices repo.

Originally posted by @jdhoffa in RMI-PACTA/workflow.portfolio.parsing#17 (comment)

@jdhoffa
Copy link
Collaborator Author

jdhoffa commented Feb 5, 2024

Relates to #6 and #7

@jdhoffa jdhoffa added lifecycle Practices pertaining to the lifecycle of code. quarterly community labels Feb 5, 2024
@AlexAxthelm
Copy link

🤔 My inclination (for R packages, at least) would be to do a version bump on any merge into main, since that would work nicely with pak being able to pick the right/latest version and bypass a lot of tricks involving feature flags.

# DESCRIPTION
Imports: 
    pacta.portfolio.import (>= 0.0.3),
    uuid
Remotes:
    RMI-PACTA/pacta.portfolio.import

On the other hand, that's probably not super compatible with our current merge flow. Bumping versions here could get very busy, since we normally develop with features merging directly into main (rather than to a develop branch).

@jdhoffa
Copy link
Collaborator Author

jdhoffa commented Feb 5, 2024

This is another huge diversion from our existing development practices, and not something I imagine neither @cjyetman nor myself am likely to agree with without justification.

This also begs the question: does the individual maintainer have autonomy to decide how each repository manages their own git strategy and release process individually? I would be inclined to say "no" here as I think that unnecessarily introduces complexity into our team, but something we should absolutely discuss and document here: #12

However, directly to your point, I think the following options exist (not necessarily comprehensive, and focused to R packages):

My strong preference here is for option 1, as it limits any unnecessary changes in existing paradigm, while offering on-par features (maybe not with what you mention around pak).

This will be discussed collectively in a DevHangout.

@cjyetman
Copy link
Collaborator

cjyetman commented Feb 5, 2024

Afraid to begin this debate again, but... on the P4I side, where we have not yet released anything on CRAN, I believe the Git hash is the safest identifier of "version" we can use. See https://github.com/RMI-PACTA/pacta.data.preparation/issues/99#issuecomment-1239647545

@jdhoffa
Copy link
Collaborator Author

jdhoffa commented Feb 5, 2024

I don't think we should be afraid of this debate, I think it's crucially important that we get to the bottom of it in fact and come to some sort of agreement. No more kicking the can down the road.

@jdhoffa
Copy link
Collaborator Author

jdhoffa commented Feb 5, 2024

Afraid to begin this debate again, but... on the P4I side, where we have not yet released anything on CRAN, I believe the Git hash is the safest identifier of "version" we can use. See RMI-PACTA/pacta.data.preparation#99 (comment)

Directly to your point, I think git hashes are adequate while these packages remain internal only (sorry, I know I didn't think this in the past, my thinking on this has evolved since then). However, I imagine versioned releases (SemVer or otherwise) may be useful if we ever intend to communicate the status and changes of packages to anyone external who may be using the code not under our supervision.

In particular, strongly sticking to SemVer can help users know and understand when an update may introduce API changes that may break their downstream workflows. If we rely only on git-hashes, and the latest version on main we reduce that flexibility and require them to constantly use the "bleeding edge" version, even if it screws up their entire workflow.

I know this isn't currently the case, but it maybe worth considering in our discussion, as it could be a state we may want to aim for.

@AlexAxthelm
Copy link

Worth noting that most GH-based workflows (including GH-flow, and pak downloading) make an assumption that what is on the default branch (for us, main) is the latest working version. Not a dev version.

If we want to batch multiple PRs into a single "release" (however we want to specify that), then we need somewhere (another branch?) to do that, and fthen once we're happy with that, we can merge them into the default branch.

@jdhoffa
Copy link
Collaborator Author

jdhoffa commented Feb 5, 2024

Yup totally understood. The real question is do we want that place to be main or somewhere else (CRAN/ R-Universe/ somewhere else/ docker hub?)

I have my reservations only because of the current workflows of the team, and trying not to shake too much up at once but as always happy to be convinced otherwise!

@jdhoffa
Copy link
Collaborator Author

jdhoffa commented Feb 5, 2024

And just to be very clear, I definitely don't think everything needs to/ should live on CRAN, I see the immense pain in the ass that could be.

There's a specific reason that the banks packages do/ should, but I don't think we need to force everything to.

@cjyetman
Copy link
Collaborator

cjyetman commented Feb 6, 2024

definitely agree, for communicating "version" to external users we should use semver, but I think that needs to be coupled with a CRAN release, or something that forces the semver to be immutable... I don't think the semver living in the DESCRIPTION file alone is adequate, at least not in our typical git flow

I also agree that a git flow that had experimental dev branches for PRs and such, a (stable-ish) dev branch for the bleeding edge, and a stable main/release branch for expected external use would be ideal, but... that's substantially different than our typical git flows, and I'm not sure it's worth rocking that boat. Additionally, I'm not aware of any strong examples of R packages that follow a flow like that, so we'd be deviating from some expected norms.

@jdhoffa
Copy link
Collaborator Author

jdhoffa commented Feb 6, 2024

  • I think enforcing SemVer only when paired with a legitimate release (e.g. on CRAN) is a totally reasonable expectation
  • I share the hesitations with git-flow, for the same reason, but not so against it that I wouldn't consider it. But also don't really feel it's any kind of priority

A question for @cjyetman does R-Universe have any mechanism for enforcing SemVer?

@cjyetman
Copy link
Collaborator

cjyetman commented Feb 6, 2024

A question for @cjyetman does R-Universe have any mechanism for enforcing SemVer?

Not that I'm aware of. As I understand, it is pointed to a branch/tag/hash, and whenever a change is made there it automatically rebuilds. I suspect if a fancy git flow was used, one could enable two parallel r-universes, one for stable/main and one for dev/bleeding-edge. I believe that in any case, the reported "version" on r-universe will be whatever is stated in the DESCRIPTION file (in the associated target).

@jdhoffa
Copy link
Collaborator Author

jdhoffa commented Feb 6, 2024

Got it.
So for our R packages, if we intend to deviate from CRAN AND do some kind of SemVer, we would need to keep track of it and enforce it entirely ourselves.

@cjyetman
Copy link
Collaborator

cjyetman commented Feb 6, 2024

Worth noting that most GH-based workflows (including GH-flow, and pak downloading) make an assumption that what is on the default branch (for us, main) is the latest working version. Not a dev version.

a bit nit-picky, but just to be precise, pak makes a default assumption that you want the default branch of a repo, but it can be pointed at other branches/tags/PRs/hashes/etc.

@cjyetman
Copy link
Collaborator

cjyetman commented Feb 6, 2024

I also agree that a git flow that had experimental dev branches for PRs and such, a (stable-ish) dev branch for the bleeding edge, and a stable main/release branch for expected external use would be ideal, but... that's substantially different than our typical git flows, and I'm not sure it's worth rocking that boat. Additionally, I'm not aware of any strong examples of R packages that follow a flow like that, so we'd be deviating from some expected norms.

also importantly in this case, I would expect a strict rule (if not a GH Action that also enforces it) that any push to dev or main would require a version bump (in DESCRIPTION)

@cjyetman
Copy link
Collaborator

cjyetman commented Feb 6, 2024

Got it. So for our R packages, if we intend to deviate from CRAN AND do some kind of SemVer, we would need to keep track of it and enforce it entirely ourselves.

yes, I believe so

@jdhoffa
Copy link
Collaborator Author

jdhoffa commented Feb 6, 2024

I also agree that a git flow that had experimental dev branches for PRs and such, a (stable-ish) dev branch for the bleeding edge, and a stable main/release branch for expected external use would be ideal, but... that's substantially different than our typical git flows, and I'm not sure it's worth rocking that boat. Additionally, I'm not aware of any strong examples of R packages that follow a flow like that, so we'd be deviating from some expected norms.

also importantly in this case, I would expect a strict rule (if not a GH Action that also enforces it) that any push to dev or main would require a version bump (in DESCRIPTION)

hmm NIT: but just to be precise, I would expect a version bump on push to main, but not on push to develop?
In my mental model, if we use what we do in r2dii.analysis as an example, then the switch from:
GitHub flow + CRAN -> Git-Flow + GH
would effectively correspond to a switch from:
main as we know it now would become develop
CRAN as we know it now would become main

@jdhoffa
Copy link
Collaborator Author

jdhoffa commented Feb 6, 2024

EDITED to include @cjyetman and @AlexAxthelm 's points:

Ah woops, I confused myself. We would still have to have the version bump on develop at some point, just the precise logic would differ.

On develop we would enforce that:

  • Version in DESCRIPTION must follow an x.y.z OR x.y.z.9000 format
    AND
  • Version in any commit must be > the version before commit

Then on main we would enforce that:

  • Version in DESCRIPTION must x.y.z format
    AND
  • Version in DESCRIPTION for any commit must be >= the version before commit
    AND
  • Version in DESCRIPTION for any commit must be > the version on main

That seems pretty doable in an action.

@AlexAxthelm
Copy link

@jdhoffa , on main, version in commit is > than version in prior commit (not >=).

The general git-flow paradigm would be something like

  1. get develop into something feature-complete for the release (no more application changes)
  2. open a release/0.1.0 branch
  3. Do any housekeeping and version bumps on release (updating news, devtools::document(), etc)
  4. merge release/* into main and develop

@cjyetman
Copy link
Collaborator

cjyetman commented Feb 6, 2024

Ah woops, I confused myself. Obviously, we would still have to have the version bump on develop at some point, just the precise logic would differ.

On develop we would enforce that:

  • Version in DESCRIPTION must follow an x.y.z OR x.y.z.9000 format
    AND
  • Version in any commit must be >= the version prior to commit

Then on main we would enforce that:

  • Version in DESCRIPTION must x.y.z format
    AND
  • Version in any commit must be >= the version prior to commit

That seems pretty doable in an action?

exactly, what I was thinking... though there could be an additional check to enforce that dev version x.y.z.9000 is higher than current version on main too

@jdhoffa
Copy link
Collaborator Author

jdhoffa commented Feb 6, 2024

@jdhoffa , on main, version in commit is > than version in prior commit (not >=).

The general git-flow paradigm would be something like

  1. get develop into something feature-complete for the release (no more application changes)
  2. open a release/0.1.0 branch
  3. Do any housekeeping and version bumps on release (updating news, devtools::document(), etc)
  4. merge release/* into main and develop

Got it.

@jdhoffa
Copy link
Collaborator Author

jdhoffa commented Feb 8, 2024

As per a comment from @AlexAxthelm

  • Once we have discussed and agreed on this, there should be an addition also to the maintainer.md something along the lines of:
## Versioning and Release
Maintainers are responsible for communicating overall application version through some informative version identifier, as well as determining the suitibility of any release candidate and any release schedule

@jdhoffa
Copy link
Collaborator Author

jdhoffa commented Feb 9, 2024

FYI: ADO ticket to gather user feedback from Banks to assess if we can explore alternatives to CRAN:
https://dev.azure.com/RMI-PACTA/2DegreesInvesting/_workitems/edit/9726

@jdhoffa
Copy link
Collaborator Author

jdhoffa commented Feb 22, 2024

Good comment from @AlexAxthelm

hash communicates about state
version communicates about function

@jdhoffa
Copy link
Collaborator Author

jdhoffa commented Feb 23, 2024

We had a devHangouts discussion about this. Below is a summary of the transcript of that meeting:

Versioning and Release Strategy

We discussed the purpose of releases and versioning, and the differences of semantic versioning and calendar versioning or simply using Git hashes for identifying specific states of the code. @AlexAxthelm and @cjyetman both noted that "hashes communicate about state, version communicates about function".

Internal Communication and Consistency

The importance of clear communication within the team about versioning and repository states is stressed, with the idea that consistent practices across the team are crucial for efficiency and understanding.

SemVer for R Packages

For our "vanilla R packages", e.g. those with the pacta.* or r2dii.* prefixes, it was agreed upon that implementing a Semantic Versioning was a good approach.

Other types of code

For other software bases (e.g. Transition Monitor Dockerfile, other workflows, data) it was less clear how versioning should be done, or if it was even necessary to define it. For the Transition Monitor Dockerfile, it was stated that some consistent tagging system would be useful (e.g. tag relevant images with date-built, ch 2024, and other useful developer facing tags to help us sleuth errors)

Potential Use of CRAN

The use of CRAN (Comprehensive R Archive Network) for package release is mentioned, with a consideration of whether it is necessary for their workflows or if there could be alternative ways to distribute packages to banks or other users. Note: this discussion seemed to stray a bit from the "release" discussion.

@jdhoffa
Copy link
Collaborator Author

jdhoffa commented Feb 23, 2024

Of this, the only thing I think that deserves to go into practices is that we will use SemVer consistently for our vanilla R packages. I will follow up with a PR.

@AlexAxthelm
Copy link

Of this, the only thing I think that deserves to go into practices is that we will use SemVer consistently for our vanilla R packages. I will follow up with a PR.

As I noted int he meeting, I think that this should be "we will use SemVer mostly consistently". Some packages, such as pacta.scenario.data.preparation would be well served to have a calendar versioning, since those do have a strong time component in them.

using a version such as 2023.0.2 is technically SemVer compliant, but doesn't match the sprit of semver.

@jdhoffa
Copy link
Collaborator Author

jdhoffa commented Feb 23, 2024

I didn't totally understand that point in the meeting, but sounds cool and understood!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
community lifecycle Practices pertaining to the lifecycle of code. quarterly
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants