Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Move CI to GitHub Actions #321

Merged
merged 35 commits into from
Dec 17, 2024
Merged

Move CI to GitHub Actions #321

merged 35 commits into from
Dec 17, 2024

Conversation

ajbozarth
Copy link
Member

@ajbozarth ajbozarth commented Nov 20, 2024

PR for moving our CI from CircleCI to GitHub Actions. This work includes decoupling build and test steps from the push step as well as support multi-platform builds.

Implementation notes:

  • Each demo has it's own workflow
  • A workflow will only run when code specific to that demo is changed
  • ghcr.io is used for build and test steps to remove reliance on DockerHub credentials in forks
    • DockerHub auth will occur only when push is run
  • Build and test steps can also run against liboqs and oqs-provider main branches by manually triggering a workflow and setting the build_main flag to true.
  • A Run All workflow is included that triggers all the other workflows, this will be useful for times when all the CI need to be manually triggered, such a for a release or to test against lliboqs main.
    • I will be adding workflows to liboqs and oqs-provider to trigger this as followup work
  • The latest image will be pushed to DockerHub and ghcr.io when a workflow is run on the main branch, when not a fork, not a PR, and not building against liboqs main branch
    • The latest image is actually a manifest that points to new platform-based images latest-x86_64 and latest-arm64
    • If a build is manually triggered, the input value release_tag can be set that will replace latest in the above naming convention.
  • linux.yml has been deleted and quic.yml has been update to follow the above push rules, but otherwise current workflows have not been changed

Fixes #294 Fixes #213 Fixes #284

@ajbozarth ajbozarth self-assigned this Nov 20, 2024
@ajbozarth
Copy link
Member Author

ajbozarth commented Nov 20, 2024

Note that I have only opened this draft to allow for iterative feedback. It is far from complete and the current workflow is just my first working example (or MVP for agile folks).

I decided having an open draft would be ok once I discovered I could trigger workflows from local yaml files against branches on my fork via VSCode without needing to push commits. (edit: not actually true, so I'm pushing to a test branch while testing, then making a commit here once I know a workflow is ready)

@ajbozarth ajbozarth mentioned this pull request Nov 20, 2024
also update openssl3 workflow to only use the available 4 cores

Signed-off-by: Alex Bozarth <[email protected]>
Signed-off-by: Alex Bozarth <[email protected]>
move the build against liboqs/oqsprovider matrix builds
to a triggerable option not run automatically

Signed-off-by: Alex Bozarth <[email protected]>
@ajbozarth
Copy link
Member Author

Final update before I go on vacation til Monday. I added workflows for curl and httpd and moved the build against liboqs main from a matrix build to a trigger option. I've updated the description and previous commits to clarify my latest changes.

@hayyaaf
Copy link
Contributor

hayyaaf commented Nov 25, 2024

@ajbozarth I would like to add a linter for Docker files, such as Hadolint or should we rely on MegaLinter instead? What do you think?

@hayyaaf
Copy link
Contributor

hayyaaf commented Nov 25, 2024

I have noticed that some GitHub Actions are not using the latest versions, and I also observed the use of a fixed value with the -j flag. Wouldn’t it be better to rely on -j$(nproc) instead? What do you think?

@ajbozarth
Copy link
Member Author

I would like to add a linter for Docker files, such as Hadolint or should we rely on MegaLinter instead? What do you think?

This is a great idea, I'm not sure on which linter to use, I've never linted Dockerfiles before. Creating a linter GitHub Action would be a great task you can do in parallel to this PR. You would most likely need to do a lot of small updates to get the linter to pass initially though.

I have noticed that some GitHub Actions are not using the latest versions

How so? The actions I added check out and run the latest code locally, there's currently no versioning at all.

Wouldn’t it be better to rely on -j$(nproc) instead

This is something I'm aware of, on CircleCI it was being hard-coded to -j 18 and had a comment on the number of available cores in circle CI being 35. I tried to set it to -j$(nproc) in my workflows but it cause a string escape issue so I instead set it to -j4 since GitHub Actions only has 4 available cores. I plan to go back and see if I can get -j$(nproc) to work, but I didn't want that small issue to block my overall progress too long, so I set it aside for now.

@ajbozarth
Copy link
Member Author

A note here on my most recent update:

I had previously enabled triggering each workflow with a flag to use liboqs/oqsprovider main branches, I have moved this to a callable action (meaning it can be triggered by another workflow but not manually triggered). Instead I have created build.yml which is both trigger able and callable and runs all the workflows with the given input flag. This will enable the most common use case: letting liboqs or oqsprovider to remotely trigger a build and test of the demos using the latest code in their own workflows.

Signed-off-by: Alex Bozarth <[email protected]>
Signed-off-by: Alex Bozarth <[email protected]>
note this workflow does not include tests

Signed-off-by: Alex Bozarth <[email protected]>
note this workflow does not include tests

Signed-off-by: Alex Bozarth <[email protected]>
@ajbozarth
Copy link
Member Author

I just pushed a few more workflows, included in them are locust and wireshark, both of which are just build and have no test step:

  • In the case of wireshark this was how it was already in circleCI since it's a UI demo and doesn't have a way to run without X11. @hayyaaf if you disagree with this and have an idea on a test command feel free to chime in as the new maintainer, and I can add it.

  • For locust though I am unsure if we want to include a test step, I read through the README and USAGE files and all the "check your work" instructions seem to rely on a UI. @davidgca as the maintainer of locust is there a command or set of commands you recommend using for the test step? Feel free to look at the other workflows I added in this PR for examples of how we test other demos. Including CI tests is left to your discretion as the maintainer.

Signed-off-by: Alex Bozarth <[email protected]>
Signed-off-by: Alex Bozarth <[email protected]>
Signed-off-by: Alex Bozarth <[email protected]>
Signed-off-by: Alex Bozarth <[email protected]>
Signed-off-by: Alex Bozarth <[email protected]>
@ajbozarth
Copy link
Member Author

I'm moving this out of draft since it is now in a "complete" state, that is it is fully functional for building and testing all our current demos. I still have quite a bit of work left to do as detailed in the following list, but this could be merge as is and further work done in a follow up PR if desired.

To Do's:

  • Add a step (or job) to each workflow that will publish the docker images to docker hub and ghcr.io
    • I have a few ideas on how to handle this and am still brainstorming the best way to add "push" to the CI, ideas I'm thinking about include:
      • should images be pushed automatically on every commit to main, this would occur only if the specific demo is edited. (Note this step would fail on main on forks due to perms, or if the user has configured their perms would cause an attempted push -> this could be addressed in the workflow, but I would need to find out how to distinguish a fork)
      • should we do "releases" using the Run All workflow? If so should the release tag be set as a workflow input? similarly should we enable releases for specific demos? (this would come automatically if we do so via the run all workflow)
      • should we ever release the images based on liboqs/oqsprovider main? if not should we specifically prevent this in the release workflow step?
  • Clean up current CI, I need to take a look at the current CI files and decide what to do with them. Some current thoughts on each workflow:
    • linux.yml this was a great launching off point by Michael, but I believe once I'm done we will just delete this file
    • quic.yml I'm torn on how to handle this workflow, it is really well written, but it handles the build/test/push for the two quic demos differently than how I handled the rest. No matter how it's handled I will need to update how it deals with push credentials.
    • docker-scan.yml This was a great add and I don't believe it will need to be touched, but we may want to consider updating it to run against PRs (which would require dealing with credential issues)
  • Follow up work potentially outside this PR:
    • Turn off CircleCI and remove related files
    • Add workflows to liboqs and oqsprovider that trigger the Run All workflow against their main branches. This would enable leveraging our demos as "tests" for those repos
    • Clean up old (and possibly broken) images remaining on DockerHub

My planned next step is to work on the adding push to the workflows, but with US Thanksgiving this week I might not get to it until next week

@ajbozarth ajbozarth marked this pull request as ready for review November 25, 2024 23:32
@ajbozarth ajbozarth added the enhancement New feature or request label Nov 25, 2024
Signed-off-by: Alex Bozarth <[email protected]>
@ajbozarth
Copy link
Member Author

Based on @baentsch comments I realized I missed the real takeaway from @SWilson4 example: the runners. After a bit of research I found why I had not previously considered that solution: GitHub does not currently provide linux/arm64 runners.

After some more research and testing I was able to update the workflows to leverage runners instead of QEMU. This was only possible due to the existence of the oqs-arm64 self-hosted runner.

I also update the push to tag the images latest-x86_64 or latest-arm64 and then created a followup job to create and push a manifest connecting the two images to latest (or whatever tag was inputed by a manual call). In this case I was actually able to split out the job into a separate workflow, something that I was unable to do with earlier steps due to minor differences between demos.

Implementation Note:

For the new push job I wanted the if check at the top level so it would skip the job when not pushing rather than "running" the job and not doing anything. The downside is that at the top level the env is not accessible, so I had to duplicate the value of env.push

Potential security risks:

While researching runners I ran into the issue that forks do not have access to self-hosted runners, meaning that when the workflows run on a forks main branch they will never start the oqs-arm64 runner job and should timeout after 6 hours. It will still run on a PR though, but the docs warn of security issues depending on how the self-hosted runner is set up. Since I have no access to the runner (I only know it exists because other workflows reference it) I can't tell if this is a concern.
It is worth noting that GitHub plans to make arm64 runners available in Q1, which would replace the self-hosted runner and any potential security issues it has.

Remaining ToDo

  • Add documentation

Remaining Open Discussion:

@ajbozarth
Copy link
Member Author

Small follow up to the above comment:

  • Looks like the openvpn test script is a bit finicky and sometime fails, this should be address in a follow up PR since it would need to be fixed in the script not the workflow.
  • This PR nearly fixes Automate and streamline docker image generation #284 by adding automatic releases when updates are pushed to main and enabling triggering of version releases, but it does not add automated releases of versioned or latest images upon releases of upstream projects (liboqs/oqsprovider). I would suggest that be handled upstream, or manually triggered. I can add Automate and streamline docker image generation #284 to the "Fixes" list if everyone thinks this dresses it enough.

@hayyaaf
Copy link
Contributor

hayyaaf commented Dec 9, 2024

Hi @ajbozarth,

Thank you for your patience, and I apologize for the delay. I've just finished the implementation, and you can find the docker-lint.yml file here. Please review it and let me know your feedback or suggestions.

@ajbozarth
Copy link
Member Author

I've just finished the implementation, and you can find the docker-lint.yml file here. Please review it and let me know your feedback or suggestions.

@Hawazyn I would suggest opening a PR for review

@baentsch
Copy link
Member

I would suggest that be handled upstream [...] add #284 to the "Fixes" list if everyone thinks this dresses it enough.

I'd be OK with the latter if you'd be willing to add code run at/to the upstreams to facilitate that, either by way of adding snippets to the documentation in the upstream release Wikis or by creating CI code triggered by upstream releases: OK for you @ajbozarth (may be part of the documentation ToDo?)

Remaining Open Discussion:

Is the definition of env.push correct? Comment detailing the issue: 

#321 (comment)

I'd think this would also be resolved if the corresponding trigger is done in the upstream CI at release time, no?

Signed-off-by: Alex Bozarth <[email protected]>
@ajbozarth
Copy link
Member Author

I'd be OK with the latter if you'd be willing to add code run at/to the upstreams to facilitate that

Once this PR is merged to main I can open PRs on liboqs and oqsprovider adding a small workflow to tigger the build.yml workflow.

Based on this I will need to update env.push to also not run when input.build_main is true (which it will be when called from liboqs and oqsprovider.

@ajbozarth
Copy link
Member Author

Once this PR is merged to main I can open PRs on liboqs and oqsprovider adding a small workflow to tigger the build.yml workflow.

I opened open-quantum-safe/oqs-provider#587 as a draft, but for some reason the new workflow didn't run, so I'll have to wait to debug it until this is merged. Once I get that working I will add the same workflow to liboqs

Copy link
Member

@baentsch baentsch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for agreeing to add release triggers, @ajbozarth . Also very good this is now using the native arm64 runner. The only thing open then from my side is documentation: What's triggered when and how and for which demos? What's a blueprint ("bare minimum CI") for adding new demos? Where and when are (which versions of) the images pushed to?

@ajbozarth
Copy link
Member Author

The only thing open then from my side is documentation: What's triggered when and how and for which demos? What's a blueprint ("bare minimum CI") for adding new demos? Where and when are (which versions of) the images pushed to?

Where do you suggest I add this? Should it go in a new section on the README (if so where in the README) or should I create a new md file?

@ajbozarth
Copy link
Member Author

Where do you suggest I add this? Should it go in a new section on the README (if so where in the README) or should I create a new md file?

@baentsch further followup on my question: I noticed that our contributor docs are in the GitHub wiki and not a markdown file. Is there a motivation for this? The wiki can only be edited with write access and can't take PRs. I think that doc would be the best place to add the CI docs, but I can only contribute it if it's in a markdown file instead of the wiki. Would it be appropriate to move that content into a new CONTRIBUTING.md file?

Signed-off-by: Alex Bozarth <[email protected]>
@ajbozarth
Copy link
Member Author

@baentsch So I decided to go through my suggestion above about the CONTRIBUTING.md and move the wiki content to the new file and added a CI section to the bottom. This means we could get rid of the wiki page if you're ok with this change.

I also updated the README in a few places and added workflow badges to the table for each demo (I left the openssl3 one at the top since it's not in the table?).

Lastly I added a workflow template file for future use. Though I put it in the templates dir I did not add the metadata file to propagate it to the GitHub Action UI. It's just there as a starting point for new demos.

@ajbozarth
Copy link
Member Author

Update on the liboqs/oqs-provider upstream CI integration

This is much more complicated than I expected. In short there is no way to do what I wanted to, that is to add a workflow to liboqs/oqs-provider that runs the build.yml here and then shows success/failure in the CI for those repos. The method I intended to use try to run the workflow against the repo that runs it, meaning it will run the workflows against liboqs repo contents instead of oqs-demos repo.

The proper way of triggering a workflow from another repo is to make a repository_dispatch call. This is done by making a curl POST call from the liboqs workflow that will then trigger a workflow here, but won't track its results or show any helpful feedback in liboqs's CI. In addition repository_dispatch works completely different than workflow_call or workflow_dispatch and has incompatible inputs, meaning I would need to make a lot of edits to build.yml. Also repository_dispatch would require setting up a Personal Access Token for a maintainer that has write access to both repos to use in the workflow.

In Summary

Based on my research and understanding of what we want from this feature, it would actually be less work and hassle to either just run it manually after updates are made to liboqs/oqs-provider, or set up a cron job to run it (this would still require the edits to build.yml) once a week or something. @baentsch ?

@ajbozarth
Copy link
Member Author

ajbozarth commented Dec 10, 2024

Per my above comment I have pushed an update adding a cron job that will run all the workflows against liboqs and oqs-provider main once a week (Mon 1am). If we would rather use repository_dispatch instead I can revert the last commit and add that instead, but I believe this solution is cleaner and requires less work.

As for displaying the results, IIUC the badges I added to the README will only look at the latest run, so unless a new commit was push in the last week they will show the results of the weekly build against liboqs main. I believe this is an okay behavior (consistent with many other repos). This also means that we can add a badge for build.yml on liboqs and oqs-provider to display that week's results. If this is approved I will open PRs to add those badges after this is merged

Implementation Note: The explanation for the weird !contains() line can be found here: https://stackoverflow.com/a/73495922

@baentsch
Copy link
Member

Where do you suggest I add this? Should it go in a new section on the README (if so where in the README) or should I create a new md file?

@baentsch further followup on my question: I noticed that our contributor docs are in the GitHub wiki and not a markdown file. Is there a motivation for this? The wiki can only be edited with write access and can't take PRs. I think that doc would be the best place to add the CI docs, but I can only contribute it if it's in a markdown file instead of the wiki. Would it be appropriate to move that content into a new CONTRIBUTING.md file?

This is precisely the same issue/suggestion made #329 (comment). The reason for not doing it earlier was the simple lack of interest by more people to contribute. This has now changed with you and @Hawazyn being active, so the switch to a file is goodness and follows the same switch we did in liboqs and oqsprovider (for the same reasons :-). So, Thanks for doing it.

@baentsch
Copy link
Member

Based on my research and understanding of what we want from this feature, it would actually be less work and hassle to either just run it manually after updates are made to liboqs/oqs-provider, or set up a cron job to run it (this would still require the edits to build.yml) once a week or something. @baentsch ?

oqsprovider has a script to run for validating releases. It may be a good thing to add the above-mentioned "trigger-curl" to that (and add such script to liboqs too: @SWilson4 ?). Doing it by "manual-edit-then-cron" (if I understand the approach) sounds not quite right.

@ajbozarth
Copy link
Member Author

Doing it by "manual-edit-then-cron" (if I understand the approach) sounds not quite right.

@baentsch I'm not 100% sure we're on the same page here, the cron would just run once a week and try building and testing each demo against liboqs and oqs-provider main. This is more of a regular check, rather than being triggered by a push to main.

If this is still not something we want I can revert 666ab72 which contains the cron addition.

@baentsch
Copy link
Member

baentsch commented Dec 16, 2024

This is more of a regular check, rather than being triggered by a push to main.

That approach sounds good to me. I then probably misunderstood. And anyway we can always revert stuff if it doesn't do what we find good.

Any last comments before this is merged? Edit/add: Is the openvpn error in CI one worth while investigating?

@ajbozarth
Copy link
Member Author

Any last comments before this is merged?

I'd say this is good to merge.

Edit/add: Is the openvpn error in CI one worth while investigating?

I believe we will need to open a followup issue to address it, I'm not sure why the test started failing last week. It had previously passed consistently, then began to fail consistently last week. The switch had no relation to the code in the PR afaik, it started after I made a push after not touching the PR for 2 weeks due to my vacation/holidays. Since then it has consistently failed on the ubuntu-latest runner, the arm64 test started passing once I switch from using qemu to the oqs-arm64 runner.

@baentsch baentsch mentioned this pull request Dec 17, 2024
@baentsch
Copy link
Member

we will need to open a followup issue to address it

Created #330 to track. Merging this PR then: Thanks for all the effort, @ajbozarth !

@baentsch baentsch merged commit adbc13c into open-quantum-safe:main Dec 17, 2024
38 of 39 checks passed
@ajbozarth ajbozarth deleted the ci branch December 17, 2024 16:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
5 participants