Docker based CI #80
I'd be curious what others think, but to me you're trying to do two different things. First, you're trying to speed up the CI. Second, you're trying to add a deployment. IMHO, the two have different needs and should be treated separately.

To speed up CI one needs to be able to cache all of the different configurations that CI will ultimately create. AFAIK, that mandates a container per configuration. The CI containers are also built off a PR and not a release (note that CI steps occur between accepting a PR and releasing it). While it could be separated out from the container build, it's also worth mentioning that one typically does a lot of extra stuff during CI (e.g., testing, coverage) that one doesn't want in the deployment images.

For deployment one typically builds only a couple of slimmed-down images. The images are configured for performance and the most common use cases, and they are typically built off of releases.

As for this PR, AFAIK it's at odds with the caching approach @pdhung3012 is taking. Maybe I'm missing it, but it's not clear to me how the two approaches can or should co-exist. That said, I think this PR can be useful eventually, but I feel like we need to get the CI to a better place first.
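To make the CI-vs-deployment distinction concrete, here is a minimal multi-stage sketch; the base image, package list, and paths are illustrative assumptions, not anything taken from this PR.

```dockerfile
# CI stage: carries compilers plus test/coverage tooling and runs the suite.
FROM ubuntu:22.04 AS ci
RUN apt-get update && apt-get install -y g++ cmake make git gcovr
COPY . /src
RUN cmake -S /src -B /build -DCMAKE_BUILD_TYPE=Debug -DBUILD_TESTING=ON
RUN cmake --build /build
RUN ctest --test-dir /build --output-on-failure
RUN cmake --install /build --prefix /opt/pkg

# Deployment stage: only the installed artifacts, none of the CI-only
# tooling, so the published image stays slim.
FROM ubuntu:22.04 AS deploy
COPY --from=ci /opt/pkg /usr/local
```

Only the `deploy` stage would be tagged and pushed for users; the `ci` stage exists solely to run during CI.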
The goal is the first one, to speed up CI. Deployment is required so that the image stays up to date and subsequent CI runs build more efficiently. Publishing this image for users is optional; we may prefer a slimmer image, as you suggested, or publish images publicly only when we have a release. By default, any image uploaded to the registry is private.
Docker images are composed of layers, and all of these layers are cached. I am not sure why you think you need one container per configuration. One could build a fat image with all of the configurations.
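As a rough illustration of that point (my own sketch, not part of this PR): several configurations can live in one image, each in its own layer, and an unchanged layer is served from cache rather than rebuilt. The compilers and flags below are placeholders.

```dockerfile
# Dependencies/tooling installed in their own layer, cached independently
# of the project source.
FROM ubuntu:22.04
RUN apt-get update && apt-get install -y g++ clang cmake make git

# The project source; changing it only invalidates the layers below this one.
COPY . /src

# One layer per configuration; each is cached separately.
RUN cmake -S /src -B /build/gcc-debug    -DCMAKE_CXX_COMPILER=g++     -DCMAKE_BUILD_TYPE=Debug   && cmake --build /build/gcc-debug
RUN cmake -S /src -B /build/gcc-release  -DCMAKE_CXX_COMPILER=g++     -DCMAKE_BUILD_TYPE=Release && cmake --build /build/gcc-release
RUN cmake -S /src -B /build/clang-debug  -DCMAKE_CXX_COMPILER=clang++ -DCMAKE_BUILD_TYPE=Debug   && cmake --build /build/clang-debug
```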
Well, this is a different approach. One problem I had with caching is the limits GitHub puts on it. Here it says that "GitHub will remove any cache entries that have not been accessed in over 7 days". Maybe there are workarounds (e.g., one can schedule builds every week), but it is annoying. I also didn't like debugging action scripts, since you can't test them locally. Along with the promise of being faster, Docker-based CI is easier to manage, since the CI logic is buried in the Dockerfile and the action script is much simpler. To debug a problem you can build and run the container locally as long as you have Docker installed.
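A minimal sketch of what "the CI logic lives in the Dockerfile" could look like, assuming a hypothetical pre-built base image (the image name and options are my assumptions, not this PR's actual file):

```dockerfile
# Hypothetical base image that already contains compilers and the
# pre-built dependencies.
FROM ghcr.io/nwchemex-project/parallelzone-base:latest
COPY . /src
# Configure, build, and test during the image build; any failing step
# fails `docker build`, which is all the CI job has to report.
RUN cmake -S /src -B /build -DBUILD_TESTING=ON
RUN cmake --build /build --parallel
RUN ctest --test-dir /build --output-on-failure
```

To reproduce a failure locally one would then run something like `docker build -t parallelzone-ci .`, or `docker run --rm -it <image> bash` to poke around interactively, assuming Docker is installed.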
I appreciate your interest in helping the CI team; however, @hjjvandam, @quazirafi, @pdhung3012, and I are currently pursuing an organization-wide, action-based CI solution. We have been working on this solution for many months, have iterated on the design several times, and have worked through many problems. So while we could switch to a largely container-based solution, I don't see a way to do this without scrapping most of what we've done and starting over. If you really want to help speed up the CI, I encourage you to reach out to @pdhung3012 and see what you can do to help within the confines of the current design.

Otherwise I will point out that build time is really only an issue for CI. For development, you should be installing the dependencies which take the longest to build, libint and TiledArray (the latter primarily because of its underlying dependencies). Using installed versions of the dependencies, you ought to be able to rebuild the entire stack in about 10 minutes on very modest hardware. Generally speaking, development is usually limited to one repo, so after the initial build, recompilation of changes should be really quick (under a minute).

If you are worried about integration testing (making sure all the repos work nicely together), I encourage you to embrace the plugin feature of NWChemEx. In other words, compile NWChemEx using a local copy of whatever repo you're adding the module to, say SCF. Add your module to SCF. Add your integration test to the NWChemEx repo. Rebuild NWChemEx (should be quick). Run the test. Rinse and repeat until your module is ready to go. Make a PR to SCF with your module (and unit test) and a PR to NWChemEx with your integration test. N.B. compilation is only needed for C++ modules; for Python modules, if everything's set up right, you should just have to rerun the (Python) integration test in the NWChemEx repo.
license[bot] seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you already have a GitHub account, please add the email address used for this commit to your account. You have signed the CLA already but the status is still pending? Let us recheck it.
This PR adds a `Dockerfile` and introduces Docker-based CI, as mentioned in the group meetings and earlier in the issue related to reducing build times (https://github.com/NWChemEx-Project/.github/issues/24, see suggested solution 4). The Dockerfile is generic and can be used in all the other repos. Along with CI, it can help users try ParallelZone in an isolated environment, and also with Codespaces.
The CI takes less than 3 minutes if a reconfigure is not triggered, and it updates the base image if the build and tests run successfully. The build and test steps could also be separated.
There might be two pitfalls with this approach. One is that an update in a dependency does not invalidate the cached image, so one has to `touch CMakeLists.txt` to force a reconfigure. However, I'd argue that we should always follow a specific tag/commit for our dependencies; then we don't need any workarounds, and it is annoying to find out a build fails because of an update in a dependency. There might be other pitfalls as well; I just wanted to submit this PR to discuss whether this idea is viable. If so, it can be generalized with the help of the CI team and become part of the `.github` repo.
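As a sketch of how such a generalized Dockerfile might look once it lives in the `.github` repo: every ARG name, tag, and URL below is a hypothetical placeholder I made up for illustration, not the project's actual recipe.

```dockerfile
ARG BASE=ubuntu:22.04
FROM ${BASE}
RUN apt-get update && apt-get install -y g++ cmake make git

# Pinning each dependency to an explicit tag means its cached layer is only
# rebuilt when the pin changes, removing the need for a `touch CMakeLists.txt`
# style workaround when upstream moves.
ARG DEP_URL=https://github.com/example/some-dependency.git
ARG DEP_TAG=v1.0.0
RUN git clone --depth 1 --branch ${DEP_TAG} ${DEP_URL} /deps/src \
    && cmake -S /deps/src -B /deps/build -DCMAKE_INSTALL_PREFIX=/usr/local \
    && cmake --build /deps/build --target install

# The repo under test is a build argument, so the same file could be reused
# by ParallelZone or any other repo in the organization.
ARG REPO=ParallelZone
COPY . /src/${REPO}
RUN cmake -S /src/${REPO} -B /build -DBUILD_TESTING=ON \
    && cmake --build /build \
    && ctest --test-dir /build --output-on-failure
```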