
Operate-first prow / CI issues from kubeval check #2288

Closed
Gregory-Pereira opened this issue Aug 24, 2022 · 8 comments
Assignees
Labels
continous-integration lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale.

Comments

@Gregory-Pereira
Member

Gregory-Pereira commented Aug 24, 2022

Prow has been slow recently, and a few PRs are being blocked because their kubeval check is failing (see example). My theory is that these performance issues come from the kubeval check itself: it is not checking just the diff, but the entire repo, as you can see in that example check log. It eventually exits with code 137, which means it is being killed due to memory constraints. Can kubeval be set to run only against the diff produced by the PR, @mimotej? If not, we can consider upping the resources per PR (although that could become pretty resource-intensive), or we should consider removing the check entirely.
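
A minimal sketch of what "run kubeval only against the PR diff" could look like, assuming the check has access to the Prow-provided PULL_BASE_SHA and PULL_PULL_SHA variables (tumido explains below why raw base/overlay manifests can't simply be fed to kubeval like this):

# Hypothetical sketch only, not the current check: validate just the YAML files touched by the PR.
git diff --name-only "$PULL_BASE_SHA" "$PULL_PULL_SHA" -- '*.yaml' '*.yml' \
  | xargs -r kubeval --strict --ignore-missing-schemas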

/cc @mimotej

@Gregory-Pereira Gregory-Pereira changed the title Operate-first prow / CI issues Operate-first prow / CI issues from kubeval check Aug 24, 2022
@Gregory-Pereira
Member Author

There was a similar issue with pre-commit a while back. Not sure if this solution is repeatable here, but sharing it for reference:
https://github.com/thoth-station/prescriptions/pull/27401/files.
thoth-station/prescriptions-refresh-job#52 (comment)

@Gregory-Pereira
Member Author

Gregory-Pereira commented Aug 24, 2022

So it turns out it is checking for a diff, as seen here. However, running that against my example PR produces this:

$ PULL_BASE_SHA="281d08f5c6ec49eeecf16d70ee8cde57f7d881fe" 
$ PULL_PULL_SHA="28883326290e823552a955817d37af88ef31857d"
$ bases=$(git diff --name-status $PULL_BASE_SHA $PULL_PULL_SHA | grep -Po "^[^DR]\s+\K.*?(?=/)|^R\w*\s+.*\s+\K.*?(?=/)" | sort | uniq)
$ echo $bases
cluster-scope

As you can see, it returns the entire cluster-scope directory as the resulting diff, instead of the one file: cluster-scope/base/operators.coreos.com/subscriptions/acm-operator-subscription/subscription.yaml
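
For reference, a minimal reproduction (GNU grep, using the one file changed by the example PR) showing that the lazy ".*?(?=/)" in the pattern only captures up to the first path separator, which is why just the top-level directory comes back:

printf 'M\tcluster-scope/base/operators.coreos.com/subscriptions/acm-operator-subscription/subscription.yaml\n' \
  | grep -Po "^[^DR]\s+\K.*?(?=/)"
# prints: cluster-scope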

@mimotej
Member

mimotej commented Aug 25, 2022

For now it seems that upping the resources for kubeval is enough; I will have to investigate further why that is.

/assign

@mimotej
Member

mimotej commented Aug 25, 2022

Little update:
The problem seems to be in the regex that is used afterwards: it matches the highest layer of the directory path, which in this case means cluster-scope.
(screenshot attached: Screenshot from 2022-08-25 16-06-25)

If we figure out how to change this regex to match the lowest possible layer of the directory path, it will fix it...
This issue also applies to the kustomize-build test; there it is just less visible because that check uses fewer resources overall.
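
One direction this could take, purely as a sketch (and, as tumido points out below, the top-level match turned out to be intentional), is to skip the regex entirely and derive the deepest directory of every changed file with dirname:

# Hypothetical alternative, not the adopted fix: keep the full directory of each
# changed file instead of only its top-level component.
bases=$(git diff --name-only "$PULL_BASE_SHA" "$PULL_PULL_SHA" \
  | xargs -r -n1 dirname \
  | sort -u)
echo "$bases"
# -> cluster-scope/base/operators.coreos.com/subscriptions/acm-operator-subscription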

@tumido
Member

tumido commented Aug 25, 2022

As you can see, it returns the entire cluster-scope directory as the resulting diff, instead of the one file: cluster-scope/base/operators.coreos.com/subscriptions/acm-operator-subscription/subscription.yaml

Which is the correct outcome. Kubeval can't be run against partial manifests in base/overlays... Keep in mind that we modify and patch those manifests in overlays, so, for example, a subscription that requires a channel to be specified gains that value in an overlay, etc.

The grep command is matching the root application folder intentionally, because so far we don't have any heuristic to determine which base or overlay includes which modified file... And we need to run each change set through kustomize build before we can kubeval it; without that, partial manifests won't fit the API spec, resulting in false positives.
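
To make the requirement concrete, the validation described here has roughly this shape (the overlay path is illustrative only):

# Render the full kustomization first, then validate the rendered output; raw files
# from base/ or overlays/ on their own may be incomplete and fail the schema check.
kustomize build cluster-scope/overlays/prod | kubeval --strict --ignore-missing-schemas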

@Gregory-Pereira
Member Author

Then this seems to be logical and intended behaviour. I think this explanation, coupled with your PR to exclude some pre-submit checks based on the files changed, should resolve this issue. Would you agree, @tumido, and if so, can I close this?

@sesheta
Member

sesheta commented Dec 1, 2022

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

/lifecycle stale

@sesheta sesheta added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 1, 2022
@tumido
Member

tumido commented Dec 2, 2022

I don't think this is an issue anymore.

@tumido tumido closed this as completed Dec 2, 2022
Repository owner moved this from Next tasks Backlog to Done in Operations Project Board Dec 2, 2022