
Operate-first prow / CI issues from kubeval check #2288

Closed
Gregory-Pereira opened this issue Aug 24, 2022 · 8 comments
Assignees
Labels
continous-integration lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale.

Comments

@Gregory-Pereira
Member

Gregory-Pereira commented Aug 24, 2022

Prow has been slow recently, and a few PRs are being blocked because their kubeval check is failing (see example). My theory is that these performance issues come from the kubeval check itself: it is not checking just the diff, but the entire repo, as you can see in that example check log. It eventually exits with code 137, which means it is being killed due to memory constraints. Can kubeval be set to run only against the diff produced by the PR, @mimotej? If not, we can consider upping the resources per PR (although that could become pretty resource-intensive), or we should consider removing the check entirely.
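
A minimal sketch of what "run kubeval only against the PR diff" could look like, assuming the check has access to the Prow-provided PULL_BASE_SHA and PULL_PULL_SHA variables (tumido explains below why raw base/overlay manifests can't simply be fed to kubeval like this):

# Hypothetical sketch only, not the current check: validate just the YAML files touched by the PR.
git diff --name-only "$PULL_BASE_SHA" "$PULL_PULL_SHA" -- '*.yaml' '*.yml' \
  | xargs -r kubeval --strict --ignore-missing-schemas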

/cc @mimotej

@Gregory-Pereira Gregory-Pereira changed the title Operate-first prow / CI issues Operate-first prow / CI issues from kubeval check Aug 24, 2022
@Gregory-Pereira
Member Author

There was a similar issue with pre-commit a while back. Not sure if this solution is repeatable here, but sharing it for reference:
https://github.com/thoth-station/prescriptions/pull/27401/files.
thoth-station/prescriptions-refresh-job#52 (comment)

@Gregory-Pereira
Member Author

Gregory-Pereira commented Aug 24, 2022

So it turns out it is checking for a diff, as seen here. However, running that against my example PR produces this:

$ PULL_BASE_SHA="281d08f5c6ec49eeecf16d70ee8cde57f7d881fe" 
$ PULL_PULL_SHA="28883326290e823552a955817d37af88ef31857d"
$ bases=$(git diff --name-status $PULL_BASE_SHA $PULL_PULL_SHA | grep -Po "^[^DR]\s+\K.*?(?=/)|^R\w*\s+.*\s+\K.*?(?=/)" | sort | uniq)
$ echo $bases
cluster-scope

As you can see, it returns the entire cluster-scope directory as the resulting diff, instead of the one file: cluster-scope/base/operators.coreos.com/subscriptions/acm-operator-subscription/subscription.yaml
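
For reference, a minimal reproduction (GNU grep, using the one file changed by the example PR) showing that the lazy ".*?(?=/)" in the pattern only captures up to the first path separator, which is why just the top-level directory comes back:

printf 'M\tcluster-scope/base/operators.coreos.com/subscriptions/acm-operator-subscription/subscription.yaml\n' \
  | grep -Po "^[^DR]\s+\K.*?(?=/)"
# prints: cluster-scope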

@mimotej
Member

mimotej commented Aug 25, 2022

For now it seems that upping the resources for kubeval is enough; I will have to investigate further why that is.

/assign

@mimotej
Member

mimotej commented Aug 25, 2022

Little update:
The problem seems to be in the regex that is used afterwards: it matches the highest layer of the directory path, which in this case means cluster-scope.
(screenshot attached: Screenshot from 2022-08-25 16-06-25)

If we figure out how to change this regex to match the lowest possible layer of the directory path, it will fix it...
This issue also applies to the kustomize-build test; there it is just less visible because that check uses fewer resources overall.
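
One direction this could take, purely as a sketch (and, as tumido points out below, the top-level match turned out to be intentional), is to skip the regex entirely and derive the deepest directory of every changed file with dirname:

# Hypothetical alternative, not the adopted fix: keep the full directory of each
# changed file instead of only its top-level component.
bases=$(git diff --name-only "$PULL_BASE_SHA" "$PULL_PULL_SHA" \
  | xargs -r -n1 dirname \
  | sort -u)
echo "$bases"
# -> cluster-scope/base/operators.coreos.com/subscriptions/acm-operator-subscription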

@tumido
Member

tumido commented Aug 25, 2022

As you can see, it returns the entire cluster-scope directory as the resulting diff, instead of the one file: cluster-scope/base/operators.coreos.com/subscriptions/acm-operator-subscription/subscription.yaml

Which is the correct outcome. Kubeval can't be run against partial manifests in base/overlays... Keep in mind that we modify and patch those manifests in overlays, so, for example, a subscription that requires a channel to be specified gains that value in an overlay, etc.

The grep command is matching the root application folder intentionally, because so far we don't have any heuristic to determine which base or overlay includes which modified file... And we need to run each change set through kustomize build before we can kubeval it; without that, partial manifests won't fit the API spec, resulting in false positives.
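
To make the requirement concrete, the validation described here has roughly this shape (the overlay path is illustrative only):

# Render the full kustomization first, then validate the rendered output; raw files
# from base/ or overlays/ on their own may be incomplete and fail the schema check.
kustomize build cluster-scope/overlays/prod | kubeval --strict --ignore-missing-schemas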

@Gregory-Pereira
Member Author

Then this seems to be logical and intended behaviour. I think this explanation, coupled with your PR to exclude some pre-submit checks based on the files changed, should resolve this issue. Would you agree, @tumido, and if so, can I close this?

@sesheta
Member

sesheta commented Dec 1, 2022

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

/lifecycle stale

@sesheta sesheta added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 1, 2022
@tumido
Member

tumido commented Dec 2, 2022

I don't think this is an issue anymore.

@tumido tumido closed this as completed Dec 2, 2022
Repository owner moved this from Next tasks Backlog to Done in Operations Project Board Dec 2, 2022