Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unable to fetch diff for a large PR #249

Closed
Alexolut opened this issue Jun 13, 2024 · 11 comments · Fixed by #262
Closed

Unable to fetch diff for a large PR #249

Alexolut opened this issue Jun 13, 2024 · 11 comments · Fixed by #262

Comments

@Alexolut
Copy link

From the recent time I started to get fetch errors on PR with more than 300 files. At least couple months ago there were no problems, but now I see the following:

ERROR:CPP Linter:response returned 406 from GET https://api.github.com/repos/.../pulls/NNN with message: 
{"message":"Sorry, the diff exceeded the maximum number of files (300). 
Consider using 'List pull requests files' API or locally cloning the repository instead.","errors":
[{"resource":"PullRequest","field":"diff","code":"too_large"}],
"documentation_url":"https://docs.github.com/rest/pulls/pulls#list-pull-requests-files","status":"406"}

Is there a possibility to solve this issue on cpp-linter side?

@2bndy5
Copy link
Collaborator

2bndy5 commented Jun 13, 2024

We could try reverting back to the Github REST API endpoint that lists the files (with individual diffs) instead of trying to get the whole diff... IMHO, this isn't a good idea though. We changed to using REST API to get the full diff (in 1 HTTP request) to avoid

  1. having to re-assemble the entire diff from multiple diffs (which don't have all the diff info for each file).
  2. a file with a a lot of changes does not include diff at all when getting a list of files with individual diffs. This is the main reason we switched to getting the entire diff.

Getting the entire diff at once proved to be more reliable.

Honestly, a PR with 300 files changed is undesirable for many reasons, especially when working as part of a team. I'm willing to bet (& I'm not a gambling type of person) that parts of that large PR could have been broken up into smaller (& possibly sequential) PRs.

@2bndy5
Copy link
Collaborator

2bndy5 commented Jun 13, 2024

FYI, this problem isn't really specific to github. I had to stop working on gitlab support for the exact same reason: Gitlab's REST API does not offer a way to get the full diff and still neglects to return the diff for files with a lot of changes. However, github seems to have a bigger cap on the amount of data that the REST API can return.

@2bndy5 2bndy5 changed the title Unable to fetch large PR Unable to fetch diff for a large PR Jun 13, 2024
@2bndy5
Copy link
Collaborator

2bndy5 commented Jun 13, 2024

or locally cloning the repository instead

This idea is less feasible because the default behavior of actions/checkout provides a shallow checkout. So, git diff on a shallow checkout returns all file changes since the repo was created. My experiments with increasing the fetch-depth option yielded no good results. And the fetch depth would be at the mercy of the repo's admin(s). Not to mention, the increase in CI time to checkout a repo with a non-zero fetch-depth, and the storage capacity of CI runners may be inadequate for large repos.

@retroandchill
Copy link

Honestly, a PR with 300 files changed is undesirable for many reasons, especially when working as part of a team. I'm willing to bet (& I'm not a gambling type of person) that parts of that large PR could have been broken up into smaller (& possibly sequential) PRs.

For most projects, I would be inclined to agree, the issue I have is that my project is a game project and so sometimes the art assets can easily throw it over the limit. I wish the API had some way to filter on the type of files, so that for example, only text files are considered in the diff.

@2bndy5
Copy link
Collaborator

2bndy5 commented Jul 28, 2024

Its true, the REST API endpoints don't provide an option to filter the files on a event's diff. Such a behavior might be more suited as a specific endpoint (like what gitea provides). Maybe this could be a feature request to GitHub.

@petersteneteg
Copy link

Hi
I also ran into this issue for a large PR refactoring various dependencies. basically removing a lot of files.
I think it would be fine if the checks fails for such large diffs, But it would be nice if it failed in a nice way, not just a crash as is now.

@2bndy5
Copy link
Collaborator

2bndy5 commented Sep 3, 2024

If you're suggesting a silent failure (with zero exit code), then I'm sorry but no. If cpp-linter cannot get the diff of an event, then that is a critical failure. It doesn't only happen on a large diff, it could also mean the GitHub token used doesn't have adequate permissions.

@2bndy5
Copy link
Collaborator

2bndy5 commented Sep 3, 2024

FYI, disabling lines-changed-only, files-changed-only, and the PR review feature (tidy-review and format-review) should avoid needing the diff of the event.

@petersteneteg
Copy link

I undersand, that makes sense. Not much one can do.

@2bndy5
Copy link
Collaborator

2bndy5 commented Sep 3, 2024

Github REST API has an endpoint that lists the files in a PR (with individual diffs) and an equivalent endpoint for commits.

Warning

These endpoints take longer to consume/process and are limited to 3000 changed files.

I'm thinking about using these endpoints as a "plan B" for when this large diff problem is hit, but ultimately there is nothing we can do for ridiculously large PRs/commits. The plan B would still be subject to flaws in GitHub's REST API.

@2bndy5
Copy link
Collaborator

2bndy5 commented Sep 16, 2024

v2.13.0 will resort to using a paginated request/response for getting changed files' diff. This increases the max changed file count to 3000.

Caution

There are still some other limitations imposed by the REST API that could easily surface with large diffs.

  • thread-comments are truncated to 65535 bytes maximum, so some feedback from clang-tidy/format may not be reported.
  • file-annotations posted by 1 workflow are capped at 50 (or something similarly small)
  • PR review suggestions might also be limited to something like 25 comments (needs research/verification), but this feature is still experimental. The cpp-linter package codebase does not currently include compensation for the REST API limits about PR reviews.

In the end, it is still highly recommended to split up large changes into several incremental contributions.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants