Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Figure out link checker rate limiting in CI #823

Closed
Eric-Arellano opened this issue Feb 15, 2024 · 3 comments
Closed

Figure out link checker rate limiting in CI #823

Eric-Arellano opened this issue Feb 15, 2024 · 3 comments

Comments

@Eric-Arellano
Copy link
Collaborator

Eric-Arellano commented Feb 15, 2024

For example, we got rate limited by GitHub on #822.

I'm wondering if we can have some type of persistent cache across PR runs. Don't check links within N days. See internal PR 1082 for how this was done in closed source.

--

Outside of an externally-managed cache, we should definitely at least have an in-memory cache. Note that we don't store what external links have been checked between FileBatches, so we check the same URL multiple times if e.g. Qiskit 0.44 and 0.45 have the same link.

This internal cache won't completely solve the problem: we may still make too many requests to the same host but with different pages, like GitHub source code links. But it helps with certain websites.

This was implemented by #916. We also now only check links once a week in a cron job, so this issue is less urgent.

@frankharkins
Copy link
Member

I think you could have an action run a slow, extended check on main every month or so, then PRs could use that cache, at least according to this SO answer.

github-merge-queue bot pushed a commit that referenced this issue Feb 28, 2024
Closes #886. As explained
there, it's safe to only run this check on changed files because whether
a page renders is independent from all the other pages.

We use a weekly cron job, though, to check all pages. That reduces our
risk from things like the docs preview Docker image changing on us and
resulting in some pages now failing to render.

--

This PR also stops checking external links in API docs builds and
instead moves the check to the new weekly cron job. Relates to
#876.

Per #372, it's annoying to
check external links in CI because it can be flaky
(#823).
@Eric-Arellano
Copy link
Collaborator Author

Eric-Arellano commented Apr 17, 2024

One technique is to set the auth for GitHub. Update: done in #1392

github-merge-queue bot pushed a commit that referenced this issue May 15, 2024
)

Part of #823

This PR adds a new HTTP header to the external link checker to set the
auth for GitHub when we are checking GitHub links. The new header is
only used for GitHub links following what we do in the closed-source
repo.

In order to be able to get the environment variable from typescript with
`process.env.GITHUB_TOKEN`, we need to define it in the workflows. In
the case of running the script locally, the token could be undefined.

---------

Co-authored-by: Eric Arellano <[email protected]>
frankharkins pushed a commit to frankharkins/documentation that referenced this issue Jul 22, 2024
Closes Qiskit#886. As explained
there, it's safe to only run this check on changed files because whether
a page renders is independent from all the other pages.

We use a weekly cron job, though, to check all pages. That reduces our
risk from things like the docs preview Docker image changing on us and
resulting in some pages now failing to render.

--

This PR also stops checking external links in API docs builds and
instead moves the check to the new weekly cron job. Relates to
Qiskit#876.

Per Qiskit#372, it's annoying to
check external links in CI because it can be flaky
(Qiskit#823).
frankharkins pushed a commit to frankharkins/documentation that referenced this issue Jul 22, 2024
…skit#1392)

Part of Qiskit#823

This PR adds a new HTTP header to the external link checker to set the
auth for GitHub when we are checking GitHub links. The new header is
only used for GitHub links following what we do in the closed-source
repo.

In order to be able to get the environment variable from typescript with
`process.env.GITHUB_TOKEN`, we need to define it in the workflows. In
the case of running the script locally, the token could be undefined.

---------

Co-authored-by: Eric Arellano <[email protected]>
@Eric-Arellano
Copy link
Collaborator Author

This hasn't happened in a long time, in large part thanks to using the GitHub token #1392.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Archived in project
Development

No branches or pull requests

3 participants