Determine internal vs external links at parse time #884

Merged
merged 3 commits into from
Feb 26, 2024

Eric-Arellano
Collaborator

Part of #876 to split the internal and external link checkers into distinct programs. To do this, it's useful to split out internal vs. external links at parse time, whereas before we did it inside `FileBatch`. The external link checker won't use `FileBatch`.

This is only a refactor and doesn't change the program's behavior, other than now using `HEAD` rather than `GET` for external link requests. We also now store anchors in a `Set` so that checking whether an anchor exists is faster.
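
As a rough illustration of the idea in this description, here is a minimal TypeScript sketch of classifying links when a file is parsed rather than later in `FileBatch`. The names `ParsedFile`, `isExternal`, and `classifyLinks`, and the `http://`/`https://` prefix test, are assumptions made for this example and are not taken from the repository's `parseFile` code.

```typescript
// Hypothetical shape of a parse result; the real parseFile output may differ.
interface ParsedFile {
  anchors: Set<string>; // a Set gives fast membership checks via has()
  internalLinks: Set<string>;
  externalLinks: Set<string>;
}

// Assumption for this sketch: a link is external if it starts with http:// or https://.
function isExternal(link: string): boolean {
  return /^https?:\/\//.test(link);
}

// Splits links into internal and external while building the parse result.
function classifyLinks(anchors: string[], links: string[]): ParsedFile {
  const result: ParsedFile = {
    anchors: new Set<string>(anchors),
    internalLinks: new Set<string>(),
    externalLinks: new Set<string>(),
  };
  for (const link of links) {
    if (isExternal(link)) {
      result.externalLinks.add(link);
    } else {
      result.internalLinks.add(link);
    }
  }
  return result;
}

const parsedExample = classifyLinks(
  ["#install", "#usage"],
  ["/docs/start#install", "https://example.com", "../api/index"],
);
```

With the split done up front, an internal checker could read only `internalLinks` and `anchors`, and a separate external checker could read only `externalLinks`.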

@@ -83,8 +83,10 @@ async function main() {

const fileBatches = await determineFileBatches(args);
const otherFiles = [
...(await globby("public/**/*")).map((fp) => new File(fp, [])),
...SYNTHETIC_FILES.map((fp) => new File(fp, [], true)),
...(await globby("public/{images,videos}/**/*")).map(
Collaborator Author

The old `public/**/*` glob would also include the objects.inv files, which was wrong.

for (const filePath of this.toCheck) {
const parsed = await parseFile(filePath);
files.push(new File(filePath, parsed.anchors));
if (!IGNORED_FILES.has(filePath)) {
Collaborator Author

All three ignores are moved into addLinksToMap.
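
To make the consolidation concrete, the sketch below shows one way a helper like `addLinksToMap` could apply all three ignore checks in a single place. The contents of `IGNORED_FILES`, `ALWAYS_IGNORED_URLS`, and `FILES_TO_IGNORES` are made-up placeholders, and the function body is an assumption inferred from the call site in this diff, not the merged implementation.

```typescript
// Placeholder ignore lists; the real values live in the repository.
const IGNORED_FILES = new Set<string>(["docs/legacy.md"]);
const ALWAYS_IGNORED_URLS = new Set<string>(["/placeholder"]);
const FILES_TO_IGNORES: { [filePath: string]: string[] } = {
  "docs/guide.md": ["/known-broken"],
};

// Records which files each link was found in, skipping ignored entries.
function addLinksToMap(
  filePath: string,
  links: string[],
  linksToOriginFiles: Map<string, string[]>,
): void {
  links.forEach((link) => {
    if (
      IGNORED_FILES.has(filePath) ||
      ALWAYS_IGNORED_URLS.has(link) ||
      FILES_TO_IGNORES[filePath]?.includes(link)
    ) {
      return; // skip this link only; forEach moves on to the next one
    }
    const originFiles = linksToOriginFiles.get(link);
    if (originFiles) {
      originFiles.push(filePath);
    } else {
      linksToOriginFiles.set(link, [filePath]);
    }
  });
}

const exampleLinksToOriginFiles = new Map<string, string[]>();
addLinksToMap("docs/guide.md", ["/a", "/known-broken", "/placeholder"], exampleLinksToOriginFiles);
addLinksToMap("docs/other.md", ["/a", "/known-broken"], exampleLinksToOriginFiles);
addLinksToMap("docs/legacy.md", ["/b"], exampleLinksToOriginFiles);
```

In this example `/known-broken` is skipped only for `docs/guide.md`, `/placeholder` is skipped everywhere, and nothing from `docs/legacy.md` is recorded.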

internalLinks.push(new InternalLink(linkPath, originFiles));
}
addLinksToMap(filePath, parsed.internalLinks, internalLinksToOriginFiles);
if (loadExternalLinks) {
Collaborator Author

We no longer spend time storing external links when they aren't needed. That should slightly decrease memory usage and improve performance.

results.push(await link.check());
}
// For loop reduces the risk of rate-limiting.
for (let link of externalLinkList) {
Collaborator Author

externalLinkList will be empty if we're not checking external links.
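
The loop in this snippet checks external links one at a time instead of firing every request at once. A minimal sketch of that pattern, combined with the `HEAD` method this PR switches to, might look like the following. `HeadFetcher`, `checkExternalLink`, and `checkSequentially` are hypothetical names, and the fetcher is injected so the example runs without network access; with Node 18+ a small wrapper around the built-in `fetch` could be passed instead.

```typescript
// Minimal response and fetcher types so the sketch is not tied to one HTTP client.
interface HeadResponse {
  ok: boolean;
  status: number;
}
type HeadFetcher = (url: string, init: { method: "HEAD" }) => Promise<HeadResponse>;

// Returns an error message for a broken link, or undefined if the link is reachable.
async function checkExternalLink(
  url: string,
  fetcher: HeadFetcher,
): Promise<string | undefined> {
  try {
    // HEAD requests headers only, so no response body is downloaded.
    const response = await fetcher(url, { method: "HEAD" });
    return response.ok ? undefined : `${url} returned status ${response.status}`;
  } catch (error) {
    return `${url} failed: ${String(error)}`;
  }
}

// Awaiting inside a for loop sends one request at a time, unlike Promise.all.
async function checkSequentially(urls: string[], fetcher: HeadFetcher): Promise<string[]> {
  const failures: string[] = [];
  for (const url of urls) {
    const failure = await checkExternalLink(url, fetcher);
    if (failure !== undefined) {
      failures.push(failure);
    }
  }
  return failures;
}

// Fake fetcher for demonstration: only URLs containing "missing" fail.
const requestedUrls: string[] = [];
const fakeFetcher: HeadFetcher = async (url) => {
  requestedUrls.push(url);
  return url.indexOf("missing") !== -1
    ? { ok: false, status: 404 }
    : { ok: true, status: 200 };
};
```

If the list of URLs is empty, as it would be when external links are not loaded, `checkSequentially` returns an empty array without making any request. Some servers do not handle `HEAD` the same way as `GET`, so a checker built this way may need a fallback.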

Collaborator

@arnaucasau arnaucasau left a comment

Thank you Eric! Tested locally and all looks good :)

scripts/lib/links/FileBatch.ts (outdated, resolved)
Co-authored-by: Arnau Casau <[email protected]>
Member

@frankharkins frankharkins left a comment

Looks good, thanks!

Comment on lines 141 to 148
if (
  IGNORED_FILES.has(filePath) ||
  ALWAYS_IGNORED_URLS.has(link) ||
  FILES_TO_IGNORES[filePath]?.includes(link)
) {
  return;
}

Member

Very nice

@Eric-Arellano Eric-Arellano added this pull request to the merge queue Feb 26, 2024
Merged via the queue into main with commit abde486 Feb 26, 2024
2 checks passed
@Eric-Arellano Eric-Arellano deleted the EA/split-up-file-batch branch February 26, 2024 14:39
frankharkins pushed a commit to frankharkins/documentation that referenced this pull request Jul 22, 2024
Part of Qiskit#876 to split the
internal and external link checkers into distinct programs. To do this,
it's useful to split out internal vs. external links at parse time,
whereas before we did it inside `FileBatch`. The external link checker
won't use `FileBatch`.

This is only a refactor and doesn't change the program, other than now
using `HEAD` for external link requests rather than `GET`. We also now
use a `Set` for anchors so that checking if an anchor is included is
faster.

---------

Co-authored-by: Arnau Casau <[email protected]>