🐛 filter non-editorial content out of sitemap #2846
baker/sitemap.ts
where isGdocPublished = TRUE`)
const alreadyPublishedViaGdocsSlugsSet = new Set(
    alreadyPublishedViaGdocsSlugs.map((row: any) => row.slug)
)
Did you mean this?
alreadyPublishedViaGdocsSlugs[0].map((row: any) => row.slug)
I'm not too familiar with knex raw queries, but it seems we're getting an array out of it, so `alreadyPublishedViaGdocsSlugs` contains an array of rows and an array of fields. Without this change, no slugs make it into the set, and the corresponding pages end up as duplicates in the sitemap, e.g. diet-affordability (which has been ported over on my local: `isGdocPublished = 1`).
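To illustrate the point above, here is a minimal sketch of the failure mode. It assumes the raw query resolves to a `[rows, fields]` pair, as this comment suggests; the `SlugRow` and `RawQueryResult` types, the sample data, and both helper functions are hypothetical and not taken from the codebase.

```typescript
// Hypothetical row shape for the slug query.
interface SlugRow {
    slug: string
}

// Assumption: the raw query resolves to a [rows, fields] pair.
type RawQueryResult = [SlugRow[], unknown[]]

// Made-up sample data standing in for a real query result.
const rawResult: RawQueryResult = [
    [{ slug: "diet-affordability" }, { slug: "food-prices" }],
    [{ name: "slug" }],
]

// Buggy version: maps over the pair itself, so each "row" is an array
// and row.slug is undefined. No real slug ends up in the set.
function slugSetFromPair(result: RawQueryResult): Set<unknown> {
    return new Set((result as any[]).map((row: any) => row.slug))
}

// Fixed version: take the first element (the rows) before mapping.
function slugSetFromRows(result: RawQueryResult): Set<string> {
    const [rows] = result
    return new Set(rows.map((row) => row.slug))
}

console.log(slugSetFromPair(rawResult).has("diet-affordability")) // false
console.log(slugSetFromRows(rawResult).has("diet-affordability")) // true
```

Because the buggy set never contains a real slug, every later "is this slug already published via Gdocs?" check fails, which is how the duplicates slip into the sitemap.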
Damn! Good catch! I'd just copied the code (it was on the cusp of my DRY threshold) without checking it. My local didn't have any published Gdocs successors when I diffed the sitemaps so I didn't notice it wasn't working.
I've extracted the function and fixed it! Thanks 🙂
Nice reuse of the current APIs!
baker/algolia/indexToAlgolia.tsx
)
const postsApi = await wpdb.getPosts(
    undefined,
    (post) => !publishedGdocsBySlug[`/${post.slug}`]
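For readers following the diff, the filter callback can be sketched in isolation. This is only an illustration: the `Post` and `PublishedGdoc` shapes, the `keyBySlugPath` and `withoutGdocSuccessors` helpers, and the way the lookup is built are assumptions. The only detail taken from the diff is that the lookup is queried with a leading slash in front of the post slug.

```typescript
// Hypothetical minimal shapes; the real types have more fields.
interface Post {
    slug: string
}

interface PublishedGdoc {
    slug: string
}

// Assumption: the lookup is keyed by "/<slug>", matching how the diff queries it.
function keyBySlugPath(
    gdocs: PublishedGdoc[]
): Record<string, PublishedGdoc> {
    const lookup: Record<string, PublishedGdoc> = {}
    for (const gdoc of gdocs) {
        lookup[`/${gdoc.slug}`] = gdoc
    }
    return lookup
}

// Keep only the posts that have no published Gdoc with the same slug.
function withoutGdocSuccessors(
    posts: Post[],
    publishedGdocsBySlug: Record<string, PublishedGdoc>
): Post[] {
    return posts.filter((post) => !publishedGdocsBySlug[`/${post.slug}`])
}

// Made-up sample data.
const samplePosts: Post[] = [
    { slug: "diet-affordability" },
    { slug: "food-prices" },
]
const sampleLookup = keyBySlugPath([{ slug: "diet-affordability" }])
const remainingPosts = withoutGdocSuccessors(samplePosts, sampleLookup)
console.log(remainingPosts.map((post) => post.slug)) // ["food-prices"]
```

Note that if the lookup were keyed by the bare slug instead, the leading slash in the callback would make every lookup miss and no post would be filtered out.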
I noticed an existing bug related to this that you might want to address here: #2864
9bf2c09 to 5a16038
Fixes #2726
Uses the `wpdb.getPosts()` method (and fixes up / simplifies some incorrect typing), which correctly filters out reusable blocks. There's a default filter in this method for posts that end in `-country-profile`, but AFAICT that's to filter out the templates, not the pages themselves. Those still get added, as do the default country pages.
Also uses the `Gdoc.getPublishedGdocs()` method, which filters out fragments and unpublished documents.
On my local environment:
sitemap.xml length before: 6056
sitemap.xml length after: 6043
The filtered pages: