-
-
Notifications
You must be signed in to change notification settings - Fork 229
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bake posts, pages and blocks from WP API snapshots #3166
Conversation
Current dependencies on/for this PR:
This stack of pull requests is managed by Graphite. |
WalkthroughThe overarching change involves modernizing and decoupling the system from WordPress dependencies by introducing new functions for post retrieval and processing, deprecating old WordPress querying functions, and refining data types to accommodate these updates. This transition aims to streamline data handling, enhance modularity, and future-proof the platform by moving towards a more structured and efficient architecture for managing content and its metadata. Changes
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (invoked as PR comments)
Additionally, you can add CodeRabbit Configration File (
|
bd5e309
to
8d1e7fb
Compare
d7a3e3b
to
f8924c9
Compare
d808ba9
to
74cd8dd
Compare
d730a41
to
efefec8
Compare
0ef0e77
to
da807c5
Compare
efefec8
to
e7c0485
Compare
cb1d456
to
8265f67
Compare
@coderabbitai review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Review Status
Actionable comments generated: 0
Configuration used: CodeRabbit UI
Files selected for processing (15)
- adminSiteServer/apiRouter.ts (2 hunks)
- baker/GrapherBaker.tsx (3 hunks)
- baker/SiteBaker.tsx (5 hunks)
- baker/algolia/indexToAlgolia.tsx (3 hunks)
- baker/pageOverrides.tsx (2 hunks)
- baker/siteRenderers.tsx (9 hunks)
- baker/sitemap.ts (2 hunks)
- db/DEPRECATEDwpdb.ts (1 hunks)
- db/migrateWpPostsToArchieMl.ts (2 hunks)
- db/model/Post.ts (3 hunks)
- db/syncPostsToGrapher.ts (1 hunks)
- db/wpdb.ts (13 hunks)
- packages/@ourworldindata/types/src/dbTypes/Posts.ts (3 hunks)
- packages/@ourworldindata/types/src/index.ts (1 hunks)
- packages/@ourworldindata/types/src/wordpressTypes/WordpressTypes.ts (1 hunks)
Files not reviewed due to errors (1)
- (no review received)
Additional comments: 37
packages/@ourworldindata/types/src/wordpressTypes/WordpressTypes.ts (1)
- 54-60: LGTM!
baker/pageOverrides.tsx (2)
- 7-10: LGTM!
- 67-67: LGTM!
packages/@ourworldindata/types/src/dbTypes/Posts.ts (2)
- 47-47: LGTM!
- 99-111: LGTM!
baker/sitemap.ts (1)
- 65-65: LGTM!
baker/algolia/indexToAlgolia.tsx (3)
- 29-29: LGTM!
- 96-96: LGTM!
- 197-197: LGTM!
db/DEPRECATEDwpdb.ts (1)
- 1-8: LGTM!
db/model/Post.ts (5)
- 88-93: LGTM!
- 111-122: LGTM!
- 137-151: LGTM!
- 153-183: LGTM!
- 278-289: LGTM!
db/migrateWpPostsToArchieMl.ts (1)
- 22-22: LGTM!
packages/@ourworldindata/types/src/index.ts (1)
- 497-498: Ensure the newly exported functions
snapshotIsPostRestApi
andsnapshotIsBlockGraphQlApi
are correctly implemented and relevant to the consumers of this module.db/syncPostsToGrapher.ts (1)
- 353-354: Switching to
wpdb.getBlockApiFromApi
andwpdb.getPostApiBySlugFromApi
for fetching post data based on type is a significant change. Verify that these methods provide improved functionality or performance compared to their predecessors.db/wpdb.ts (4)
- 107-109: Exporting
WP_API_ENDPOINT
,OWID_API_ENDPOINT
, andWP_GRAPHQL_ENDPOINT
constants improves modularity and reusability. Ensure these endpoints are correctly defined and used consistently across the project.- 117-123: > 📝 NOTE
This review was outside the diff hunks and was mapped to the diff hunk with the greatest overlap. Original lines [120-145]
The
graphqlQuery
function is now exported, allowing for broader use. Verify that error handling and response parsing are adequately addressed to ensure robust interaction with the WordPress GraphQL API.
- 285-287: The introduction of
getPostApiBySlugFromApi
as an exported function is a key change. Ensure that this function correctly handles errors and returns the expected data format from the WordPress REST API.- 558-573: > 📝 NOTE
This review was outside the diff hunks and was mapped to the diff hunk with the greatest overlap. Original lines [528-572]
Exporting
getBlockApiFromApi
and refactoring thegetTables
function to improve data fetching from the WordPress API are significant changes. Confirm that these functions are implemented with proper error handling and efficient data processing.baker/GrapherBaker.tsx (2)
- 46-49: Ensure
getPostRelatedCharts
is correctly implemented and used as intended to replacegetRelatedCharts
. Verify its definition and ensure it aligns with the expected functionality.- 367-367: Confirm that the replacement of
getRelatedCharts
withgetPostRelatedCharts
correctly fetches related charts based on post ID and integrates seamlessly with the existing logic.baker/siteRenderers.tsx (9)
- 83-90: Verify the correct implementation and usage of snapshot-related functions (
getBlockContentFromSnapshot
,getBlogIndex
,getFullPostByIdFromSnapshot
,getFullPostBySlugFromSnapshot
,isPostSlugCitable
,postsTable
). Ensure they properly fetch and handle post data from snapshots.- 197-197: Confirm that
getFullPostBySlugFromSnapshot
correctly retrieves full post data by slug from snapshots and integrates well with the rendering logic.- 202-202: Ensure
getFullPostByIdFromSnapshot
is used correctly to fetch full post data by ID from snapshots for preview rendering.- 236-237: Check that
isPostSlugCitable
correctly determines the citability of a post slug and integrates properly with page overrides logic.- 489-489: Verify that
getFullPostBySlugFromSnapshot
is correctly used to fetch generic country profile post data from snapshots.- 509-509: Confirm that
getFullPostBySlugFromSnapshot
correctly fetches the country profile landing post data from snapshots.- 568-568: Ensure that
getFullPostBySlugFromSnapshot
is correctly used to fetch post data by slug for rendering post thumbnails, and handle cases where the post might not exist.- 608-612: Check the usage of
getFullPostBySlugFromSnapshot
withinrenderProminentLinks
to ensure it correctly fetches post titles for internal URLs and handles errors appropriately.- 722-722: Verify the correct usage of
getBlockContentFromSnapshot
for rendering reusable blocks within the explorer page context.baker/SiteBaker.tsx (4)
- 74-80: Replace direct imports from
../db/wpdb.js
with modular functions from../db/model/Post.js
forgetBlogIndex
,getFullPost
,getPostsFromSnapshots
,postsFlushCache
, andpostsTable
. This change enhances modularity and aligns with the PR's objective to transition from dynamic WordPress API calls to using static snapshots.- 391-404: > 📝 NOTE
This review was outside the diff hunks and was mapped to the diff hunk with the greatest overlap. Original lines [394-427]
The
removeDeletedPosts
method now usesgetPostsFromSnapshots
andgetFullPost
to fetch posts and their details. Ensure that the logic for identifying and removing deleted posts is correctly implemented and that it efficiently handles the removal process without impacting performance.
- 427-435: The
bakePosts
method now usesgetPostsFromSnapshots
with a filter condition to exclude posts already published via Gdocs. Verify that the filter condition is correctly implemented and that the method efficiently processes posts for baking.- 919-919: The
flushCache
method now usespostsFlushCache
to clear post-related caches. This change aligns with the PR's objective to move towards a more modular approach and ensures efficient garbage collection.
e7c0485
to
6309b46
Compare
8265f67
to
c2e9291
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By and large this looks good, but I have two change requests:
- the first one is to add a code path that fetches full posts in bulk (with one SQL query instead 1+N) and use that in most places where full posts are used
- the second is to pass in the knex instance explicitly into all functions (i.e. not use db.knexTable or similar constructs). This is to make sure that transaction scopes work correctly when they are needed.
Grab me in slack if you want to discuss any of this or want me to help in rewriting the query!
db/model/Post.ts
Outdated
export const isPostSlugCitable = async (slug: string): Promise<boolean> => { | ||
const entries = SiteNavigationStatic.categories | ||
return entries.some((category) => { | ||
return ( | ||
category.entries.some((entry) => entry.slug === slug) || | ||
(category.subcategories ?? []).some( | ||
(subcategory: CategoryWithEntries) => { | ||
return subcategory.entries.some( | ||
(subCategoryEntry) => subCategoryEntry.slug === slug | ||
) | ||
} | ||
) | ||
) | ||
}) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggest that in the place where we define SiteNavigationStatic we create a Set with all the leaves and then this one becomes a simple lookup in the set. The function should also not return a promise in any case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
db/model/Post.ts
Outdated
SELECT * FROM ${postsTable} | ||
WHERE status = "publish" | ||
AND type IN (?) | ||
ORDER BY JSON_UNQUOTE(JSON_EXTRACT(wpApiSnapshot, '$.date')) DESC; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why not wpApiSnapshot ->> '$.date'
here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
who knows? 😅 Thanks!
db/model/Post.ts
Outdated
const rawPosts: DbRawPost[] = ( | ||
await db.knexInstance().raw( | ||
` | ||
SELECT * FROM ${postsTable} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We shouldn't select *
here if we only use wpApiSnapshot below
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes it's a bit heavy handed to parse the whole post just to use the wpApiSnapshot, but it allows us to keep a single pipeline to transform raw DB posts (DbRawPost) into parsed posts (DbEnrichedPost), through parsePostRaw.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know but the idea is for the parse*Raw functions to be explicitly modular - i.e. if you only need need wpApiSnapshot then there should be a parsePostWpApiSnapshot function next to parsePostRaw that parses just this field. A post row member is now 100K on average without spApiSnapshot and it just seems a bit on the extreme side to read all that if you won't even use it, especially as this is going to be used in a batch context.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that makes sense, thanks. Fixed in the latest push.
imageId: postApi.featured_media, | ||
relatedCharts: | ||
postApi.type === "page" | ||
? await getPostRelatedCharts(postApi.id) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is not a good query pattern - it looks like quite a few code paths need the result of getFullPost for many posts, but by doing it this way we have to run one query against MySQL for every post. I think it would be much better to have two flavours of getPostsFromSnapshots, one like it is at the moment, and one that also queries for the related chart information (doing a join and then a group by). This second query would mean that getting full posts for all posts can be done with a single query against the DB. Admittedly, this one query would then become quite complex, but I think this is still worth it. I'm happy to help with writing the refined query, just let me know.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
f15615f
to
e93d1ba
Compare
I reset the whole staging server and ran a full bake: everything looks good, so I'm going ahead with the deploy. |
This PR replaces the fortunejs content graph with a DB-based alternative, using the links stored in the `posts_links` and `posts_gdoc_links` tables. ![Screenshot 2024-02-07 at 10.54.47.png](https://graphite-user-uploaded-assets-prod.s3.amazonaws.com/0SFFiIjKuUK6UPYHVe6u/8f7371fb-0e80-4d07-8a4e-4fc4c8002c59.png) This PR also continues the deprecation work started in #3166. - [x] fully rebake site on staging - [x] fixed malformed prominent link in content (ourworldindata.org/ instead of https://ourworldindata.org/) - [x] https://owid.cloud/wp/wp-admin/post.php?post=34103&action=edit - [x] https://owid.cloud/wp/wp-admin/post.php?post=34066&action=edit **Inconsistency on how grapher chart links are stored in the DB**: there are a handful of chart links stored with the https://ourworldindata.org/grapher prefix. These disappear upon saving the containing articles, which indicates that they were created with an older version of the codebase. I'm then opting for not supporting them in the content graph. ```sql SELECT pgl.*, pg.published from posts_gdocs_links pgl JOIN posts_gdocs pg on pg.id = pgl.sourceId WHERE pgl.target LIKE "https://ourworldindata.org/grapher%" ``` ### Testing links Below are some testing links to grapher pages, backlinking to posts in different configurations. - gdoc: https://ourworldindata.org/grapher/agricultural-export-subsidies - [x] http://localhost:3030/grapher/agricultural-export-subsidies - [x] http://staging-site-db-content-graph/grapher/agricultural-export-subsidies - gdocs: https://ourworldindata.org/grapher/pollution-deaths-from-fossil-fuels - [x] http://localhost:3030/grapher/pollution-deaths-from-fossil-fuels - [x] http://staging-site-db-content-graph/grapher/pollution-deaths-from-fossil-fuels - wp: https://ourworldindata.org/grapher/dalys-rate-from-all-causes - [x] http://localhost:3030/grapher/dalys-rate-from-all-causes - [x] http://staging-site-db-content-graph/grapher/dalys-rate-from-all-causes - wp (with chart redirect): https://ourworldindata.org/grapher/age-standardized-death-rate-from-pm25-pollution-per-100000-vs-gdp-per-capita-int- - [x] http://localhost:3030/grapher/age-standardized-death-rate-from-pm25-pollution-per-100000-vs-gdp-per-capita-int- - [x] http://staging-site-db-content-graph/grapher/age-standardized-death-rate-from-pm25-pollution-per-100000-vs-gdp-per-capita-int- - gdoc (with chart redirect): https://ourworldindata.org/grapher/population-long-run-with-projections?time=earliest..2100&country=~OWID_WRL - [x] http://localhost:3030/grapher/population-long-run-with-projections?time=earliest..2100&country=~OWID_WRL - [x] http://staging-site-db-content-graph/grapher/population-long-run-with-projections?time=earliest..2100&country=~OWID_WRL - none: https://ourworldindata.org/grapher/death-rates-alcohol-drug-overdoses-by-age-who - [x] http://localhost:3030/grapher/death-rates-alcohol-drug-overdoses-by-age-who - [x] http://staging-site-db-content-graph/grapher/death-rates-alcohol-drug-overdoses-by-age-who
- pass knex as a parameter to downstream functions - wrap top API calls in a transaction - replace knex.raw() with knexRaw() - this PR doesn't add [{readOnly: true}](https://www.notion.so/owid/Database-access-in-Grapher-d9db343c2bfb4ae0b14b3dec72f686c6?pvs=4#fdb6c225d4134e47a78fe3436a19f72c). @danyx23 [as you know](#3145), these refactors tend to start with a small surface area but end up requiring extensive surgery across the codebase. This is partly why I hadn't dived into it as part of this project, and continued using the `db.knexInstance().raw()` paradigm in downstream functions. (Another reason had to do with the value of doing this refactor on partly obsolete code). So in case this ends up needing more work and blocks the delivery of the core wordpress sunsetting project, I'll revert it to draft for later. How does that sound? Context: #3166 (comment)
This PR bakes all remaining Wordpress posts, pages and blocks from API snapshots saved by the previous PR in the stack.
The full API response is saved in the database, from which we can extract the content of the post, page or block. We are currently evaluating a git-based workflow to edit, override and keep track of the content part of the snapshot, but this will be tackled separately.
This PR also deprecates the old Wordpress API functions, and moves them to a separate file for clarity.
It is important to note that these snapshots are not replacing or superseding the
post.content
column in the database. What was previously in thepost.content
column is still there, and is still the source of truth for the code paths that used it (e.g. the WP HTML source -> ArchieML migration). The snapshots are only used as a static drop-in replacement for dynamic Wordpress API calls, which were happening during the rendering of posts, pages and blocks.Depending on the remaining scope of the overall migration effort, we might decide to set up programmatic guardrails to address this source of truth issue, or simply document the caveats of the approach for the migration team.
Testing
syncPostsToGrapher
Summary by CodeRabbit
DbEnrichedPost
to include a new property for API snapshots and added related utility functions.