Bake posts, pages and blocks from WP API snapshots (#3166)
This PR bakes all remaining WordPress posts, pages and blocks from the API snapshots saved by the previous PR in the stack.

The full API response is saved in the database, from which we can extract the content of the post, page or block. We are currently [evaluating a git-based workflow to edit, override and keep track of the content part of the snapshot](https://www.notion.so/owid/2024-02-02-Baking-and-editing-Wordpress-API-snapshots-5c92d90280774d209b7aca22f65f71dd?pvs=4#603bd407f20043bc93d4c8aaf07127ce), but this will be tackled separately.
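
A minimal sketch of what extracting content from a stored snapshot could look like, assuming the snapshot is stored as the raw JSON of a WordPress REST API post object. The REST API does return `id`, `slug`, `title.rendered` and `content.rendered` for posts and pages; the `wpApiSnapshot` column name and the `extractContentFromSnapshot` helper are hypothetical and are not taken from this PR.

```typescript
// Subset of the fields returned by the WordPress REST API for a post or page.
interface WpApiPost {
    id: number
    slug: string
    title: { rendered: string }
    content: { rendered: string }
}

// Hypothetical shape of a database row holding the saved API response.
// The column name `wpApiSnapshot` is an assumption for illustration only.
interface PostRow {
    id: number
    slug: string
    wpApiSnapshot: string | null
}

// Parse the stored JSON and return the rendered HTML content.
// Returns undefined when no snapshot has been saved for this row.
function extractContentFromSnapshot(row: PostRow): string | undefined {
    if (!row.wpApiSnapshot) return undefined
    const snapshot = JSON.parse(row.wpApiSnapshot) as WpApiPost
    return snapshot.content.rendered
}

const row: PostRow = {
    id: 1,
    slug: "example-post",
    wpApiSnapshot: JSON.stringify({
        id: 1,
        slug: "example-post",
        title: { rendered: "Example" },
        content: { rendered: "<p>Hello</p>" },
    }),
}

console.log(extractContentFromSnapshot(row)) // prints <p>Hello</p>
```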

This PR also deprecates the old WordPress API functions and moves them to a separate file (`DEPRECATEDwpdb`) for clarity.

Note that these snapshots do not replace or supersede the `post.content` column in the database. That column is unchanged and remains the source of truth for the code paths that used it (e.g. the WP HTML source -> ArchieML migration). The snapshots only serve as a static drop-in replacement for the dynamic WordPress API calls that were previously made while rendering posts, pages and blocks.
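
To illustrate the drop-in idea, here is a minimal sketch in which rendering code depends only on a small lookup interface, so a snapshot-backed source can stand in for a live API client without touching the renderer. All names here (`SimplePost`, `PostSource`, `SnapshotPostSource`, `renderTitle`) are hypothetical; the functions actually used in this PR are `getFullPostBySlugFromSnapshot` and related helpers in `db/model/Post.js`, whose signatures may differ.

```typescript
// Simplified, hypothetical stand-in for a post (not the real FullPost type).
interface SimplePost {
    id: number
    slug: string
    title: string
    content: string
}

// Anything the renderer can ask for a post by slug.
interface PostSource {
    getPostBySlug(slug: string): Promise<SimplePost>
}

// Serves posts from snapshots already loaded from the database,
// instead of making an HTTP request to WordPress at render time.
class SnapshotPostSource implements PostSource {
    private readonly bySlug = new Map<string, SimplePost>()

    constructor(snapshots: SimplePost[]) {
        for (const post of snapshots) this.bySlug.set(post.slug, post)
    }

    async getPostBySlug(slug: string): Promise<SimplePost> {
        const post = this.bySlug.get(slug)
        if (!post) throw new Error(`No snapshot found for slug "${slug}"`)
        return post
    }
}

// The renderer only sees the interface, so swapping a live API client for
// the snapshot source needs no change here (no HTML escaping: illustration only).
async function renderTitle(source: PostSource, slug: string): Promise<string> {
    const post = await source.getPostBySlug(slug)
    return `<h1>${post.title}</h1>`
}

const source = new SnapshotPostSource([
    { id: 1, slug: "example-post", title: "Example", content: "<p>Hello</p>" },
])

renderTitle(source, "example-post").then((html) => console.log(html))
```

The design choice is that callers never know where a post comes from, so only the source implementation has to change.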

Depending on the remaining scope of the overall migration effort, we might set up programmatic guardrails to address this dual source of truth (`post.content` vs. the API snapshot), or simply document the caveats of the approach for the migration team.

## Testing
- [x] run `syncPostsToGrapher`
- [x] rebake full site on http://staging-site-bake-from-snapshot, e.g. http://staging-site-bake-from-snapshot/personal-relations-econ-outcomes

<!-- This is an auto-generated comment: release notes by coderabbit.ai -->

## Summary by CodeRabbit

- **New Features**
	- Introduced snapshot-based post retrieval functions in `db/model/Post.js`, such as `getPostsFromSnapshots`, `getFullPostBySlugFromSnapshot`, `getFullPostByIdFromSnapshot` and `getBlockContentFromSnapshot`.
- **Refactor**
	- Replaced deprecated WordPress database operation functions with new modular functions in the API, baker, and database layers.
	- Updated the baker, site renderers, sitemap, Algolia indexer and page overrides to read posts from snapshots instead of calling the WordPress API at render time.
	- Deprecated several functions that query WordPress tables or the WordPress API (e.g. `getTopics`, now `DEPRECATEDgetTopics`).
- **Documentation**
	- Updated interface `DbEnrichedPost` to include a new property for API snapshots and added related utility functions.
- **Bug Fixes**
	- Switched related-chart lookups to `getPostRelatedCharts` and citability checks to the slug-based `isPostSlugCitable`.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
mlbrgl authored Feb 19, 2024
2 parents 33b4ec0 + e93d1ba commit e7eef1f
Showing 15 changed files with 581 additions and 396 deletions.
3 changes: 2 additions & 1 deletion adminSiteServer/apiRouter.ts
@@ -6,6 +6,7 @@ import * as db from "../db/db.js"
import { imageStore } from "../db/model/Image.js"
import { GdocXImage } from "../db/model/GdocXImage.js"
import * as wpdb from "../db/wpdb.js"
import { DEPRECATEDgetTopics } from "../db/DEPRECATEDwpdb.js"
import {
UNCATEGORIZED_TAG_ID,
BAKE_ON_CHANGE,
@@ -567,7 +568,7 @@ apiRouter.get(
)

apiRouter.get("/topics.json", async (req: Request, res: Response) => ({
topics: await wpdb.getTopics(),
topics: await DEPRECATEDgetTopics(),
}))
apiRouter.get(
"/editorData/variables.json",
8 changes: 5 additions & 3 deletions baker/GrapherBaker.tsx
@@ -16,7 +16,6 @@ import {
} from "@ourworldindata/utils"
import {
getRelatedArticles,
getRelatedCharts,
getRelatedChartsForVariable,
getRelatedResearchAndWritingForVariable,
isWordpressAPIEnabled,
@@ -33,7 +32,10 @@ import {
import * as db from "../db/db.js"
import { glob } from "glob"
import { isPathRedirectedToExplorer } from "../explorerAdminServer/ExplorerRedirects.js"
import { getPostEnrichedBySlug } from "../db/model/Post.js"
import {
getPostEnrichedBySlug,
getPostRelatedCharts,
} from "../db/model/Post.js"
import {
JsonError,
GrapherInterface,
@@ -319,7 +321,7 @@ const renderGrapherPage = async (grapher: GrapherInterface) => {
const post = postSlug ? await getPostEnrichedBySlug(postSlug) : undefined
const relatedCharts =
post && isWordpressDBEnabled
? await getRelatedCharts(post.id)
? await getPostRelatedCharts(post.id)
: undefined
const relatedArticles =
grapher.id && isWordpressAPIEnabled
23 changes: 16 additions & 7 deletions baker/SiteBaker.tsx
@@ -75,7 +75,13 @@ import {
bakeAllPublishedExplorers,
} from "./ExplorerBaker.js"
import { ExplorerAdminServer } from "../explorerAdminServer/ExplorerAdminServer.js"
import { postsTable } from "../db/model/Post.js"
import {
getBlogIndex,
getFullPost,
getPostsFromSnapshots,
postsFlushCache,
postsTable,
} from "../db/model/Post.js"
import { GdocPost } from "../db/model/Gdoc/GdocPost.js"
import { Image } from "../db/model/Image.js"
import { generateEmbedSnippet } from "../site/viteUtils.js"
@@ -424,11 +430,14 @@ export class SiteBaker {

private async removeDeletedPosts() {
if (!this.bakeSteps.has("removeDeletedPosts")) return
const postsApi = await wpdb.getPosts()

await db.getConnection()

const postsApi = await getPostsFromSnapshots()

const postSlugs = []
for (const postApi of postsApi) {
const post = await wpdb.getFullPost(postApi)
const post = await getFullPost(postApi)
postSlugs.push(post.slug)
}

@@ -454,15 +463,15 @@ export class SiteBaker {
const alreadyPublishedViaGdocsSlugsSet =
await db.getSlugsWithPublishedGdocsSuccessors(db.knexInstance())

const postsApi = await wpdb.getPosts(
const postsApi = await getPostsFromSnapshots(
undefined,
(postrow) => !alreadyPublishedViaGdocsSlugsSet.has(postrow.slug)
)

await pMap(
postsApi,
async (postApi) =>
wpdb.getFullPost(postApi).then((post) => this.bakePost(post)),
getFullPost(postApi).then((post) => this.bakePost(post)),
{ concurrency: 10 }
)

@@ -783,7 +792,7 @@ export class SiteBaker {
// Bake the blog index
private async bakeBlogIndex() {
if (!this.bakeSteps.has("blogIndex")) return
const allPosts = await wpdb.getBlogIndex()
const allPosts = await getBlogIndex()
const numPages = Math.ceil(allPosts.length / BLOG_POSTS_PER_PAGE)

for (let i = 1; i <= numPages; i++) {
@@ -962,7 +971,7 @@ export class SiteBaker {

private flushCache() {
// Clear caches to allow garbage collection while waiting for next run
wpdb.flushCache()
postsFlushCache()
siteBakingFlushCache()
redirectsFlushCache()
this.progressBar.tick({ name: "✅ cache flushed" })
5 changes: 3 additions & 2 deletions baker/algolia/indexToAlgolia.tsx
@@ -26,6 +26,7 @@ import { Pageview } from "../../db/model/Pageview.js"
import { GdocPost } from "../../db/model/Gdoc/GdocPost.js"
import { ArticleBlocks } from "../../site/gdocs/components/ArticleBlocks.js"
import React from "react"
import { getFullPost, getPostsFromSnapshots } from "../../db/model/Post.js"

interface TypeAndImportance {
type: PageType
@@ -92,7 +93,7 @@ async function generateWordpressRecords(
const records: PageRecord[] = []

for (const postApi of postsApi) {
const rawPost = await wpdb.getFullPost(postApi)
const rawPost = await getFullPost(postApi)
if (isEmpty(rawPost.content)) {
// we have some posts that are only placeholders (e.g. for a redirect); don't index these
console.log(
@@ -193,7 +194,7 @@ const getPagesRecords = async () => {
// TODO: the knex instance should be handed down as a parameter
const slugsWithPublishedGdocsSuccessors =
await db.getSlugsWithPublishedGdocsSuccessors(db.knexInstance())
const postsApi = await wpdb.getPosts(undefined, (post) => {
const postsApi = await getPostsFromSnapshots(undefined, (post) => {
// Two things can happen here:
// 1. There's a published Gdoc with the same slug
// 2. This post has a Gdoc successor (which might have a different slug)
9 changes: 6 additions & 3 deletions baker/pageOverrides.tsx
@@ -2,14 +2,17 @@ import { PageOverrides } from "../site/LongFormPage.js"
import { BAKED_BASE_URL } from "../settings/serverSettings.js"
import { urlToSlug, FullPost, JsonError } from "@ourworldindata/utils"
import { FormattingOptions } from "@ourworldindata/types"
import { getPostBySlug, isPostCitable } from "../db/wpdb.js"
import { getTopSubnavigationParentItem } from "../site/SiteSubnavigation.js"
import { logErrorAndMaybeSendToBugsnag } from "../serverUtils/errorLog.js"
import {
getFullPostBySlugFromSnapshot,
isPostSlugCitable,
} from "../db/model/Post.js"

export const getPostBySlugLogToSlackNoThrow = async (slug: string) => {
let post
try {
post = await getPostBySlug(slug)
post = await getFullPostBySlugFromSnapshot(slug)
} catch (err) {
logErrorAndMaybeSendToBugsnag(err)
} finally {
@@ -61,7 +64,7 @@ export const getPageOverrides = async (
const landing = await getLandingOnlyIfParent(post, formattingOptions)
if (!landing) return

const isParentLandingCitable = await isPostCitable(landing)
const isParentLandingCitable = isPostSlugCitable(landing.slug)
if (!isParentLandingCitable) return

return {
36 changes: 20 additions & 16 deletions baker/siteRenderers.tsx
@@ -60,13 +60,6 @@ import {
import { FormattingOptions, GrapherInterface } from "@ourworldindata/types"
import { CountryProfileSpec } from "../site/countryProfileProjects.js"
import { formatPost } from "./formatWordpressPost.js"
import {
getBlogIndex,
getLatestPostRevision,
getPostBySlug,
isPostCitable,
getBlockContent,
} from "../db/wpdb.js"
import { queryMysql, knexTable } from "../db/db.js"
import { getPageOverrides, isPageOverridesCitable } from "./pageOverrides.js"
import { ProminentLink } from "../site/blocks/ProminentLink.js"
@@ -87,7 +80,14 @@ import { ExplorerAdminServer } from "../explorerAdminServer/ExplorerAdminServer.
import { GIT_CMS_DIR } from "../gitCms/GitCmsConstants.js"
import { ExplorerFullQueryParams } from "../explorer/ExplorerConstants.js"
import { resolveInternalRedirect } from "./redirects.js"
import { postsTable } from "../db/model/Post.js"
import {
getBlockContentFromSnapshot,
getBlogIndex,
getFullPostByIdFromSnapshot,
getFullPostBySlugFromSnapshot,
isPostSlugCitable,
postsTable,
} from "../db/model/Post.js"
import { GdocPost } from "../db/model/Gdoc/GdocPost.js"
import { logErrorAndMaybeSendToBugsnag } from "../serverUtils/errorLog.js"
import { GdocFactory } from "../db/model/Gdoc/GdocFactory.js"
@@ -190,12 +190,12 @@ export const renderGdoc = (gdoc: OwidGdoc, isPreviewing: boolean = false) => {
}

export const renderPageBySlug = async (slug: string) => {
const post = await getPostBySlug(slug)
const post = await getFullPostBySlugFromSnapshot(slug)
return renderPost(post)
}

export const renderPreview = async (postId: number): Promise<string> => {
const postApi = await getLatestPostRevision(postId)
const postApi = await getFullPostByIdFromSnapshot(postId)
return renderPost(postApi)
}

@@ -229,7 +229,7 @@ export const renderPost = async (

const pageOverrides = await getPageOverrides(post, formattingOptions)
const citationStatus =
(await isPostCitable(post)) || isPageOverridesCitable(pageOverrides)
isPostSlugCitable(post.slug) || isPageOverridesCitable(pageOverrides)

return renderToHtmlPage(
<LongFormPage
@@ -480,7 +480,7 @@ const getCountryProfilePost = memoize(
grapherExports?: GrapherExports
): Promise<[FormattedPost, FormattingOptions]> => {
// Get formatted content from generic covid country profile page.
const genericCountryProfilePost = await getPostBySlug(
const genericCountryProfilePost = await getFullPostBySlugFromSnapshot(
profileSpec.genericProfileSlug
)

@@ -500,7 +500,7 @@ const getCountryProfilePost = memoize(
// todo: we used to flush cache of this thing.
const getCountryProfileLandingPost = memoize(
async (profileSpec: CountryProfileSpec) => {
return getPostBySlug(profileSpec.landingPageSlug)
return getFullPostBySlugFromSnapshot(profileSpec.landingPageSlug)
}
)

@@ -559,7 +559,7 @@ const renderPostThumbnailBySlug = async (

let post
try {
post = await getPostBySlug(slug)
post = await getFullPostBySlugFromSnapshot(slug)
} catch (err) {
// if no post is found, then we return early instead of throwing
}
@@ -599,7 +599,11 @@ export const renderProminentLinks = async (
? (await Chart.getBySlug(resolvedUrl.slug))?.config
?.title // optim?
: resolvedUrl.slug &&
(await getPostBySlug(resolvedUrl.slug)).title)
(
await getFullPostBySlugFromSnapshot(
resolvedUrl.slug
)
).title)
} finally {
if (!title) {
logErrorAndMaybeSendToBugsnag(
@@ -709,7 +713,7 @@ export const renderExplorerPage = async (

const wpContent = program.wpBlockId
? await renderReusableBlock(
await getBlockContent(program.wpBlockId),
await getBlockContentFromSnapshot(program.wpBlockId),
program.wpBlockId
)
: undefined
4 changes: 2 additions & 2 deletions baker/sitemap.ts
@@ -6,13 +6,13 @@ import {
} from "../settings/serverSettings.js"
import { dayjs, countries, queryParamsToStr } from "@ourworldindata/utils"
import * as db from "../db/db.js"
import * as wpdb from "../db/wpdb.js"
import urljoin from "url-join"
import { countryProfileSpecs } from "../site/countryProfileProjects.js"
import { ExplorerAdminServer } from "../explorerAdminServer/ExplorerAdminServer.js"
import { EXPLORERS_ROUTE_FOLDER } from "../explorer/ExplorerConstants.js"
import { ExplorerProgram } from "../explorer/ExplorerProgram.js"
import { GdocPost } from "../db/model/Gdoc/GdocPost.js"
import { getPostsFromSnapshots } from "../db/model/Post.js"

interface SitemapUrl {
loc: string
@@ -62,7 +62,7 @@ export const makeSitemap = async (explorerAdminServer: ExplorerAdminServer) => {
const knex = db.knexInstance()
const alreadyPublishedViaGdocsSlugsSet =
await db.getSlugsWithPublishedGdocsSuccessors(knex)
const postsApi = await wpdb.getPosts(
const postsApi = await getPostsFromSnapshots(
undefined,
(postrow) => !alreadyPublishedViaGdocsSlugsSet.has(postrow.slug)
)