🔨 store grapher configs in R2 when edited in the admin (#3827)
At a high level, this PR does two things: it stores grapher configs in R2 when they are edited in the admin, and it adds a tool for syncing the contents of the `chart_configs` table into R2 (adding missing configs, updating changed ones, and deleting superfluous ones).
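
The sync step described above amounts to a comparison of two sets. Here is a minimal sketch, assuming each side can be summarised as a map from object key to a content hash. The names `planSync` and `SyncPlan` are hypothetical and are not taken from this PR; the real tool may be structured differently.

```typescript
// Hypothetical types for illustration only; not part of the PR.
type HashByKey = Map<string, string>

interface SyncPlan {
    toUpload: string[] // keys missing in R2 or stored with a different hash
    toDelete: string[] // keys present in R2 but no longer in the DB
}

// Compare the DB view with the R2 view and decide what to write or remove.
function planSync(db: HashByKey, r2: HashByKey): SyncPlan {
    const toUpload: string[] = []
    for (const [key, hash] of db) {
        // Upload when the object is absent or its hash differs.
        if (r2.get(key) !== hash) toUpload.push(key)
    }
    // Anything in R2 that the DB does not know about is superfluous.
    const toDelete = [...r2.keys()].filter((key) => !db.has(key))
    return { toUpload, toDelete }
}

// Example usage with made-up keys and hashes.
const examplePlan = planSync(
    new Map([
        ["a.json", "h1"],
        ["b.json", "h2"],
        ["c.json", "h3"],
    ]),
    new Map([
        ["a.json", "h1"],
        ["b.json", "stale"],
        ["d.json", "h4"],
    ])
)
console.log(examplePlan)
```

Unchanged objects are never touched, so a repeated sync only transfers what actually differs.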

For now, configs are stored in R2 in two folders: one holds all configs, addressed by chart config UUID, and one holds only published standalone charts, addressed by slug.
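
To make the two-folder layout concrete, here is a small sketch of how object keys could be assembled. The folder names in `EXAMPLE_DIRECTORIES` and the helper `buildObjectKey` are assumptions made for illustration; the PR's actual directory names live in `chartConfigR2Helpers.ts` and are not reproduced here. The `.json` suffix for UUID-addressed configs is also assumed.

```typescript
// Hypothetical folder names; the real values are defined by the PR's helpers.
const EXAMPLE_DIRECTORIES = {
    byUUID: "by-uuid",
    publishedGrapherBySlug: "by-slug",
} as const

// Join path segments with single slashes, skipping empty segments
// (for example when no bucket path prefix is configured).
function buildObjectKey(
    bucketPath: string,
    directory: string,
    filename: string
): string {
    return [bucketPath, directory, filename]
        .map((part) => part.replace(/^\/+|\/+$/g, ""))
        .filter((part) => part.length > 0)
        .join("/")
}

// Every config gets a UUID-addressed object...
const uuidKey = buildObjectKey(
    "devs/YOURNAME",
    EXAMPLE_DIRECTORIES.byUUID,
    "0191d6c4-0000-7000-8000-000000000000.json"
)
// ...and published standalone charts additionally get a slug-addressed one.
const slugKey = buildObjectKey(
    "devs/YOURNAME",
    EXAMPLE_DIRECTORIES.publishedGrapherBySlug,
    "life-expectancy.json"
)
console.log(uuidKey, slugKey)
```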

This PR adds a hash of the full config to the `chart_configs` table. I started out using SHA-1 and was happy that R2 supports it in addition to MD5, only to discover later, when I wrote the sync tool, that the S3 API list operation does not support hashes other than MD5. Since the main point of the hash is to facilitate efficient comparison between the set of configs in the DB and those in R2, I rewrote the hash to be MD5. The hash is stored in base64 encoding since this is also what is used and returned in most API calls (the only exception is the ETag, which uses a hex serialization in double quotes 🤷)
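
The two encodings mentioned above can be converted into each other. The sketch below uses Node's built-in `crypto` module; the helper names are hypothetical and not from this PR. It assumes the ETag is the plain MD5 of the object body, which holds for simple (non-multipart) uploads in S3-compatible APIs. Note also that the PR reads the hash back from MySQL rather than computing it in application code, because MySQL normalizes JSON.

```typescript
import { createHash } from "node:crypto"

// Hypothetical helpers for illustration; not taken from the PR.

// MD5 of a string, base64-encoded (the form stored in the DB).
function md5Base64(content: string): string {
    return createHash("md5").update(content, "utf8").digest("base64")
}

// Convert a base64 MD5 into the quoted-hex form used by ETags.
function base64ToEtag(md5: string): string {
    return `"${Buffer.from(md5, "base64").toString("hex")}"`
}

// Convert a quoted-hex ETag back into a base64 MD5.
function etagToBase64(etag: string): string {
    const hex = etag.replace(/^"|"$/g, "") // strip the surrounding quotes
    return Buffer.from(hex, "hex").toString("base64")
}

const exampleHash = md5Base64('{"title":"Example"}')
const exampleEtag = base64ToEtag(exampleHash)
console.log(exampleHash, exampleEtag, etagToBase64(exampleEtag) === exampleHash)
```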

We already have a key configured for interacting with R2 so I renamed the settings related to this key to have a generic (non-image specific) name.

To test this:
- set R2_ACCESS_KEY_ID and R2_SECRET_ACCESS_KEY to a personal CF R2 token that you generate. If you already did this for images (i.e. if your .env file already has IMAGE_HOSTING_R2_ACCESS_KEY_ID), rename these two keys AND make sure that your personal token has access to the `owid-grapher-configs-staging` bucket (CF dashboard -> R2 -> manage API tokens in the top right)
- set GRAPHER_CONFIG_R2_BUCKET to `owid-grapher-configs-staging` and GRAPHER_CONFIG_R2_BUCKET_PATH to `devs/YOURNAME`

Then, when you save graphers in the admin, you should see these charts show up as JSON files in the `owid-grapher-configs-staging` R2 bucket.
danyx23 authored Sep 9, 2024
2 parents d30b388 + 64bf832 commit 85e049a
Showing 19 changed files with 747 additions and 60 deletions.
6 changes: 3 additions & 3 deletions .env.devcontainer
@@ -16,8 +16,8 @@ GDOCS_CLIENT_ID=''
GDOCS_BASIC_ARTICLE_TEMPLATE_URL=''
GDOCS_SHARED_DRIVE_ID=''

IMAGE_HOSTING_R2_ENDPOINT=''
R2_ENDPOINT=''
IMAGE_HOSTING_R2_CDN_URL=''
IMAGE_HOSTING_R2_BUCKET_PATH=''
IMAGE_HOSTING_R2_ACCESS_KEY_ID=''
IMAGE_HOSTING_R2_SECRET_ACCESS_KEY=''
R2_ACCESS_KEY_ID=''
R2_SECRET_ACCESS_KEY=''
12 changes: 9 additions & 3 deletions .env.example-full
@@ -22,11 +22,17 @@ GDOCS_BASIC_ARTICLE_TEMPLATE_URL=
GDOCS_SHARED_DRIVE_ID=
GDOCS_DONATE_FAQS_DOCUMENT_ID= # optional

IMAGE_HOSTING_R2_ENDPOINT= # optional
R2_ENDPOINT= # optional
IMAGE_HOSTING_R2_CDN_URL=
IMAGE_HOSTING_R2_BUCKET_PATH=
IMAGE_HOSTING_R2_ACCESS_KEY_ID= # optional
IMAGE_HOSTING_R2_SECRET_ACCESS_KEY= # optional
R2_ACCESS_KEY_ID= # optional
R2_SECRET_ACCESS_KEY= # optional
# These two GRAPHER_CONFIG_ settings are used to store grapher configs in an R2 bucket.
# The cloudflare workers for thumbnail rendering etc use these settings to fetch the grapher configs.
# This means that for most local dev it is not necessary to set these.
GRAPHER_CONFIG_R2_BUCKET= # optional - for local dev set it to "owid-grapher-configs-staging"
GRAPHER_CONFIG_R2_BUCKET_PATH= # optional - for local dev set it to "devs/YOURNAME"


OPENAI_API_KEY=

1 change: 1 addition & 0 deletions .gitignore
@@ -43,3 +43,4 @@ dist/
.nx/workspace-data
.dev.vars
**/tsup.config.bundled*.mjs
cfstorage/
164 changes: 134 additions & 30 deletions adminSiteServer/apiRouter.ts
@@ -9,7 +9,11 @@ import {
ADMIN_BASE_URL,
DATA_API_URL,
} from "../settings/serverSettings.js"
import { expectInt, isValidSlug } from "../serverUtils/serverUtil.js"
import {
Base64String,
expectInt,
isValidSlug,
} from "../serverUtils/serverUtil.js"
import {
OldChartFieldList,
assignTagsForCharts,
@@ -167,6 +171,13 @@ import { GdocDataInsight } from "../db/model/Gdoc/GdocDataInsight.js"
import { GdocHomepage } from "../db/model/Gdoc/GdocHomepage.js"
import { GdocAuthor } from "../db/model/Gdoc/GdocAuthor.js"
import path from "path"
import {
deleteGrapherConfigFromR2,
deleteGrapherConfigFromR2ByUUID,
R2GrapherConfigDirectory,
saveGrapherConfigToR2,
saveGrapherConfigToR2ByUUID,
} from "./chartConfigR2Helpers.js"

const apiRouter = new FunctionalRouter()

@@ -303,7 +314,7 @@ const saveNewChart = async (
// new charts inherit by default
shouldInherit = true,
}: { config: GrapherInterface; user: DbPlainUser; shouldInherit?: boolean }
): Promise<GrapherInterface> => {
): Promise<{ patchConfig: GrapherInterface; fullConfig: GrapherInterface }> => {
// grab the parent of the chart if inheritance should be enabled
const parent = shouldInherit
? await getParentByChartConfig(knex, config)
@@ -316,19 +327,20 @@ const saveNewChart = async (
// compute patch and full configs
const patchConfig = diffGrapherConfigs(config, fullParentConfig)
const fullConfig = mergeGrapherConfigs(fullParentConfig, patchConfig)
const fullConfigStringified = serializeChartConfig(fullConfig)

// insert patch & full configs into the chart_configs table
const configId = uuidv7()
const chartConfigId = uuidv7()
await db.knexRaw(
knex,
`-- sql
INSERT INTO chart_configs (id, patch, full)
VALUES (?, ?, ?)
`,
[
configId,
chartConfigId,
serializeChartConfig(patchConfig),
serializeChartConfig(fullConfig),
fullConfigStringified,
]
)

@@ -339,7 +351,7 @@ const saveNewChart = async (
INSERT INTO charts (configId, isInheritanceEnabled, lastEditedAt, lastEditedByUserId)
VALUES (?, ?, ?, ?)
`,
[configId, shouldInherit, new Date(), user.id]
[chartConfigId, shouldInherit, new Date(), user.id]
)

// The chart config itself has an id field that should store the id of the chart - update the chart now so this is true
@@ -359,7 +371,25 @@ const saveNewChart = async (
[chartId, chartId, chartId]
)

return patchConfig
// We need to get the full config and the md5 hash from the database instead of
// computing our own md5 hash because MySQL normalizes JSON and our
// client computed md5 would be different from the ones computed by and stored in R2
const fullConfigMd5 = await db.knexRawFirst<
Pick<DbRawChartConfig, "full" | "fullMd5">
>(
knex,
`-- sql
select full, fullMd5 from chart_configs where id = ?`,
[chartConfigId]
)

await saveGrapherConfigToR2ByUUID(
chartConfigId,
fullConfigMd5!.full,
fullConfigMd5!.fullMd5 as Base64String
)

return { patchConfig, fullConfig }
}

const updateExistingChart = async (
@@ -372,7 +402,7 @@ const updateExistingChart = async (
// if true or false, enable or disable inheritance
shouldInherit?: boolean
}
): Promise<GrapherInterface> => {
): Promise<{ patchConfig: GrapherInterface; fullConfig: GrapherInterface }> => {
const { config, user, chartId } = params

// make sure that the id of the incoming config matches the chart id
@@ -393,22 +423,31 @@ const updateExistingChart = async (
// compute patch and full configs
const patchConfig = diffGrapherConfigs(config, fullParentConfig)
const fullConfig = mergeGrapherConfigs(fullParentConfig, patchConfig)
const fullConfigStringified = serializeChartConfig(fullConfig)

const chartConfigId = await db.knexRawFirst<Pick<DbPlainChart, "configId">>(
knex,
`SELECT configId FROM charts WHERE id = ?`,
[chartId]
)

if (!chartConfigId)
throw new JsonError(`No chart config found for id ${chartId}`, 404)

// update configs
await db.knexRaw(
knex,
`-- sql
UPDATE chart_configs cc
JOIN charts c ON c.configId = cc.id
UPDATE chart_configs
SET
cc.patch=?,
cc.full=?
WHERE c.id = ?
patch=?,
full=?
WHERE id = ?
`,
[
serializeChartConfig(patchConfig),
serializeChartConfig(fullConfig),
chartId,
fullConfigStringified,
chartConfigId.configId,
]
)

@@ -423,7 +462,25 @@ const updateExistingChart = async (
[shouldInherit, new Date(), user.id, chartId]
)

return patchConfig
// We need to get the full config and the md5 hash from the database instead of
// computing our own md5 hash because MySQL normalizes JSON and our
// client computed md5 would be different from the ones computed by and stored in R2
const fullConfigMd5 = await db.knexRawFirst<
Pick<DbRawChartConfig, "full" | "fullMd5">
>(
knex,
`-- sql
select full, fullMd5 from chart_configs where id = ?`,
[chartConfigId.configId]
)

await saveGrapherConfigToR2ByUUID(
chartConfigId.configId,
fullConfigMd5!.full,
fullConfigMd5!.fullMd5 as Base64String
)

return { patchConfig, fullConfig }
}

const saveGrapher = async (
@@ -505,6 +562,11 @@ const saveGrapher = async (
`INSERT INTO chart_slug_redirects (chart_id, slug) VALUES (?, ?)`,
[existingConfig.id, existingConfig.slug]
)
// When we rename grapher configs, make sure to delete the old one (the new one will be saved below)
await deleteGrapherConfigFromR2(
R2GrapherConfigDirectory.publishedGrapherBySlug,
`${existingConfig.slug}.json`
)
}
}

@@ -540,28 +602,34 @@ const saveGrapher = async (

// Execute the actual database update or creation
let chartId: number
let patchConfig: GrapherInterface
let fullConfig: GrapherInterface
if (existingConfig) {
chartId = existingConfig.id!
newConfig = await updateExistingChart(knex, {
const configs = await updateExistingChart(knex, {
config: newConfig,
user,
chartId,
shouldInherit,
})
patchConfig = configs.patchConfig
fullConfig = configs.fullConfig
} else {
newConfig = await saveNewChart(knex, {
const configs = await saveNewChart(knex, {
config: newConfig,
user,
shouldInherit,
})
chartId = newConfig.id!
patchConfig = configs.patchConfig
fullConfig = configs.fullConfig
chartId = fullConfig.id!
}

// Record this change in version history
const chartRevisionLog = {
chartId: chartId as number,
userId: user.id,
config: serializeChartConfig(newConfig),
config: serializeChartConfig(patchConfig),
createdAt: new Date(),
updatedAt: new Date(),
} satisfies DbInsertChartRevision
@@ -583,7 +651,7 @@ const saveGrapher = async (
chartId,
])

const newDimensions = newConfig.dimensions ?? []
const newDimensions = fullConfig.dimensions ?? []
for (const [i, dim] of newDimensions.entries()) {
await db.knexRaw(
knex,
@@ -593,15 +661,38 @@ const saveGrapher = async (
}

// So we can generate country profiles including this chart data
if (newConfig.isPublished && referencedVariablesMightChange)
if (fullConfig.isPublished && referencedVariablesMightChange)
// TODO: remove this ad hoc knex transaction context when we switch the function to knex
await denormalizeLatestCountryData(
knex,
newDimensions.map((d) => d.variableId)
)

if (fullConfig.isPublished) {
// We need to get the full config and the md5 hash from the database instead of
// computing our own md5 hash because MySQL normalizes JSON and our
// client computed md5 would be different from the ones computed by and stored in R2
const fullConfigMd5 = await db.knexRawFirst<
Pick<DbRawChartConfig, "full" | "fullMd5">
>(
knex,
`-- sql
select cc.full, cc.fullMd5 from chart_configs cc
join charts c on c.configId = cc.id
where c.id = ?`,
[chartId]
)

await saveGrapherConfigToR2(
fullConfigMd5!.full,
R2GrapherConfigDirectory.publishedGrapherBySlug,
`${fullConfig.slug}.json`,
fullConfigMd5!.fullMd5 as Base64String
)
}

if (
newConfig.isPublished &&
fullConfig.isPublished &&
(!existingConfig || !existingConfig.isPublished)
) {
// Newly published, set publication info
@@ -610,9 +701,9 @@ const saveGrapher = async (
`UPDATE charts SET publishedAt=?, publishedByUserId=? WHERE id = ? `,
[new Date(), user.id, chartId]
)
await triggerStaticBuild(user, `Publishing chart ${newConfig.slug}`)
await triggerStaticBuild(user, `Publishing chart ${fullConfig.slug}`)
} else if (
!newConfig.isPublished &&
!fullConfig.isPublished &&
existingConfig &&
existingConfig.isPublished
) {
@@ -622,13 +713,17 @@ const saveGrapher = async (
`DELETE FROM chart_slug_redirects WHERE chart_id = ?`,
[existingConfig.id]
)
await triggerStaticBuild(user, `Unpublishing chart ${newConfig.slug}`)
} else if (newConfig.isPublished)
await triggerStaticBuild(user, `Updating chart ${newConfig.slug}`)
await deleteGrapherConfigFromR2(
R2GrapherConfigDirectory.publishedGrapherBySlug,
`${existingConfig.slug}.json`
)
await triggerStaticBuild(user, `Unpublishing chart ${fullConfig.slug}`)
} else if (fullConfig.isPublished)
await triggerStaticBuild(user, `Updating chart ${fullConfig.slug}`)

return {
chartId,
savedPatch: newConfig,
savedPatch: patchConfig,
}
}

@@ -1010,11 +1105,13 @@ deleteRouteWithRWTransaction(
[chart.id]
)

const row = await db.knexRawFirst<{ configId: number }>(
const row = await db.knexRawFirst<Pick<DbPlainChart, "configId">>(
trx,
`SELECT configId FROM charts WHERE id = ?`,
[chart.id]
)
if (!row || !row.configId)
throw new JsonError(`No chart config found for id ${chart.id}`, 404)
if (row) {
await db.knexRaw(trx, `DELETE FROM charts WHERE id=?`, [chart.id])
await db.knexRaw(trx, `DELETE FROM chart_configs WHERE id=?`, [
@@ -1028,6 +1125,13 @@ deleteRouteWithRWTransaction(
`Deleting chart ${chart.slug}`
)

await deleteGrapherConfigFromR2ByUUID(row.configId)
if (chart.isPublished)
await deleteGrapherConfigFromR2(
R2GrapherConfigDirectory.publishedGrapherBySlug,
`${chart.slug}.json`
)

return { success: true }
}
)