🔨 change CF thumbnail function to use chart configs from R2 (#3867)
This PR changes how the CF thumbnail worker gets the config for a chart. Previously it fetched the HTML file of the grapher page at the given slug and extracted the config from that HTML. Now it looks up the grapher config JSON file in an R2 bucket.

This PR only changes this for published charts accessed by slug. A later PR will also enable lookup by UUID.

Cloudflare is a bit weird when it comes to which features are supported across CF workers/pages functions, R2 and local/remote dev setups, so I went through several approaches:

- Initially I wanted to use bindings in CF functions. For local development, however, wrangler dev does not support using a remote R2 bucket, which is annoying.
- I then thought I'd switch to fetching data from R2 buckets via HTTPS instead of bindings, similar to what we do with our variable data/metadata.json files. I then realized, though, that this only works if the bucket itself is publicly accessible via HTTPS. I wanted to keep the R2 bucket internal and not expose it directly.
- My next approach was to try to use the S3 API in the CF pages function. This didn't work because the default AWS library doesn't work in CF functions, and the simpler replacement libraries that I could find were very barebones and made you deal with XML manually.
- I then decided to go back to bindings and accept the somewhat annoying dev story.
- But then I realized that once we get MDims, our data managers will want to make changes to charts and to MDims and see their changes. If we fetch chart configs for MDims from CF functions in the future, then with bindings this would have been a painful story on staging servers.
- So in the end I decided to make the R2 buckets accessible via the public internet and use fetch instead of R2 bindings.
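
As a rough illustration of the fetch-based approach, here is a minimal sketch of the "try the primary bucket, fall back on 404" pattern. The `MinimalResponse` and `Fetcher` types and the `fetchWithFallback` name are hypothetical and exist only to keep the sketch self-contained; the function in this PR is `fetchFromR2` in `functions/_common/grapherRenderer.ts`, which additionally sets Cloudflare cache options.

```typescript
// Minimal response shape so the sketch does not depend on DOM or Workers typings.
interface MinimalResponse {
    status: number
}

// Hypothetical fetcher type; in a Pages Function the global `fetch` would be passed in.
type Fetcher = (url: string) => Promise<MinimalResponse>

// Request the primary URL first. Only a 404 from the primary bucket triggers
// a second request against the fallback URL (if one was given).
async function fetchWithFallback(
    fetcher: Fetcher,
    primaryUrl: string,
    fallbackUrl?: string
): Promise<MinimalResponse> {
    const primaryResponse = await fetcher(primaryUrl)
    if (primaryResponse.status === 404 && fallbackUrl) {
        return fetcher(fallbackUrl)
    }
    return primaryResponse
}
```

Injecting the fetcher is a choice made for this sketch so the fallback logic can be exercised with a stub instead of real network calls.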

To test this PR, the main change to your local setup is to add an entry to your .dev.vars file to specify the path within the staging R2 bucket to use for thumbnail rendering:
```
GRAPHER_CONFIG_R2_BUCKET_PATH=devs/YOURNAME
```
Before you can test the thumbnail rendering locally for a particular chart, you will either have to edit one chart in the admin to make sure it gets uploaded to R2, or run the sync script in devTools/syncGraphersToR2.

This PR adds a few new env vars for CF functions:
- GRAPHER_CONFIG_R2_BUCKET_URL - the primary bucket to read from
- GRAPHER_CONFIG_R2_BUCKET_FALLBACK_URL - the fallback bucket to read from (this should be the prod bucket; on prod it should be empty)
- GRAPHER_CONFIG_R2_BUCKET_FALLBACK_PATH - the path in the fallback bucket to read from (the path inside the prod bucket; on prod it should be empty)
- GRAPHER_CONFIG_R2_BUCKET_PATH - the path in the primary bucket to read from. In wrangler.toml we set this only for prod as it's constant there. For local dev setups it should be set in .dev.vars as per above. Staging servers fall back to the branch name.
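
To make the interplay of these variables concrete, here is a minimal sketch of how the primary and fallback config URLs could be derived from them. The `BucketEnv` interface, `joinUrl` helper and `resolveConfigUrls` function are hypothetical names used only for illustration. The sketch joins URLs with plain string concatenation to stay free of runtime typings, whereas the code in this PR uses `new URL(key, base)`.

```typescript
// Hypothetical subset of the CF function env, mirroring the variables listed above.
interface BucketEnv {
    GRAPHER_CONFIG_R2_BUCKET_URL: string
    GRAPHER_CONFIG_R2_BUCKET_PATH?: string
    GRAPHER_CONFIG_R2_BUCKET_FALLBACK_URL?: string
    GRAPHER_CONFIG_R2_BUCKET_FALLBACK_PATH?: string
    CF_PAGES_BRANCH?: string
}

// Same value as R2GrapherConfigDirectory.publishedGrapherBySlug in this PR.
const PUBLISHED_BY_SLUG = "config/by-slug-published"

// Simplification for this sketch: strip trailing slashes from the base and append the key.
const joinUrl = (base: string, key: string): string =>
    `${base.replace(/\/+$/, "")}/${key}`

function resolveConfigUrls(
    env: BucketEnv,
    slug: string
): { primary: string; fallback?: string } {
    // Use the explicit bucket path if set (local dev, prod);
    // otherwise fall back to a directory named after the branch.
    const topLevel = env.GRAPHER_CONFIG_R2_BUCKET_PATH
        ? [env.GRAPHER_CONFIG_R2_BUCKET_PATH]
        : ["by-branch", env.CF_PAGES_BRANCH]
    const key = [...topLevel, PUBLISHED_BY_SLUG, `${slug}.json`]
        .filter((part): part is string => part !== undefined)
        .join("/")
    const primary = joinUrl(env.GRAPHER_CONFIG_R2_BUCKET_URL, key)

    // Empty strings (as configured on prod) are falsy, so no fallback is produced there.
    let fallback: string | undefined
    if (
        env.GRAPHER_CONFIG_R2_BUCKET_FALLBACK_URL &&
        env.GRAPHER_CONFIG_R2_BUCKET_FALLBACK_PATH
    ) {
        const fallbackKey = [
            env.GRAPHER_CONFIG_R2_BUCKET_FALLBACK_PATH,
            PUBLISHED_BY_SLUG,
            `${slug}.json`,
        ].join("/")
        fallback = joinUrl(env.GRAPHER_CONFIG_R2_BUCKET_FALLBACK_URL, fallbackKey)
    }
    return { primary, fallback }
}
```

With the preview values from wrangler.toml, no bucket path and a hypothetical branch `my-branch`, the slug `child-mortality` would resolve to `https://grapher-configs-staging.owid.io/by-branch/my-branch/config/by-slug-published/child-mortality.json`, with `https://grapher-configs.owid.io/v1/config/by-slug-published/child-mortality.json` as the fallback.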
danyx23 authored Sep 10, 2024
2 parents e8968df + 884ec4c commit e15b42b
Showing 10 changed files with 121 additions and 35 deletions.
2 changes: 2 additions & 0 deletions .dev.vars.example
@@ -20,3 +20,5 @@ MAILGUN_SENDING_KEY=
# optional
SLACK_BOT_OAUTH_TOKEN=
SLACK_ERROR_CHANNEL_ID=C016H0BNNB1 #bot-testing channel

GRAPHER_CONFIG_R2_BUCKET_PATH=devs/YOURNAME
2 changes: 1 addition & 1 deletion adminSiteServer/apiRouter.ts
@@ -101,6 +101,7 @@ import {
FlatTagGraph,
DbRawChartConfig,
parseChartConfig,
R2GrapherConfigDirectory,
} from "@ourworldindata/types"
import { uuidv7 } from "uuidv7"
import {
@@ -174,7 +175,6 @@ import path from "path"
import {
deleteGrapherConfigFromR2,
deleteGrapherConfigFromR2ByUUID,
R2GrapherConfigDirectory,
saveGrapherConfigToR2,
saveGrapherConfigToR2ByUUID,
} from "./chartConfigR2Helpers.js"
6 changes: 1 addition & 5 deletions adminSiteServer/chartConfigR2Helpers.ts
@@ -14,14 +14,10 @@ import {
S3Client,
} from "@aws-sdk/client-s3"
import { JsonError, lazy } from "@ourworldindata/utils"
import { R2GrapherConfigDirectory } from "@ourworldindata/types"
import { logErrorAndMaybeSendToBugsnag } from "../serverUtils/errorLog.js"
import { Base64String } from "../serverUtils/serverUtil.js"

export enum R2GrapherConfigDirectory {
byUUID = "config/by-uuid",
publishedGrapherBySlug = "config/by-slug-published",
}

const getS3Client: () => S3Client = lazy(
() =>
new S3Client({
2 changes: 1 addition & 1 deletion devTools/syncGraphersToR2/syncGraphersToR2.ts
@@ -22,7 +22,6 @@ import {
KnexReadonlyTransaction,
knexReadonlyTransaction,
} from "../../db/db.js"
import { R2GrapherConfigDirectory } from "../../adminSiteServer/chartConfigR2Helpers.js"
import { DbRawChartConfig, excludeUndefined } from "@ourworldindata/utils"
import { chunk } from "lodash"
import ProgressBar from "progress"
@@ -31,6 +30,7 @@ import {
HexString,
hexToBytes,
} from "../../serverUtils/serverUtil.js"
import { R2GrapherConfigDirectory } from "@ourworldindata/types"

type HashAndId = Pick<DbRawChartConfig, "fullMd5" | "id">

10 changes: 7 additions & 3 deletions functions/README.md
@@ -10,6 +10,8 @@ Pages Functions are very similar to Cloudflare Workers; however they will always
Pages Functions use file-based routing, which means that the file `grapher/[slug].ts` will serve routes like `/grapher/child-mortality`.
In addition, there's a [`_routes.json`](../_routes.json) file that specifies which routes are to be served dynamically.

Inside a file-based route we sometimes use an instance of itty-router to decide on the exact functionality to provide (e.g. png vs svg generation)

## Development

1. Copy `.dev.vars.example` to `.dev.vars` and fill in the required variables.
@@ -26,17 +28,19 @@ Note: compatibility dates between local development, production and preview envi

3. _Refer to each function's "Development" section below for further instructions._

## Testing on Fondation staging sites vs Cloudfare previews
## Testing on Foundation staging sites vs Cloudflare previews

We have two cloudflare projects set up that you can deploy previews to. `owid` which is also where our production deployment runs, and `owid-staging`. Currently, `owid` is configured to require authentication while `owid-staging` is accessible from the internet without any kind of auth.

`yarn deployContentPreview` deploys the staging `bakedSite` to a Cloudflare preview at https://[PREVIEW_BRANCH].owid-staging.pages.dev. This is the recommended way to test functions in a production-like environment. See [../ops/buildkite/deploy-content-preview](../ops/buildkite/deploy-content-preview) for more details.
`yarn deployContentPreview` deploys the staging `bakedSite` to a Cloudflare preview at https://[PREVIEW_BRANCH].[PROJECT].pages.dev. This is the recommended way to test functions in a production-like environment. See [../ops/buildkite/deploy-content-preview](../ops/buildkite/deploy-content-preview) for more details.

### Rationale

A custom staging site is available at http://staging-site-[BRANCH] upon pushing your branch (see ops > templates > lxc-manager > staging-create). This site is served by `wrangler` (see ops > templates > owid-site-staging > grapher-refresh.sh). `wrangler` is helpful for testing the functions locally (and possibly for some debugging scenarios on staging servers), but is still not the closest match to the production Cloudflare environment.

When it comes to testing functions in a production-like environment, Cloudflare previews are recommended.

Cloudflare previews are served by Cloudflare (as opposed to `wrangler` on staging sites) and are available at https://[RANDOM_ID].owid-staging.pages.dev. Cloudflare previews do not rely on the `wrangler` CLI and its `.dev.vars` file. Instead, they use the [Cloudflare dashboard to configure environment variables](https://dash.cloudflare.com/078fcdfed9955087315dd86792e71a7e/pages/view/owid/settings/environment-variables), in the same way and place as the production site.
Cloudflare previews are served by Cloudflare (as opposed to `wrangler` on staging sites) and are available at https://[RANDOM_ID].[PROJECT].pages.dev. Cloudflare previews do not rely on the `wrangler` CLI and its `.dev.vars` file, but they do take the `wrangler.toml` file into account for environment variables. For secrets, they use the [values set via the Cloudflare dashboard](https://dash.cloudflare.com/078fcdfed9955087315dd86792e71a7e/pages/view/owid/settings/environment-variables), in the same way and place as the production site.

This proximity of configurations in the Cloudflare dashboard makes spotting differences between production and preview environments easier - and is one of the reason of using Cloudflare previews in the same project (owid) over using a new project specific to staging.

110 changes: 85 additions & 25 deletions functions/_common/grapherRenderer.ts
@@ -1,5 +1,10 @@
import { Grapher, GrapherInterface } from "@ourworldindata/grapher"
import { Bounds, deserializeJSONFromHTML } from "@ourworldindata/utils"
import { Grapher } from "@ourworldindata/grapher"
import {
Bounds,
excludeUndefined,
GrapherInterface,
R2GrapherConfigDirectory,
} from "@ourworldindata/utils"
import { svg2png, initialize as initializeSvg2Png } from "svg2png-wasm"
import { TimeLogger } from "./timeLogger"
import { png } from "itty-router"
@@ -130,32 +135,82 @@ const extractOptions = (params: URLSearchParams): ImageOptions => {
return options as ImageOptions
}

async function fetchAndRenderGrapherToSvg({
slug,
options,
searchParams,
env,
}: {
slug: string
options: ImageOptions
searchParams: URLSearchParams
const WORKER_CACHE_TIME_IN_SECONDS = 60

async function fetchFromR2(
url: URL,
etag: string | undefined,
fallbackUrl?: URL
) {
const headers = new Headers()
if (etag) headers.set("If-None-Match", etag)
const init = {
cf: {
cacheEverything: true,
cacheTtl: WORKER_CACHE_TIME_IN_SECONDS,
},
headers,
}
const primaryResponse = await fetch(url.toString(), init)
if (primaryResponse.status === 404 && fallbackUrl) {
return fetch(fallbackUrl.toString(), init)
}
return primaryResponse
}

async function fetchAndRenderGrapherToSvg(
slug: string,
options: ImageOptions,
searchParams: URLSearchParams,
env: Env
}) {
) {
const grapherLogger = new TimeLogger("grapher")

// Fetch grapher config and extract it from the HTML
const grapherConfig: GrapherInterface = await env.ASSETS.fetch(
new URL(`/grapher/${slug}`, env.url)
)
.then((r) => (r.ok ? r : Promise.reject("Failed to load grapher page")))
.then((r) => r.text())
.then((html) => deserializeJSONFromHTML(html))
// The top level directory is either the bucket path (should be set in dev environments and production)
// or the branch name on preview staging environments
console.log("branch", env.CF_PAGES_BRANCH)
const topLevelDirectory = env.GRAPHER_CONFIG_R2_BUCKET_PATH
? [env.GRAPHER_CONFIG_R2_BUCKET_PATH]
: ["by-branch", env.CF_PAGES_BRANCH]

const key = excludeUndefined([
...topLevelDirectory,
R2GrapherConfigDirectory.publishedGrapherBySlug,
`${slug}.json`,
]).join("/")

console.log("fetching grapher config from this key", key)

const requestUrl = new URL(key, env.GRAPHER_CONFIG_R2_BUCKET_URL)

let fallbackUrl

if (!grapherConfig) {
throw new Error("Could not find grapher config")
if (
env.GRAPHER_CONFIG_R2_BUCKET_FALLBACK_URL &&
env.GRAPHER_CONFIG_R2_BUCKET_FALLBACK_PATH
) {
const topLevelDirectory = env.GRAPHER_CONFIG_R2_BUCKET_FALLBACK_PATH
const fallbackKey = excludeUndefined([
topLevelDirectory,
R2GrapherConfigDirectory.publishedGrapherBySlug,
`${slug}.json`,
]).join("/")
fallbackUrl = new URL(
fallbackKey,
env.GRAPHER_CONFIG_R2_BUCKET_FALLBACK_URL
)
}

// Fetch grapher config
const fetchResponse = await fetchFromR2(requestUrl, undefined, fallbackUrl)

if (fetchResponse.status !== 200) {
console.log("Failed to fetch grapher config", fetchResponse.status)
return null
}

grapherLogger.log("fetchGrapherConfig")
const grapherConfig: GrapherInterface = await fetchResponse.json()
console.log("grapher title", grapherConfig.title)

const bounds = new Bounds(0, 0, options.svgWidth, options.svgHeight)
const grapher = new Grapher({
@@ -199,12 +254,17 @@ export const fetchAndRenderGrapher = async (
const options = extractOptions(searchParams)

console.log("Rendering", slug, outType, options)
const svg = await fetchAndRenderGrapherToSvg({
const svg = await fetchAndRenderGrapherToSvg(
slug,
options,
searchParams,
env,
})
env
)
console.log("fetched svg")

if (!svg) {
return new Response("Not found", { status: 404 })
}

switch (outType) {
case "png":
5 changes: 5 additions & 0 deletions functions/grapher/thumbnail/[slug].ts
@@ -6,6 +6,11 @@ export interface Env {
fetch: typeof fetch
}
url: URL
GRAPHER_CONFIG_R2_BUCKET_URL: string
GRAPHER_CONFIG_R2_BUCKET_FALLBACK_URL: string
GRAPHER_CONFIG_R2_BUCKET_PATH: string
GRAPHER_CONFIG_R2_BUCKET_FALLBACK_PATH: string
CF_PAGES_BRANCH: string
ENV: string
}

5 changes: 5 additions & 0 deletions packages/@ourworldindata/types/src/domainTypes/Various.ts
@@ -64,3 +64,8 @@ export class JsonError extends Error {
export interface QueryParams {
[key: string]: string | undefined
}

export enum R2GrapherConfigDirectory {
byUUID = "config/by-uuid",
publishedGrapherBySlug = "config/by-slug-published",
}
1 change: 1 addition & 0 deletions packages/@ourworldindata/types/src/index.ts
@@ -17,6 +17,7 @@ export {
type RawPageview,
type UserCountryInformation,
type QueryParams,
R2GrapherConfigDirectory,
} from "./domainTypes/Various.js"
export { type BreadcrumbItem, type KeyValueProps } from "./domainTypes/Site.js"
export {
13 changes: 13 additions & 0 deletions wrangler.toml
@@ -6,12 +6,19 @@ pages_build_output_dir = "./localBake"
# Vars that should be available in all envs, including local dev
[vars]
ENV = "development"
GRAPHER_CONFIG_R2_BUCKET_URL = "https://grapher-configs-staging.owid.io"
GRAPHER_CONFIG_R2_BUCKET_FALLBACK_URL = "https://grapher-configs.owid.io"
GRAPHER_CONFIG_R2_BUCKET_FALLBACK_PATH = "v1"


# Overrides for CF preview deployments
[env.preview.vars]
MAILGUN_DOMAIN = "mg.ourworldindata.org"
SLACK_ERROR_CHANNEL_ID = "C016H0BNNB1"
ENV = "preview"
GRAPHER_CONFIG_R2_BUCKET_URL = "https://grapher-configs-staging.owid.io"
GRAPHER_CONFIG_R2_BUCKET_FALLBACK_URL = "https://grapher-configs.owid.io"
GRAPHER_CONFIG_R2_BUCKET_FALLBACK_PATH = "v1"

# Overrides for CF production deployment
[env.production]
@@ -21,3 +28,9 @@ compatibility_date = "2024-04-29"
ENV = "production"
MAILGUN_DOMAIN = "mg.ourworldindata.org"
SLACK_ERROR_CHANNEL_ID = "C5JJW19PS"
GRAPHER_CONFIG_R2_BUCKET_URL = "https://grapher-configs.owid.io"
GRAPHER_CONFIG_R2_BUCKET_FALLBACK_URL = ""
GRAPHER_CONFIG_R2_BUCKET_FALLBACK_PATH = ""
GRAPHER_CONFIG_R2_BUCKET_PATH = "v1"

