-
-
Notifications
You must be signed in to change notification settings - Fork 229
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
🎉 Update algolia index when publishing/updating/unpublishing gdoc posts #3427
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice!
It's a bummer that the gdocs PUT route is so full now; would be nice if we can factor some things out there at some point.
adminSiteServer/apiRouter.ts
Outdated
@@ -2380,12 +2387,17 @@ putRouteWithRWTransaction(apiRouter, "/gdocs/:id", async (req, res, trx) => { | |||
nextJson.slug | |||
) | |||
} else if (checkFullDeployFallback(prevJson, nextJson, hasChanges)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's a bit weird is that lightning deploys don't trigger a GdocPublishingAction.Updating
action.
I don't know the numbers, but they might be the more common case than fallback deploys, no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
True! I've fixed lightning deploys: a function property was missing from the properties to exclude list and so isEqual(prev, next)
was always returning false
- not great that it's so easy to miss this but I think it was mostly due to the large gdocs model refactor and should be easier to notice again now that it's working.
I also extracted functions from the PUT route (probably easier to review in an editor than the git diff) most notably indexAndBakeGdocIfNeccesary
which handles all four possible publishing action cases more explicitly, now, and should make it easier to update.
48c18da
to
fc1e521
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice refactor, that's making this much better. Thank you!
* - Updating (index and bake (potentially via lightning deploy)) | ||
* - Unpublishing (remove from index and bake) | ||
*/ | ||
async function indexAndBakeGdocIfNeccesary( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe it makes sense to move this one to gdocsDeploy
, too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeahhh I had the same desire but those functions are in the adminSiteClient
folder, and I didn't want to make a new file just for one function so I kept it close to usage instead.
WalkthroughThe recent updates focus on enhancing content management and search capabilities through Algolia indexing at deploy time. Key changes include the introduction of new functions and enums for handling Google Docs posts, improvements in image synchronization, and the addition of Algolia indexing utilities. These modifications aim to streamline content updates and improve search functionality, aligning with the objective to index content live as it is published. Changes
Assessment against linked issues
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (invoked as PR comments)
Additionally, you can add CodeRabbit Configration File (
|
export async function syncImagesAndAddToContentGraph( | ||
trx: KnexReadWriteTransaction, | ||
gdoc: GdocPost | GdocDataInsight | GdocHomepage | GdocAuthor | ||
): Promise<void> { | ||
const id = gdoc.id | ||
// Deleting and recreating these is simpler than tracking orphans over the next code block | ||
await trx.table(PostsGdocsXImagesTableName).where({ gdocId: id }).delete() | ||
const filenames = gdoc.filenames | ||
|
||
// Includes fragments so that images in data pages are | ||
// synced to S3 and ultimately baked in bakeDriveImages(). | ||
if (filenames.length && gdoc.published) { | ||
await imageStore.fetchImageMetadata(filenames) | ||
const images = await imageStore.syncImagesToS3(trx) | ||
const gdocXImagesToInsert: DbInsertPostGdocXImage[] = [] | ||
for (const image of images) { | ||
if (image) { | ||
gdocXImagesToInsert.push({ | ||
gdocId: gdoc.id, | ||
imageId: image.id, | ||
}) | ||
} | ||
} | ||
try { | ||
await trx | ||
.table(PostsGdocsXImagesTableName) | ||
.insert(gdocXImagesToInsert) | ||
} catch (e) { | ||
console.error( | ||
`Error tracking image references with Google ID ${gdoc.id}`, | ||
e | ||
) | ||
} | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tip
Codebase Verification
The syncImagesAndAddToContentGraph
function implements basic error handling by logging errors to the console and adopts a straightforward approach to syncing images by deleting and re-adding references. This method simplifies the logic but might not be the most efficient in scenarios where the image set does not change frequently. While the error handling captures errors during the database insertion process, it does not provide mechanisms for retrying failed operations or notifying external systems/services of the failure, which could be areas for improvement. The review comment is partially addressed in the implementation, suggesting that while the function includes error handling, there could be room for enhancement in terms of efficiency and more comprehensive error handling strategies.
Analysis chain
The implementation of syncImagesAndAddToContentGraph
for syncing images to S3 and updating image references in the database is a crucial functionality. Ensure that the image syncing process is efficient and that error handling is robust to prevent data inconsistencies.
Scripts Executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Verify the efficiency and error handling of the image syncing process.
rg "syncImagesAndAddToContentGraph" db/model/Gdoc/GdocFactory.ts
Length of output: 118
Script:
#!/bin/bash
# Display the content of the syncImagesAndAddToContentGraph function for detailed examination
rg -A 30 "async function syncImagesAndAddToContentGraph" db/model/Gdoc/GdocFactory.ts
Length of output: 1344
b2fca20
to
3854a91
Compare
Resolves #2852
Adds a new step to the dreaded gdocs PUT route ☠️
If publishing/updating:
Update the pages index with new records
If deleting/unpublishing:
Delete the records from the pages index
I had to move stuff out of
indexToAlgolia.tsx
so that I could import a function without the implicit call toindexToAlgola()
being evaluated.Also added some comments about Fragments and #3426 as I encountered that annoying little gotcha.
To test:
Set
ALGOLIA_INDEXING=true
,ALGOLIA_ID
,ALGOLIA_SEARCH_KEY
,ALGOLIA_SECRET_KEY
with values from the owid-staging application on Algolia and set in your local.env
filePublish a test post with at least 1500 characters (for chunking) and verify that it shows up in the owid-staging pages index
Update the post to have a short amount of content and republish it. Confirm that the content is updated in the index and that records from the larger version of the post are deleted.
Change the slug of the post and republish it again. Confirm that the records are updated correctly.
Unpublish the post. Confirm that the records are deleted from the index.
Publish the post again, confirm that new records are created, and then delete the post (without unpublishing first). Confirm that the records are deleted.
Confirm that fragments don't get indexed but topic pages, linear topic pages, about pages, and articles do.