Don't fetch image metadata during Algolia sync #3496
Conversation
nice, thank you for working on this ❤️
db/db.ts
Outdated
```ts
    filenames: string[]
): Promise<Record<string, ImageMetadata>> => {
    if (filenames.length === 0) return {}
    const rows = (await knexRaw(
```
```diff
- const rows = (await knexRaw(
+ const rows = (await knexRaw<Pick<DbEnrichedImage, "id" | "googleId" | "filename" | "defaultAlt" | "updatedAt" | "originalWidth" | "originalHeight">> (
```
I think this pattern is nicer: instead of casting to the full type at the end, you pick the fields at the point of the query.
True! I've also changed `ImageMetadata` to derive from `DbEnrichedImage`, to hopefully keep things in sync a little better.
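A minimal sketch of what that derivation might look like; the import path and exact field list are assumptions based on the suggestion above, not the actual diff:

```ts
// Import path is an assumption; in owid-grapher the type lives in a
// shared types package.
import type { DbEnrichedImage } from "@ourworldindata/types"

// Derive ImageMetadata from DbEnrichedImage via Pick so the two types
// can't drift apart. The field list is assumed from the suggestion above.
export type ImageMetadata = Pick<
    DbEnrichedImage,
    | "googleId"
    | "filename"
    | "defaultAlt"
    | "updatedAt"
    | "originalWidth"
    | "originalHeight"
>
```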
db/model/Image.ts
Outdated
```ts
return Promise.all(images.map((i) => Image.syncImage(knex, i)))
    .then(excludeUndefined)
    .catch((e) => {
        console.error(`Error syncing images to S3`, e)
        return []
    })
```
If we do this a lot less often now, should we maybe take care of the errors a bit better? It might be worth capturing them so we can report them up the call chain and back to the client, warning the user if some images didn't sync. What do you think?
Sounds good. I've added errors in the right places, so that users will get a JSON 500 response if something bad happens on fetch from Gdrive or during sync.
Far from ideal, but programming is a process 🙂
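For illustration, a hypothetical sketch of that shape of error propagation; `JsonError` and the exact wording here are assumptions, not the code that was merged:

```ts
// Hypothetical sketch: collect per-image sync failures and throw, so the
// admin route can answer with a JSON 500 instead of logging and
// returning []. JsonError is an assumption, not necessarily the real class.
const results = await Promise.allSettled(
    images.map((image) => Image.syncImage(knex, image))
)
const failures = results.filter(
    (result): result is PromiseRejectedResult => result.status === "rejected"
)
if (failures.length > 0) {
    throw new JsonError(`${failures.length} image(s) failed to sync to S3`, 500)
}
```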
Oh this is very nice
I have one question about the error reporting if syncing fails, otherwise it's good to go I think!
Force-pushed from 63240f4 to 66590a5
Follow-up issue here: #3504
This PR exists because the Algolia sync script was timing out during my local testing.

It was calling `Gdoc.getAndLoadPublishedGdocPosts` (a proxy for `getAndLoadPublishedGdocPosts`), which called `loadGdocFromGdocBase` on every post, which called `loadImageMetadata` on every post. `loadImageMetadata` fetches from Gdrive and syncs to the DB+S3 if there are updates. It was written with the Gdocs preview pipeline in mind, and turned out to be an anti-pattern: when used on all 500 gdocs at once, we get throttled.

So, at first I got a bit distracted by gdocs performance in general and added a cache to the `imageStore` singleton, so that it would fetch all metadata during instantiation and then serve the cache on subsequent calls.
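Roughly this shape; a hypothetical sketch, with `fetchImageMetadataFromGdrive` invented for illustration:

```ts
// Hypothetical sketch of the cache idea: pay the Gdrive round-trip once,
// then serve the cached map on every later call.
class ImageStore {
    private cache?: Record<string, ImageMetadata>

    async getAllMetadata(): Promise<Record<string, ImageMetadata>> {
        if (!this.cache) {
            // Only the first call hits Gdrive
            this.cache = await fetchImageMetadataFromGdrive()
        }
        return this.cache
    }
}

export const imageStore = new ImageStore() // the singleton mentioned above
```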
Fortunately, I realized that for the specific issue at hand it would be far simpler to just not load image metadata when we're indexing documents to Algolia.

Now, `GdocBase.loadState` only loads image data from the DB, and `loadGdocFromGdocBase` (its caller) does the sync with Google beforehand if the `contentSource` is `gdocs`, e.g.

`GET /admin/gdocs/:id?contentSource=gdocs`
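A hedged sketch of that flow; everything except `loadGdocFromGdocBase`, `GdocBase.loadState`, and `contentSource` is assumed for illustration:

```ts
// Hypothetical sketch of the new flow: only talk to Google when the
// caller explicitly asks for the Gdocs content source; loadState itself
// reads image metadata from the DB only.
async function loadGdocFromGdocBase(
    knex: KnexReadonlyTransaction,
    gdoc: GdocBase,
    contentSource?: "gdocs"
): Promise<GdocBase> {
    if (contentSource === "gdocs") {
        // Fetch from Gdrive and sync to DB+S3 first, e.g. for
        // GET /admin/gdocs/:id?contentSource=gdocs
        await syncImagesWithGoogle(gdoc)
    }
    await gdoc.loadState(knex) // image data comes from the DB only
    return gdoc
}
```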
This meant I could remove a bunch of the read/write transactions we had added for the previous image flow, but once I started on that, I realized it would quickly blow this PR up into something massive. That seemed like a bad idea right before the offsite, especially because I would prefer to blow this whole system up with Cloudflare Images instead. 😅