feature(search): corpus search infrastructure, backfill, and ingest pipeline #720
Merged
* feat(corpus-search): sagemaker + os deployment infra
* chore: temp
* chore(infra): add lambda connector scripts and ci deployment
* chore: update lambda exec role and logging
* chore: add sentry release notification
* chore: logging for debug
* chore: trying node fetch
* chore: deploy to 3 subnets
* chore: attempt with copying the same sg config for ecs
* chore: sign requests with aws4
* fix: wrong .ok accessor
* fix: put don't post
* fix: opensearch service url
* feat(search): move sagemaker embeddings to user list search
* chore: remove connectors and separate deploy
* chore(corpus-search): move infrastructure to user-list-search
* chore: remove corpus-embeddings infrastructure
* chore: corpus-embeddings module in user-list-search
* chore: add back random string
* chore: terraform fmt
* feat(embeddings): add embeddings and cutover to corpus search cluster
  Update ingest lambdas to write to new corpus search cluster instead of user-list-search cluster. Update parser hydration lambda to request embeddings with parser data as a fallback (if title and excerpt not provided) and upload embeddings to corpus search cluster. [POCKET-10388]
* chore(cleanup): remove embeddings connector lambda creator
* chore: update lockfile
* chore(cleanup): remove corpus-embeddings infrastructure
  It's included in user-list-search due to the natural dependency relationship
* chore: separate sentry dsn for corpus search
  For better grouping and more lax data scrub rules, since this does not include user data
* chore: fix typo
* chore: tweak timing of delay queue and vis timeout
* feat(sagemaker): moving sagemake to a seperate module (#715)
---------
Co-authored-by: Daniel Brooks <[email protected]>
* fix: more delays if throttled
* feat(search): corpus search backfill script
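The fallback described in the feat(embeddings) commit (use parser data when title and excerpt are not provided) could look roughly like the sketch below. This is not the code from this PR: the `CorpusItemText` and `ParserData` shapes, the `present` and `embeddingInput` helpers, and the field-by-field fallback are all assumptions made for illustration.

```typescript
// Hypothetical shapes: field names are assumptions, not the PR's actual types.
interface CorpusItemText {
  title?: string | null;
  excerpt?: string | null;
}

interface ParserData {
  title?: string | null;
  excerpt?: string | null;
}

// Treat null, undefined, and whitespace-only strings as "not provided".
function present(value?: string | null): string | undefined {
  const trimmed = value?.trim();
  return trimmed ? trimmed : undefined;
}

// Prefer the corpus item's title and excerpt; fall back to parser data
// field by field (assumed behavior). Returns undefined when there is
// nothing to embed, so the caller can skip the embeddings request.
function embeddingInput(
  item: CorpusItemText,
  parser: ParserData,
): string | undefined {
  const title = present(item.title) ?? present(parser.title);
  const excerpt = present(item.excerpt) ?? present(parser.excerpt);
  const parts = [title, excerpt].filter(
    (part): part is string => part !== undefined,
  );
  return parts.length > 0 ? parts.join('\n') : undefined;
}
```

Returning `undefined` instead of an empty string keeps the "nothing to embed" case explicit for whatever calls the embeddings endpoint.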
Plan Result (user-list-search-production)
Added the Sentry DSN value to the prod account. I'm looking into the corpus search cluster to see if there is any way I can get around having to destroy it.
kschelonka force-pushed the feature/semantic-search branch from 45a49a2 to 5db91b3 (September 9, 2024 16:34)
kschelonka force-pushed the feature/semantic-search branch from 5db91b3 to 1fb4707 (September 9, 2024 16:35)
Inferred return type interface was causing issues; make it explicit.
bassrock approved these changes (Sep 9, 2024)
Infrastructure for corpus search, with ingest and backfill
TODOs:
Out of scope: