-
Notifications
You must be signed in to change notification settings - Fork 79
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use placeholder revision in urls in cached responses #2966
Merged
Merged
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
lhoestq
commented
Jul 3, 2024
Comment on lines
47
to
+49
object_key = storage_client.generate_object_key( | ||
dataset=dataset, | ||
revision=revision, | ||
revision=DATASET_GIT_REVISION_PLACEHOLDER, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
main change is here
lhoestq
commented
Jul 3, 2024
Comment on lines
+64
to
+66
return ImageSource( | ||
src=storage_client.get_url(object_key, revision=revision), height=image.height, width=image.width | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few lines later, get_url
applies the right revision if there is an URL preparator associated to the storage client.
In practice:
- /rows, /search and /filter have an URL preparator that inserts the revision and signs the urls from the parquet/duckdb data
- split-first-rows worker has no URL preparator: URLs are stored in the cached as unprepared (i.e. with the revision placeholder and unsigned)
- /first-rows has an URL preparator that inserts the revision and signs the urls that come form the cache
severo
approved these changes
Jul 8, 2024
Co-authored-by: Sylvain Lesage <[email protected]>
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Previously, we were using the dataset revision in the URLs of image/audio files of cached responses of /first-rows.
However when a dataset gets its README updated, we update the
dataset_git_revision
of the cache entries and the location of the image/audio files on S3 but we don't modify the revision in the URLs in the cached response.This resulted in the Viewer not showing the images after modifying the readme of a dataset.
I fixed that for future datasets by not using the revision in the URLs anymore and use a placeholder that is replaced by
dataset_git_revision
when the cached response is accessedImplementation details
I modified the URL Signer logic to also insert the revision in the URL and renamed it to a URL Preparator. It takes care of inserting the revision and signing the URLs.
close #2965