Overwrite image and audio cached assets when necessary #1981
Comments
It depends on the TTL for cached-assets (which is 1 day), so the files will be refreshed after that time.
I think first-rows is fine since it has [...]. And yes, config-parquet-metadata (the job that provides /rows) can take care of removing the outdated cached assets. Good idea.
Related to #1823
https://huggingface.co/datasets/ccmusic-database/instrument_timbre_eval is now serving the correct files. What I saw is that CloudFront keeps objects in its cache for a given period before they expire: https://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/Expiration.html So, if we change or update an S3 file for assets or cached-assets within a shorter interval than that, we will have to invalidate the cache or do something similar: https://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/Invalidation.html
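For illustration, the invalidation option mentioned above could look roughly like the following sketch. It assumes a boto3 CloudFront client is passed in (for example `boto3.client("cloudfront")`); the helper names `build_invalidation_batch` and `invalidate_paths` and the example paths are hypothetical and not part of datasets-server.

```python
import uuid
from typing import Any, Dict, List


def build_invalidation_batch(paths: List[str]) -> Dict[str, Any]:
    """Build the InvalidationBatch structure for a CloudFront invalidation."""
    # CloudFront invalidation paths must start with "/".
    items = [p if p.startswith("/") else "/" + p for p in paths]
    return {
        "Paths": {"Quantity": len(items), "Items": items},
        # CallerReference must be unique for each invalidation request.
        "CallerReference": uuid.uuid4().hex,
    }


def invalidate_paths(client: Any, distribution_id: str, paths: List[str]) -> Any:
    """Ask CloudFront to drop the given paths from its edge caches.

    `client` is assumed to be a boto3 CloudFront client (hypothetical wiring).
    """
    return client.create_invalidation(
        DistributionId=distribution_id,
        InvalidationBatch=build_invalidation_batch(paths),
    )
```

Invalidating on every overwrite adds a request per update, which is one reason the file-name versioning approach may be preferable.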
If I understand correctly, we can fix this issue by including the dataset revision hash in the cached asset path, no? It corresponds to "versioning file names" in the AWS docs.
Sounds like a great idea! I can work on it.
Good idea
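A minimal sketch of that idea, assuming a hypothetical path layout (the actual layout used by datasets-server may differ): because the revision hash is part of the path, a new dataset revision yields a new URL, so neither the stored file nor the CloudFront cache entry for the old revision can be served by mistake. The function name, parameters, and example values are illustrative only.

```python
def versioned_asset_path(
    dataset: str,
    revision: str,
    config: str,
    split: str,
    row_idx: int,
    column: str,
    filename: str,
) -> str:
    """Build an asset path that embeds the dataset revision hash.

    The layout below is an assumption made for illustration.
    No URL quoting is applied here; a real implementation would need it.
    """
    parts = [dataset, "--", revision, "--", config, split, str(row_idx), column, filename]
    return "/".join(parts)


# Two revisions of the same cell map to two distinct paths.
old_path = versioned_asset_path("user/ds", "abc123", "default", "train", 0, "audio", "audio.wav")
new_path = versioned_asset_path("user/ds", "def456", "default", "train", 0, "audio", "audio.wav")
```

With this scheme, outdated files are never overwritten in place, so they still have to be cleaned up separately.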
Currently we never overwrite image and audio cached assets, which can lead to outdated data in the viewer like here:
https://huggingface.co/datasets/ccmusic-database/instrument_timbre_eval/discussions/1
In particular, we have overwrite=False here: https://github.com/huggingface/datasets-server/blob/b84c8210ec023664c4780ab348b0232468afe116/libs/libapi/src/libapi/response.py#L36-L42
cc @AndreaFrancis @severo
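To make the behaviour concrete, here is a small sketch of how an overwrite flag typically works. This is not the code at the link above; `write_asset` is a hypothetical helper operating on a local path, used only to show why a stale file survives when the flag is False.

```python
from pathlib import Path


def write_asset(path: Path, data: bytes, overwrite: bool = False) -> bool:
    """Write `data` to `path`; return True if the file was written.

    With overwrite=False an existing file is left untouched, even when
    its content no longer matches the current dataset.
    """
    if path.exists() and not overwrite:
        return False
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(data)
    return True
```

Calling it twice on the same path with overwrite=False keeps the first payload, which matches the outdated data seen in the viewer.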