adding revision for assets creation path #1988

Merged Oct 19, 2023 (9 commits)

@AndreaFrancis (Contributor) commented Oct 16, 2023

Fix for #1981
As suggested by @lhoestq in #1981 (comment), this adds the dataset version (revision) to the assets path.
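
A minimal sketch of the idea, not the code from this PR: the helper name `build_asset_url`, the `--` separator, and the exact segment order are all assumptions made for illustration. The point is that once the revision (git commit sha) is a path segment, each dataset version gets its own URLs, so a CDN cannot serve a stale file under the new version's URL.

```python
from urllib.parse import quote

# Hypothetical separator between the dataset-level and row-level path parts.
SEPARATOR = "--"


def build_asset_url(
    base_url: str,
    dataset: str,
    revision: str,
    config: str,
    split: str,
    row_idx: int,
    column: str,
    filename: str,
) -> str:
    """Build an asset URL that embeds the dataset revision (illustrative layout)."""
    parts = [dataset, SEPARATOR, revision, SEPARATOR, config, split, str(row_idx), column, filename]
    # Quote each segment; "/" inside dataset names such as "user/ds" is kept.
    path = "/".join(quote(part, safe="/") for part in parts)
    return f"{base_url.rstrip('/')}/{path}"


base = "https://example.com/assets"
old_url = build_asset_url(base, "user/ds", "abc1234", "default", "train", 0, "image", "image.jpg")
new_url = build_asset_url(base, "user/ds", "def5678", "default", "train", 0, "image", "image.jpg")
```

Because `old_url` and `new_url` differ only in the revision segment, files written for a new revision never collide with those of the previous one.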

Open questions for upcoming PRs:

@codecov-commenter commented Oct 16, 2023

Codecov Report

All modified lines are covered by tests ✅

Comparison is base (aff2e0b) 90.82% compared to head (dce7c47) 89.71%.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1988      +/-   ##
==========================================
- Coverage   90.82%   89.71%   -1.12%     
==========================================
  Files         234      162      -72     
  Lines       14788     9264    -5524     
==========================================
- Hits        13431     8311    -5120     
+ Misses       1357      953     -404     
| Flag | Coverage Δ |
|---|---|
| jobs_cache_maintenance | 95.32% <ø> (ø) |
| jobs_mongodb_migration | 86.32% <ø> (ø) |
| libs_libapi | 88.59% <ø> (?) |
| libs_libcommon | 92.53% <100.00%> (+0.26%) ⬆️ |
| services_admin | 85.88% <ø> (ø) |
| services_api | 86.79% <ø> (ø) |
| services_rows | 85.55% <ø> (+0.62%) ⬆️ |
| services_search | 80.73% <ø> (+0.73%) ⬆️ |
| services_sse-api | 94.17% <ø> (ø) |
| services_worker | ? |

Flags with carried forward coverage won't be shown.

| Files | Coverage Δ |
|---|---|
| libs/libapi/src/libapi/rows_utils.py | 47.36% <ø> (ø) |
| libs/libapi/src/libapi/utils.py | 45.20% <ø> (ø) |
| libs/libcommon/src/libcommon/config.py | 75.63% <ø> (-1.54%) ⬇️ |
| libs/libcommon/src/libcommon/simple_cache.py | 90.36% <100.00%> (+0.02%) ⬆️ |
| libs/libcommon/src/libcommon/state.py | 97.69% <ø> (ø) |
| libs/libcommon/src/libcommon/viewer_utils/asset.py | 96.29% <100.00%> (+12.45%) ⬆️ |
| ...s/libcommon/src/libcommon/viewer_utils/features.py | 81.13% <ø> (ø) |
| libs/libcommon/tests/viewer_utils/test_assets.py | 100.00% <100.00%> (ø) |
| libs/libcommon/tests/viewer_utils/test_features.py | 100.00% <ø> (ø) |
| services/rows/src/rows/routes/rows.py | 61.66% <ø> (+5.14%) ⬆️ |

... and 4 more

... and 92 files with indirect coverage changes


@AndreaFrancis AndreaFrancis marked this pull request as ready for review October 16, 2023 20:58
@AndreaFrancis (Contributor Author)

e2e execution evidence:
[Screenshot from 2023-10-16 17-00-16]
[Screenshot from 2023-10-16 17-00-33]

@lhoestq (Member) left a comment

LGTM! And the already existing assets will keep working, since we store the full URLs :)

@lhoestq (Member) commented Oct 17, 2023

> Since all our cache records in the cachedResponsesBlue collection have a revision value, should we make it required (non-Optional), the same as in Jobs?

Yes I think so

> For assets, if a new version is processed, the previous files will remain in S3. We need to find a way to clean them; or maybe it will make more sense in #1981, since a "versioned" approach was proposed.

The revision for assets is set to "main", so it should overwrite the files every time, but IIUC it will still have the same issue with CloudFront serving the old files? In that case we should use the right revision (git commit sha) and have a mechanism to clean the outdated directory, maybe in the first-rows jobs, or in a cron job; not sure which is easiest.

@AndreaFrancis (Contributor Author)

> Yes I think so

I would like to send another PR before merging this one to make the dataset "version" mandatory, so that I can remove the Optionals here.

> The revision for assets is set to "main", so it should overwrite the files every time, but IIUC it will still have the same issue with CloudFront serving the old files? In that case we should use the right revision (git commit sha) and have a mechanism to clean the outdated directory, maybe in the first-rows jobs, or in a cron job; not sure which is easiest.

No, the revision is applied to assets as well (https://github.com/huggingface/datasets-server/pull/1988/files#diff-4072a6f63a7fdcc185cb1170651d53216c3db8db2b5d2374db9e7974d648e2e5R229), so there is no problem with CloudFront.
As for the mechanism for cleaning outdated directories: yes, I think I can propose another PR to clean them periodically until the Versioned DAG approach is implemented (#1823).
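
One way such a periodic cleanup could look, as a sketch only. It assumes assets sit on a local filesystem under `<dataset_dir>/<revision>/...` and that the current revision is known to the caller; the function name `clean_outdated_revisions` and this layout are hypothetical, and S3 storage would need the equivalent list and delete calls instead of `shutil.rmtree`.

```python
import shutil
import tempfile
from pathlib import Path


def clean_outdated_revisions(dataset_dir: Path, current_revision: str) -> list[str]:
    """Delete every revision directory except the current one; return the removed names."""
    removed = []
    for child in sorted(dataset_dir.iterdir()):
        if child.is_dir() and child.name != current_revision:
            shutil.rmtree(child)
            removed.append(child.name)
    return removed


# Usage on a throwaway directory tree with two revisions.
with tempfile.TemporaryDirectory() as tmp:
    dataset_dir = Path(tmp) / "my_dataset"
    for revision in ("abc1234", "def5678"):
        (dataset_dir / revision / "default" / "train").mkdir(parents=True)
    removed = clean_outdated_revisions(dataset_dir, current_revision="def5678")
    remaining = [p.name for p in dataset_dir.iterdir()]
```

Running this from a cron job rather than inside the first-rows job keeps the cleanup out of the request and processing path, at the cost of outdated files lingering until the next run.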

@severo (Collaborator) commented Oct 17, 2023

> I would like to send another PR before merging this one to make the dataset "version" mandatory, so that I can remove the Optionals here.

good idea
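
The Optional-to-required change agreed on here can be sketched with plain dataclasses. This is not the project's actual cache model: the class names, field names, and the `"main"` fallback are hypothetical. While `revision` is Optional, every consumer needs a fallback for `None`; once it is required, that branch disappears and a missing value fails at construction time.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CacheEntryBefore:
    dataset: str
    revision: Optional[str] = None  # callers must handle the None case


@dataclass
class CacheEntryAfter:
    dataset: str
    revision: str  # required: omitting it raises TypeError


def asset_prefix_before(entry: CacheEntryBefore) -> str:
    # A fallback is needed while the field is Optional ("main" is illustrative).
    return f"{entry.dataset}/{entry.revision or 'main'}"


def asset_prefix_after(entry: CacheEntryAfter) -> str:
    return f"{entry.dataset}/{entry.revision}"


try:
    CacheEntryAfter(dataset="my_dataset")  # type: ignore[call-arg]
    missing_revision_rejected = False
except TypeError:
    missing_revision_rejected = True
```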

@github-actions bot commented Oct 18, 2023

ArgoCD Diff for commit 8006837

Updated at 10/19/2023, 4:17:50 PM CEST

App: datasets-server-prod
YAML generation: Success 🟢
App sync status: Out of Sync ⚠️

===== apps/Deployment datasets-server/prod-datasets-server-admin ======
--- /tmp/argocd-diff1750877366/prod-datasets-server-admin-live.yaml	2023-10-19 14:17:48.892707075 +0000
+++ /tmp/argocd-diff1750877366/prod-datasets-server-admin	2023-10-19 14:17:48.880706936 +0000
@@ -619,14 +619,6 @@
           value: https://datasets-server.huggingface.co/cached-assets
         - name: CACHED_ASSETS_STORAGE_DIRECTORY
           value: /storage/cached-assets
-        - name: CACHED_ASSETS_CLEAN_CACHE_PROBA
-          value: "0.05"
-        - name: CACHED_ASSETS_KEEP_FIRST_ROWS_NUMBER
-          value: "100"
-        - name: CACHED_ASSETS_KEEP_MOST_RECENT_ROWS_NUMBER
-          value: "200"
-        - name: CACHED_ASSETS_MAX_CLEANED_ROWS_NUMBER
-          value: "10000"
         - name: CACHED_ASSETS_S3_FOLDER_NAME
           value: cached-assets
         - name: PARQUET_METADATA_STORAGE_DIRECTORY
@@ -657,7 +649,7 @@
           value: "9"
         - name: ADMIN_UVICORN_PORT
           value: "8080"
-        image: huggingface/datasets-server-services-admin:sha-aff2e0b
+        image: huggingface/datasets-server-services-admin:sha-fe7ceb7
         imagePullPolicy: IfNotPresent
         livenessProbe:
           failureThreshold: 30

===== apps/Deployment datasets-server/prod-datasets-server-api ======
--- /tmp/argocd-diff546326395/prod-datasets-server-api-live.yaml	2023-10-19 14:17:48.916707351 +0000
+++ /tmp/argocd-diff546326395/prod-datasets-server-api	2023-10-19 14:17:48.912707305 +0000
@@ -392,7 +392,7 @@
           value: "9"
         - name: API_UVICORN_PORT
           value: "8080"
-        image: huggingface/datasets-server-services-api:sha-aff2e0b
+        image: huggingface/datasets-server-services-api:sha-fe7ceb7
         imagePullPolicy: IfNotPresent
         livenessProbe:
           failureThreshold: 30

===== apps/Deployment datasets-server/prod-datasets-server-rows ======
--- /tmp/argocd-diff3471894629/prod-datasets-server-rows-live.yaml	2023-10-19 14:17:48.964707904 +0000
+++ /tmp/argocd-diff3471894629/prod-datasets-server-rows	2023-10-19 14:17:48.960707858 +0000
@@ -423,14 +423,6 @@
           value: https://datasets-server.huggingface.co/cached-assets
         - name: CACHED_ASSETS_STORAGE_DIRECTORY
           value: /storage/cached-assets
-        - name: CACHED_ASSETS_CLEAN_CACHE_PROBA
-          value: "0.05"
-        - name: CACHED_ASSETS_KEEP_FIRST_ROWS_NUMBER
-          value: "100"
-        - name: CACHED_ASSETS_KEEP_MOST_RECENT_ROWS_NUMBER
-          value: "200"
-        - name: CACHED_ASSETS_MAX_CLEANED_ROWS_NUMBER
-          value: "10000"
         - name: CACHED_ASSETS_S3_FOLDER_NAME
           value: cached-assets
         - name: S3_BUCKET
@@ -519,7 +511,7 @@
           value: "8080"
         - name: ROWS_INDEX_MAX_ARROW_DATA_IN_MEMORY
           value: "300_000_000"
-        image: huggingface/datasets-server-services-rows:sha-aff2e0b
+        image: huggingface/datasets-server-services-rows:sha-fe7ceb7
         imagePullPolicy: IfNotPresent
         livenessProbe:
           failureThreshold: 30

===== apps/Deployment datasets-server/prod-datasets-server-search ======
--- /tmp/argocd-diff1560725679/prod-datasets-server-search-live.yaml	2023-10-19 14:17:48.988708180 +0000
+++ /tmp/argocd-diff1560725679/prod-datasets-server-search	2023-10-19 14:17:48.984708134 +0000
@@ -418,14 +418,6 @@
           value: https://datasets-server.huggingface.co/cached-assets
         - name: CACHED_ASSETS_STORAGE_DIRECTORY
           value: /storage/cached-assets
-        - name: CACHED_ASSETS_CLEAN_CACHE_PROBA
-          value: "0.05"
-        - name: CACHED_ASSETS_KEEP_FIRST_ROWS_NUMBER
-          value: "100"
-        - name: CACHED_ASSETS_KEEP_MOST_RECENT_ROWS_NUMBER
-          value: "200"
-        - name: CACHED_ASSETS_MAX_CLEANED_ROWS_NUMBER
-          value: "10000"
         - name: CACHED_ASSETS_S3_FOLDER_NAME
           value: cached-assets
         - name: S3_BUCKET
@@ -512,7 +504,7 @@
           value: refs/convert/parquet
         - name: DUCKDB_INDEX_CACHE_DIRECTORY
           value: /storage/duckdb-index
-        image: huggingface/datasets-server-services-search:sha-aff2e0b
+        image: huggingface/datasets-server-services-search:sha-fe7ceb7
         imagePullPolicy: IfNotPresent
         livenessProbe:
           failureThreshold: 30

===== apps/Deployment datasets-server/prod-datasets-server-sse-api ======
--- /tmp/argocd-diff1169530826/prod-datasets-server-sse-api-live.yaml	2023-10-19 14:17:49.020708549 +0000
+++ /tmp/argocd-diff1169530826/prod-datasets-server-sse-api	2023-10-19 14:17:49.016708503 +0000
@@ -316,7 +316,7 @@
           value: "1"
         - name: API_UVICORN_PORT
           value: "8080"
-        image: huggingface/datasets-server-services-sse-api:sha-aff2e0b
+        image: huggingface/datasets-server-services-sse-api:sha-fe7ceb7
         imagePullPolicy: IfNotPresent
         livenessProbe:
           failureThreshold: 30

===== apps/Deployment datasets-server/prod-datasets-server-storage-admin ======
--- /tmp/argocd-diff365301699/prod-datasets-server-storage-admin-live.yaml	2023-10-19 14:17:49.040708779 +0000
+++ /tmp/argocd-diff365301699/prod-datasets-server-storage-admin	2023-10-19 14:17:49.040708779 +0000
@@ -411,7 +411,7 @@
         helm.sh/chart: datasets-server
     spec:
       containers:
-      - image: huggingface/datasets-server-services-storage-admin:sha-aff2e0b
+      - image: huggingface/datasets-server-services-storage-admin:sha-fe7ceb7
         imagePullPolicy: IfNotPresent
         name: prod-datasets-server-storage-admin
         resources:

===== apps/Deployment datasets-server/prod-datasets-server-worker-heavy ======
--- /tmp/argocd-diff436771276/prod-datasets-server-worker-heavy-live.yaml	2023-10-19 14:17:49.084709286 +0000
+++ /tmp/argocd-diff436771276/prod-datasets-server-worker-heavy	2023-10-19 14:17:49.076709194 +0000
@@ -793,7 +793,7 @@
         - name: WORKER_JOB_TYPES_ONLY
         - name: ROWS_INDEX_MAX_ARROW_DATA_IN_MEMORY
           value: "300_000_000"
-        image: huggingface/datasets-server-services-worker:sha-aff2e0b
+        image: huggingface/datasets-server-services-worker:sha-fe7ceb7
         imagePullPolicy: IfNotPresent
         name: prod-datasets-server-worker
         resources:

===== apps/Deployment datasets-server/prod-datasets-server-worker-light ======
--- /tmp/argocd-diff3568297664/prod-datasets-server-worker-light-live.yaml	2023-10-19 14:17:49.124709747 +0000
+++ /tmp/argocd-diff3568297664/prod-datasets-server-worker-light	2023-10-19 14:17:49.120709701 +0000
@@ -793,7 +793,7 @@
         - name: WORKER_JOB_TYPES_ONLY
         - name: ROWS_INDEX_MAX_ARROW_DATA_IN_MEMORY
           value: "300_000_000"
-        image: huggingface/datasets-server-services-worker:sha-aff2e0b
+        image: huggingface/datasets-server-services-worker:sha-fe7ceb7
         imagePullPolicy: IfNotPresent
         name: prod-datasets-server-worker
         resources:

===== apps/Deployment datasets-server/prod-datasets-server-worker-medium ======
--- /tmp/argocd-diff1719315078/prod-datasets-server-worker-medium-live.yaml	2023-10-19 14:17:49.172710300 +0000
+++ /tmp/argocd-diff1719315078/prod-datasets-server-worker-medium	2023-10-19 14:17:49.164710208 +0000
@@ -793,7 +793,7 @@
         - name: WORKER_JOB_TYPES_ONLY
         - name: ROWS_INDEX_MAX_ARROW_DATA_IN_MEMORY
           value: "300_000_000"
-        image: huggingface/datasets-server-services-worker:sha-aff2e0b
+        image: huggingface/datasets-server-services-worker:sha-fe7ceb7
         imagePullPolicy: IfNotPresent
         name: prod-datasets-server-worker
         resources:

===== batch/CronJob datasets-server/prod-datasets-server-job-backfill ======
--- /tmp/argocd-diff3321645691/prod-datasets-server-job-backfill-live.yaml	2023-10-19 14:17:49.184710438 +0000
+++ /tmp/argocd-diff3321645691/prod-datasets-server-job-backfill	2023-10-19 14:17:49.180710392 +0000
@@ -195,7 +195,7 @@
               value: CreateCommitError,LockedDatasetTimeoutError,ExternalServerError
             - name: LOG_LEVEL
               value: debug
-            image: huggingface/datasets-server-jobs-cache_maintenance:sha-aff2e0b
+            image: huggingface/datasets-server-jobs-cache_maintenance:sha-fe7ceb7
             imagePullPolicy: IfNotPresent
             name: prod-datasets-server-backfill
             resources:

===== batch/CronJob datasets-server/prod-datasets-server-job-cache-metrics-collector ======
--- /tmp/argocd-diff363056783/prod-datasets-server-job-cache-metrics-collector-live.yaml	2023-10-19 14:17:49.196710576 +0000
+++ /tmp/argocd-diff363056783/prod-datasets-server-job-cache-metrics-collector	2023-10-19 14:17:49.192710530 +0000
@@ -187,7 +187,7 @@
                   optional: false
             - name: CACHE_MAINTENANCE_ACTION
               value: collect-cache-metrics
-            image: huggingface/datasets-server-jobs-cache_maintenance:sha-aff2e0b
+            image: huggingface/datasets-server-jobs-cache_maintenance:sha-fe7ceb7
             imagePullPolicy: IfNotPresent
             name: prod-datasets-server-cache-metrics-collector
             resources:

===== batch/CronJob datasets-server/prod-datasets-server-job-clean-duckdb-downloads ======
--- /tmp/argocd-diff2468858607/prod-datasets-server-job-clean-duckdb-downloads-live.yaml	2023-10-19 14:17:49.208710714 +0000
+++ /tmp/argocd-diff2468858607/prod-datasets-server-job-clean-duckdb-downloads	2023-10-19 14:17:49.208710714 +0000
@@ -246,7 +246,7 @@
               value: downloads/*
             - name: DIRECTORY_CLEANING_EXPIRED_TIME_INTERVAL_SECONDS
               value: "259200"
-            image: huggingface/datasets-server-jobs-cache_maintenance:sha-aff2e0b
+            image: huggingface/datasets-server-jobs-cache_maintenance:sha-fe7ceb7
             imagePullPolicy: IfNotPresent
             name: prod-datasets-server-clean-duckdb-downloads
             resources:

===== batch/CronJob datasets-server/prod-datasets-server-job-clean-duckdb-job-runner ======
--- /tmp/argocd-diff3357293162/prod-datasets-server-job-clean-duckdb-job-runner-live.yaml	2023-10-19 14:17:49.224710899 +0000
+++ /tmp/argocd-diff3357293162/prod-datasets-server-job-clean-duckdb-job-runner	2023-10-19 14:17:49.220710853 +0000
@@ -246,7 +246,7 @@
               value: job_runner/*
             - name: DIRECTORY_CLEANING_EXPIRED_TIME_INTERVAL_SECONDS
               value: "10800"
-            image: huggingface/datasets-server-jobs-cache_maintenance:sha-aff2e0b
+            image: huggingface/datasets-server-jobs-cache_maintenance:sha-fe7ceb7
             imagePullPolicy: IfNotPresent
             name: prod-datasets-server-clean-duckdb-job-runner
             resources:

===== batch/CronJob datasets-server/prod-datasets-server-job-clean-hf-datasets-cache ======
--- /tmp/argocd-diff3067479843/prod-datasets-server-job-clean-hf-datasets-cache-live.yaml	2023-10-19 14:17:49.232710991 +0000
+++ /tmp/argocd-diff3067479843/prod-datasets-server-job-clean-hf-datasets-cache	2023-10-19 14:17:49.232710991 +0000
@@ -204,7 +204,7 @@
               value: '*/datasets/*'
             - name: DIRECTORY_CLEANING_EXPIRED_TIME_INTERVAL_SECONDS
               value: "10800"
-            image: huggingface/datasets-server-jobs-cache_maintenance:sha-aff2e0b
+            image: huggingface/datasets-server-jobs-cache_maintenance:sha-fe7ceb7
             imagePullPolicy: IfNotPresent
             name: prod-datasets-server-clean-hf-datasets-cache
             resources:

===== batch/CronJob datasets-server/prod-datasets-server-job-clean-stats-cache ======
--- /tmp/argocd-diff2783094582/prod-datasets-server-job-clean-stats-cache-live.yaml	2023-10-19 14:17:49.244711129 +0000
+++ /tmp/argocd-diff2783094582/prod-datasets-server-job-clean-stats-cache	2023-10-19 14:17:49.244711129 +0000
@@ -204,7 +204,7 @@
               value: '*'
             - name: DIRECTORY_CLEANING_EXPIRED_TIME_INTERVAL_SECONDS
               value: "10800"
-            image: huggingface/datasets-server-jobs-cache_maintenance:sha-aff2e0b
+            image: huggingface/datasets-server-jobs-cache_maintenance:sha-fe7ceb7
             imagePullPolicy: IfNotPresent
             name: prod-datasets-server-clean-stats-cache
             resources:

===== batch/CronJob datasets-server/prod-datasets-server-job-post-messages ======
--- /tmp/argocd-diff868031318/prod-datasets-server-job-post-messages-live.yaml	2023-10-19 14:17:49.260711313 +0000
+++ /tmp/argocd-diff868031318/prod-datasets-server-job-post-messages	2023-10-19 14:17:49.260711313 +0000
@@ -211,7 +211,7 @@
               value: post-messages
             - name: LOG_LEVEL
               value: info
-            image: huggingface/datasets-server-jobs-cache_maintenance:sha-aff2e0b
+            image: huggingface/datasets-server-jobs-cache_maintenance:sha-fe7ceb7
             imagePullPolicy: IfNotPresent
             name: prod-datasets-server-post-messages
             resources:

===== batch/CronJob datasets-server/prod-datasets-server-job-queue-metrics-collector ======
--- /tmp/argocd-diff3218241968/prod-datasets-server-job-queue-metrics-collector-live.yaml	2023-10-19 14:17:49.268711405 +0000
+++ /tmp/argocd-diff3218241968/prod-datasets-server-job-queue-metrics-collector	2023-10-19 14:17:49.268711405 +0000
@@ -188,7 +188,7 @@
                   optional: false
             - name: CACHE_MAINTENANCE_ACTION
               value: collect-queue-metrics
-            image: huggingface/datasets-server-jobs-cache_maintenance:sha-aff2e0b
+            image: huggingface/datasets-server-jobs-cache_maintenance:sha-fe7ceb7
             imagePullPolicy: IfNotPresent
             name: prod-datasets-server-queue-metrics-collector
             resources:

App: datasets-server-staging
YAML generation: Success 🟢
App sync status: Out of Sync ⚠️

===== apps/Deployment datasets-server/staging-datasets-server-admin ======
--- /tmp/argocd-diff376452422/staging-datasets-server-admin-live.yaml	2023-10-19 14:17:50.136721403 +0000
+++ /tmp/argocd-diff376452422/staging-datasets-server-admin	2023-10-19 14:17:50.132721357 +0000
@@ -601,14 +601,6 @@
           value: https://datasets-server.us.dev.moon.huggingface.tech/cached-assets
         - name: CACHED_ASSETS_STORAGE_DIRECTORY
           value: /storage/cached-assets
-        - name: CACHED_ASSETS_CLEAN_CACHE_PROBA
-          value: "0.05"
-        - name: CACHED_ASSETS_KEEP_FIRST_ROWS_NUMBER
-          value: "100"
-        - name: CACHED_ASSETS_KEEP_MOST_RECENT_ROWS_NUMBER
-          value: "200"
-        - name: CACHED_ASSETS_MAX_CLEANED_ROWS_NUMBER
-          value: "10000"
         - name: CACHED_ASSETS_S3_FOLDER_NAME
           value: cached-assets
         - name: PARQUET_METADATA_STORAGE_DIRECTORY
@@ -639,7 +631,7 @@
           value: "1"
         - name: ADMIN_UVICORN_PORT
           value: "8080"
-        image: huggingface/datasets-server-services-admin:sha-aff2e0b
+        image: huggingface/datasets-server-services-admin:sha-fe7ceb7
         imagePullPolicy: IfNotPresent
         livenessProbe:
           failureThreshold: 30

===== apps/Deployment datasets-server/staging-datasets-server-api ======
--- /tmp/argocd-diff678568471/staging-datasets-server-api-live.yaml	2023-10-19 14:17:50.156721634 +0000
+++ /tmp/argocd-diff678568471/staging-datasets-server-api	2023-10-19 14:17:50.152721587 +0000
@@ -357,7 +357,7 @@
           value: "1"
         - name: API_UVICORN_PORT
           value: "8080"
-        image: huggingface/datasets-server-services-api:sha-aff2e0b
+        image: huggingface/datasets-server-services-api:sha-fe7ceb7
         imagePullPolicy: IfNotPresent
         livenessProbe:
           failureThreshold: 30

===== apps/Deployment datasets-server/staging-datasets-server-rows ======
--- /tmp/argocd-diff1516654416/staging-datasets-server-rows-live.yaml	2023-10-19 14:17:50.192722048 +0000
+++ /tmp/argocd-diff1516654416/staging-datasets-server-rows	2023-10-19 14:17:50.188722002 +0000
@@ -433,14 +433,6 @@
           value: https://datasets-server.us.dev.moon.huggingface.tech/cached-assets
         - name: CACHED_ASSETS_STORAGE_DIRECTORY
           value: /storage/cached-assets
-        - name: CACHED_ASSETS_CLEAN_CACHE_PROBA
-          value: "0.05"
-        - name: CACHED_ASSETS_KEEP_FIRST_ROWS_NUMBER
-          value: "100"
-        - name: CACHED_ASSETS_KEEP_MOST_RECENT_ROWS_NUMBER
-          value: "200"
-        - name: CACHED_ASSETS_MAX_CLEANED_ROWS_NUMBER
-          value: "10000"
         - name: CACHED_ASSETS_S3_FOLDER_NAME
           value: cached-assets
         - name: S3_BUCKET
@@ -524,7 +516,7 @@
           value: "8080"
         - name: ROWS_INDEX_MAX_ARROW_DATA_IN_MEMORY
           value: "300_000_000"
-        image: huggingface/datasets-server-services-rows:sha-aff2e0b
+        image: huggingface/datasets-server-services-rows:sha-fe7ceb7
         imagePullPolicy: IfNotPresent
         livenessProbe:
           failureThreshold: 30

===== apps/Deployment datasets-server/staging-datasets-server-search ======
--- /tmp/argocd-diff2523284086/staging-datasets-server-search-live.yaml	2023-10-19 14:17:50.224722417 +0000
+++ /tmp/argocd-diff2523284086/staging-datasets-server-search	2023-10-19 14:17:50.220722371 +0000
@@ -428,14 +428,6 @@
           value: https://datasets-server.us.dev.moon.huggingface.tech/cached-assets
         - name: CACHED_ASSETS_STORAGE_DIRECTORY
           value: /storage/cached-assets
-        - name: CACHED_ASSETS_CLEAN_CACHE_PROBA
-          value: "0.05"
-        - name: CACHED_ASSETS_KEEP_FIRST_ROWS_NUMBER
-          value: "100"
-        - name: CACHED_ASSETS_KEEP_MOST_RECENT_ROWS_NUMBER
-          value: "200"
-        - name: CACHED_ASSETS_MAX_CLEANED_ROWS_NUMBER
-          value: "10000"
         - name: CACHED_ASSETS_S3_FOLDER_NAME
           value: cached-assets
         - name: S3_BUCKET
@@ -517,7 +509,7 @@
           value: refs/convert/parquet
         - name: DUCKDB_INDEX_CACHE_DIRECTORY
           value: /storage/duckdb-index
-        image: huggingface/datasets-server-services-search:sha-aff2e0b
+        image: huggingface/datasets-server-services-search:sha-fe7ceb7
         imagePullPolicy: IfNotPresent
         livenessProbe:
           failureThreshold: 30

===== apps/Deployment datasets-server/staging-datasets-server-sse-api ======
--- /tmp/argocd-diff3798542608/staging-datasets-server-sse-api-live.yaml	2023-10-19 14:17:50.240722601 +0000
+++ /tmp/argocd-diff3798542608/staging-datasets-server-sse-api	2023-10-19 14:17:50.240722601 +0000
@@ -319,7 +319,7 @@
           value: "1"
         - name: API_UVICORN_PORT
           value: "8080"
-        image: huggingface/datasets-server-services-sse-api:sha-aff2e0b
+        image: huggingface/datasets-server-services-sse-api:sha-fe7ceb7
         imagePullPolicy: IfNotPresent
         livenessProbe:
           failureThreshold: 30

===== apps/Deployment datasets-server/staging-datasets-server-storage-admin ======
--- /tmp/argocd-diff511063805/staging-datasets-server-storage-admin-live.yaml	2023-10-19 14:17:50.264722878 +0000
+++ /tmp/argocd-diff511063805/staging-datasets-server-storage-admin	2023-10-19 14:17:50.260722831 +0000
@@ -406,7 +406,7 @@
         helm.sh/chart: datasets-server
     spec:
       containers:
-      - image: huggingface/datasets-server-services-storage-admin:sha-aff2e0b
+      - image: huggingface/datasets-server-services-storage-admin:sha-fe7ceb7
         imagePullPolicy: IfNotPresent
         name: staging-datasets-server-storage-admin
         resources:

===== apps/Deployment datasets-server/staging-datasets-server-worker-all ======
--- /tmp/argocd-diff2162938880/staging-datasets-server-worker-all-live.yaml	2023-10-19 14:17:50.304723338 +0000
+++ /tmp/argocd-diff2162938880/staging-datasets-server-worker-all	2023-10-19 14:17:50.300723292 +0000
@@ -804,7 +804,7 @@
         - name: WORKER_JOB_TYPES_ONLY
         - name: ROWS_INDEX_MAX_ARROW_DATA_IN_MEMORY
           value: "300_000_000"
-        image: huggingface/datasets-server-services-worker:sha-aff2e0b
+        image: huggingface/datasets-server-services-worker:sha-fe7ceb7
         imagePullPolicy: IfNotPresent
         name: staging-datasets-server-worker
         resources:

===== apps/Deployment datasets-server/staging-datasets-server-worker-light ======
--- /tmp/argocd-diff461297048/staging-datasets-server-worker-light-live.yaml	2023-10-19 14:17:50.344723799 +0000
+++ /tmp/argocd-diff461297048/staging-datasets-server-worker-light	2023-10-19 14:17:50.340723753 +0000
@@ -804,7 +804,7 @@
         - name: WORKER_JOB_TYPES_ONLY
         - name: ROWS_INDEX_MAX_ARROW_DATA_IN_MEMORY
           value: "300_000_000"
-        image: huggingface/datasets-server-services-worker:sha-aff2e0b
+        image: huggingface/datasets-server-services-worker:sha-fe7ceb7
         imagePullPolicy: IfNotPresent
         name: staging-datasets-server-worker
         resources:

===== batch/CronJob datasets-server/staging-datasets-server-job-cache-metrics-collector ======
--- /tmp/argocd-diff533853049/staging-datasets-server-job-cache-metrics-collector-live.yaml	2023-10-19 14:17:50.360723983 +0000
+++ /tmp/argocd-diff533853049/staging-datasets-server-job-cache-metrics-collector	2023-10-19 14:17:50.360723983 +0000
@@ -186,7 +186,7 @@
                   optional: false
             - name: CACHE_MAINTENANCE_ACTION
               value: collect-cache-metrics
-            image: huggingface/datasets-server-jobs-cache_maintenance:sha-aff2e0b
+            image: huggingface/datasets-server-jobs-cache_maintenance:sha-fe7ceb7
             imagePullPolicy: IfNotPresent
             name: staging-datasets-server-cache-metrics-collector
             resources:

===== batch/CronJob datasets-server/staging-datasets-server-job-clean-duckdb-downloads ======
--- /tmp/argocd-diff470449017/staging-datasets-server-job-clean-duckdb-downloads-live.yaml	2023-10-19 14:17:50.376724168 +0000
+++ /tmp/argocd-diff470449017/staging-datasets-server-job-clean-duckdb-downloads	2023-10-19 14:17:50.372724121 +0000
@@ -245,7 +245,7 @@
               value: downloads/*
             - name: DIRECTORY_CLEANING_EXPIRED_TIME_INTERVAL_SECONDS
               value: "600"
-            image: huggingface/datasets-server-jobs-cache_maintenance:sha-aff2e0b
+            image: huggingface/datasets-server-jobs-cache_maintenance:sha-fe7ceb7
             imagePullPolicy: IfNotPresent
             name: staging-datasets-server-clean-duckdb-downloads
             resources:

===== batch/CronJob datasets-server/staging-datasets-server-job-clean-duckdb-job-runner ======
--- /tmp/argocd-diff3150690976/staging-datasets-server-job-clean-duckdb-job-runner-live.yaml	2023-10-19 14:17:50.388724306 +0000
+++ /tmp/argocd-diff3150690976/staging-datasets-server-job-clean-duckdb-job-runner	2023-10-19 14:17:50.384724260 +0000
@@ -245,7 +245,7 @@
               value: job_runner/*
             - name: DIRECTORY_CLEANING_EXPIRED_TIME_INTERVAL_SECONDS
               value: "600"
-            image: huggingface/datasets-server-jobs-cache_maintenance:sha-aff2e0b
+            image: huggingface/datasets-server-jobs-cache_maintenance:sha-fe7ceb7
             imagePullPolicy: IfNotPresent
             name: staging-datasets-server-clean-duckdb-job-runner
             resources:

===== batch/CronJob datasets-server/staging-datasets-server-job-clean-hf-datasets-cache ======
--- /tmp/argocd-diff2870218572/staging-datasets-server-job-clean-hf-datasets-cache-live.yaml	2023-10-19 14:17:50.396724398 +0000
+++ /tmp/argocd-diff2870218572/staging-datasets-server-job-clean-hf-datasets-cache	2023-10-19 14:17:50.396724398 +0000
@@ -203,7 +203,7 @@
               value: '*/datasets/*'
             - name: DIRECTORY_CLEANING_EXPIRED_TIME_INTERVAL_SECONDS
               value: "600"
-            image: huggingface/datasets-server-jobs-cache_maintenance:sha-aff2e0b
+            image: huggingface/datasets-server-jobs-cache_maintenance:sha-fe7ceb7
             imagePullPolicy: IfNotPresent
             name: staging-datasets-server-clean-hf-datasets-cache
             resources:

===== batch/CronJob datasets-server/staging-datasets-server-job-clean-stats-cache ======
--- /tmp/argocd-diff1068898501/staging-datasets-server-job-clean-stats-cache-live.yaml	2023-10-19 14:17:50.408724536 +0000
+++ /tmp/argocd-diff1068898501/staging-datasets-server-job-clean-stats-cache	2023-10-19 14:17:50.408724536 +0000
@@ -203,7 +203,7 @@
               value: '*'
             - name: DIRECTORY_CLEANING_EXPIRED_TIME_INTERVAL_SECONDS
               value: "600"
-            image: huggingface/datasets-server-jobs-cache_maintenance:sha-aff2e0b
+            image: huggingface/datasets-server-jobs-cache_maintenance:sha-fe7ceb7
             imagePullPolicy: IfNotPresent
             name: staging-datasets-server-clean-stats-cache
             resources:

===== batch/CronJob datasets-server/staging-datasets-server-job-post-messages ======
--- /tmp/argocd-diff81357667/staging-datasets-server-job-post-messages-live.yaml	2023-10-19 14:17:50.424724720 +0000
+++ /tmp/argocd-diff81357667/staging-datasets-server-job-post-messages	2023-10-19 14:17:50.424724720 +0000
@@ -210,7 +210,7 @@
               value: post-messages
             - name: LOG_LEVEL
               value: info
-            image: huggingface/datasets-server-jobs-cache_maintenance:sha-aff2e0b
+            image: huggingface/datasets-server-jobs-cache_maintenance:sha-fe7ceb7
             imagePullPolicy: IfNotPresent
             name: staging-datasets-server-post-messages
             resources:

===== batch/CronJob datasets-server/staging-datasets-server-job-queue-metrics-collector ======
--- /tmp/argocd-diff2540787926/staging-datasets-server-job-queue-metrics-collector-live.yaml	2023-10-19 14:17:50.432724813 +0000
+++ /tmp/argocd-diff2540787926/staging-datasets-server-job-queue-metrics-collector	2023-10-19 14:17:50.432724813 +0000
@@ -187,7 +187,7 @@
                   optional: false
             - name: CACHE_MAINTENANCE_ACTION
               value: collect-queue-metrics
-            image: huggingface/datasets-server-jobs-cache_maintenance:sha-aff2e0b
+            image: huggingface/datasets-server-jobs-cache_maintenance:sha-fe7ceb7
             imagePullPolicy: IfNotPresent
             name: staging-datasets-server-queue-metrics-collector
             resources:

Legend Status
The app is synced in ArgoCD, and diffs you see are solely from this PR.
⚠️ The app is out-of-sync in ArgoCD, and the diffs you see include those changes plus any from this PR.
🛑 There was an error generating the ArgoCD diffs due to changes in this PR.

@AndreaFrancis AndreaFrancis merged commit 2a16448 into main Oct 19, 2023
26 checks passed
@AndreaFrancis AndreaFrancis deleted the versioned-assets branch October 19, 2023 15:23
5 participants