Disable dataset scripts #2001

Merged
lhoestq merged 12 commits into main from disable-dataset-scripts
Oct 19, 2023

lhoestq (Member) commented Oct 19, 2023

The config-parquet-and-info step will now raise a DatasetWithScriptNotSupportedError for datasets with a script, except those in the allow list. This will prevent users from running arbitrary code using dataset scripts.

The error message is shown to the user on the website. It says:

            raise DatasetWithScriptNotSupportedError(
                "The dataset viewer doesn't support this dataset because it runs "
                "arbitrary python code. Please open a discussion in the discussion tab "
                "if you think this is an error and tag @lhoestq and @severo."
            )

The allow list is hardcoded for now: DATASET_SCRIPTS_ALLOW_LIST = ["canonical"]
The keyword "canonical" means all the datasets without namespaces.

We can add other datasets to the allow list. It supports fnmatch patterns: for example, to support all the datasets from huggingface, we can add huggingface/* to the allow list.
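The matching rule described above can be sketched as follows. This is a minimal illustration, not the code from the PR: the helper name `is_script_allowed` is hypothetical, and it uses `fnmatchcase` rather than `fnmatch` so that matching is case-sensitive on every platform.

```python
from fnmatch import fnmatchcase
from typing import Sequence

# Keyword from the PR description: it stands for every dataset without a namespace.
CANONICAL_KEYWORD = "canonical"


def is_script_allowed(name: str, allow_list: Sequence[str]) -> bool:
    """Return True if the dataset `name` matches an entry of the allow list.

    Hypothetical helper, for illustration only.
    """
    for pattern in allow_list:
        # The keyword matches any dataset name that has no "namespace/" prefix.
        if pattern == CANONICAL_KEYWORD and "/" not in name:
            return True
        # Every entry is also tried as a shell-style glob pattern.
        if fnmatchcase(name, pattern):
            return True
    return False


allow_list = ["canonical", "huggingface/*"]
for dataset in ("squad", "huggingface/some-dataset", "someuser/some-dataset"):
    print(dataset, is_script_allowed(dataset, allow_list))
```

With this allow list, a dataset without a namespace and any dataset under huggingface/ pass the check, while a dataset in another namespace does not.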

cc @severo @XciD

codecov-commenter commented Oct 19, 2023

Codecov Report

Attention: 3 lines in your changes are missing coverage. Please review.

Comparison is base (9443125) 86.51% compared to head (3232c29) 90.27%.
Report is 3 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2001      +/-   ##
==========================================
+ Coverage   86.51%   90.27%   +3.76%     
==========================================
  Files          66      143      +77     
  Lines        3455     8195    +4740     
==========================================
+ Hits         2989     7398    +4409     
- Misses        466      797     +331     
Flag                    Coverage Δ
jobs_cache_maintenance  95.32% <ø> (?)
jobs_mongodb_migration  86.32% <ø> (?)
libs_libapi             ?
libs_libcommon          92.52% <50.00%> (?)
services_admin          85.88% <ø> (ø)
services_api            86.79% <ø> (ø)
services_rows           85.55% <ø> (+0.62%) ⬆️
services_search         ?
services_sse-api        94.17% <ø> (ø)

Flags with carried forward coverage won't be shown.

Files                                        Coverage Δ
libs/libcommon/src/libcommon/exceptions.py   70.70% <66.66%> (ø)
libs/libcommon/src/libcommon/config.py       76.03% <33.33%> (ø)

... and 117 files with indirect coverage changes


@lhoestq lhoestq marked this pull request as ready for review October 19, 2023 15:16
@lhoestq lhoestq requested a review from severo October 19, 2023 15:16
severo (Collaborator) left a comment

We need to raise the exception in every job that uses the datasets library to run the user script.

@@ -88,6 +89,8 @@
MAX_FILES_PER_DIRECTORY = 10_000 # hf hub limitation
MAX_OPERATIONS_PER_COMMIT = 500

DATASET_SCRIPTS_ALLOW_LIST = ["canonical"]
severo (Collaborator):

Since we will surely apply the restriction to other job runners, should this move to a shared file?

lhoestq (Member, Author):

I used an environment variable that is read in the common config for workers.
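As a rough sketch of what reading the allow list from the environment could look like: the variable name `COMMON_DATASET_SCRIPTS_ALLOW_LIST` comes from the deployment diff in this PR, but the function name, the comma-separated format (modelled on the `COMMON_BLOCKED_DATASETS` value) and the empty default are assumptions, not the actual config code.

```python
import os
from typing import List, Mapping, Optional

ENV_NAME = "COMMON_DATASET_SCRIPTS_ALLOW_LIST"


def load_dataset_scripts_allow_list(
    environ: Optional[Mapping[str, str]] = None,
) -> List[str]:
    """Parse the allow list from the environment.

    Hypothetical helper: assumes a comma-separated value and treats a missing
    or empty variable as an empty allow list (no script allowed).
    """
    env = os.environ if environ is None else environ
    raw = env.get(ENV_NAME, "")
    # Drop surrounding whitespace and ignore empty entries such as "a,,b".
    return [item.strip() for item in raw.split(",") if item.strip()]


example_env = {ENV_NAME: "{{ALL_DATASETS_WITH_NO_NAMESPACE}}, huggingface/*"}
print(load_dataset_scripts_allow_list(example_env))
```

Passing the environment as a parameter keeps the function easy to test; calling it without arguments reads `os.environ`.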

    dynamic_modules_path: Optional[str] = None,
) -> None:
    for allowed_pattern in allow_list:
        if (allowed_pattern == "canonical" and "/" not in name) or fnmatch(name, allowed_pattern):
severo (Collaborator):

Maybe use a more specific keyword instead of "canonical", since an org with this name exists? (https://huggingface.co/Canonical)

Even if it's handled by your code, it would feel less bug-prone.

You can use a forbidden character for example (see https://github.com/huggingface/moon-landing/blob/main/server/lib/Names.ts - internal)

lhoestq (Member, Author):

Renamed to {{ALL_DATASETS_WITH_NO_NAMESPACE}}.
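A minimal sketch of a guard built around the renamed keyword is shown below. It is illustrative only: the function name is hypothetical, the exception class is a local stand-in for the one named in the PR description, and the caller is assumed to have already established that the dataset has a script. The sketch rests on the reviewer's suggestion that the braces are characters a real namespace or dataset name cannot contain.

```python
from fnmatch import fnmatchcase
from typing import Sequence

# Sentinel from the review thread; assumed not to be a valid dataset name.
ALL_DATASETS_WITH_NO_NAMESPACE = "{{ALL_DATASETS_WITH_NO_NAMESPACE}}"


class DatasetWithScriptNotSupportedError(Exception):
    """Local stand-in for the exception named in the PR description."""


def raise_if_script_not_allowed(name: str, allow_list: Sequence[str]) -> None:
    """Raise unless `name` is covered by the allow list (hypothetical helper)."""
    for pattern in allow_list:
        if pattern == ALL_DATASETS_WITH_NO_NAMESPACE:
            if "/" not in name:
                return
            continue  # the sentinel is never used as a glob pattern
        if fnmatchcase(name, pattern):
            return
    raise DatasetWithScriptNotSupportedError(
        "The dataset viewer doesn't support this dataset because it runs "
        "arbitrary python code."
    )


allow_list = [ALL_DATASETS_WITH_NO_NAMESPACE]
raise_if_script_not_allowed("squad", allow_list)  # no namespace: accepted
try:
    raise_if_script_not_allowed("Canonical/some-dataset", allow_list)
except DatasetWithScriptNotSupportedError as err:
    print(f"rejected: {err}")
```

Because the sentinel is compared for equality before any glob matching, a dataset under the Canonical org is treated like any other namespaced dataset.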

github-actions bot commented Oct 19, 2023

ArgoCD Diff for commit dd5988a

Updated at 10/19/2023, 6:10:27 PM CEST

App: datasets-server-prod
YAML generation: Success 🟢
App sync status: Synced ✅

===== apps/Deployment datasets-server/prod-datasets-server-admin ======
--- /tmp/argocd-diff3212010598/prod-datasets-server-admin-live.yaml	2023-10-19 16:10:26.462240991 +0000
+++ /tmp/argocd-diff3212010598/prod-datasets-server-admin	2023-10-19 16:10:26.458240928 +0000
@@ -582,6 +582,8 @@
               optional: false
         - name: COMMON_BLOCKED_DATASETS
           value: Graphcore/gqa,Graphcore/gqa-lxmert,Graphcore/vqa,Graphcore/vqa-lxmert,echarlaix/gqa-lxmert,echarlaix/vqa,echarlaix/vqa-lxmert,KakologArchives/KakologArchives,open-llm-leaderboard/*
+        - name: COMMON_DATASET_SCRIPTS_ALLOW_LIST
+          value: '{{ALL_DATASETS_WITH_NO_NAMESPACE}}'
         - name: COMMON_HF_ENDPOINT
           value: https://huggingface.co
         - name: HF_ENDPOINT

===== apps/Deployment datasets-server/prod-datasets-server-api ======
--- /tmp/argocd-diff1381546059/prod-datasets-server-api-live.yaml	2023-10-19 16:10:26.494241496 +0000
+++ /tmp/argocd-diff1381546059/prod-datasets-server-api	2023-10-19 16:10:26.490241433 +0000
@@ -347,6 +347,8 @@
               optional: false
         - name: COMMON_BLOCKED_DATASETS
           value: Graphcore/gqa,Graphcore/gqa-lxmert,Graphcore/vqa,Graphcore/vqa-lxmert,echarlaix/gqa-lxmert,echarlaix/vqa,echarlaix/vqa-lxmert,KakologArchives/KakologArchives,open-llm-leaderboard/*
+        - name: COMMON_DATASET_SCRIPTS_ALLOW_LIST
+          value: '{{ALL_DATASETS_WITH_NO_NAMESPACE}}'
         - name: COMMON_HF_ENDPOINT
           value: https://huggingface.co
         - name: HF_ENDPOINT

===== apps/Deployment datasets-server/prod-datasets-server-rows ======
--- /tmp/argocd-diff418530739/prod-datasets-server-rows-live.yaml	2023-10-19 16:10:26.534242126 +0000
+++ /tmp/argocd-diff418530739/prod-datasets-server-rows	2023-10-19 16:10:26.530242063 +0000
@@ -448,6 +448,8 @@
               optional: false
         - name: COMMON_BLOCKED_DATASETS
           value: Graphcore/gqa,Graphcore/gqa-lxmert,Graphcore/vqa,Graphcore/vqa-lxmert,echarlaix/gqa-lxmert,echarlaix/vqa,echarlaix/vqa-lxmert,KakologArchives/KakologArchives,open-llm-leaderboard/*
+        - name: COMMON_DATASET_SCRIPTS_ALLOW_LIST
+          value: '{{ALL_DATASETS_WITH_NO_NAMESPACE}}'
         - name: COMMON_HF_ENDPOINT
           value: https://huggingface.co
         - name: HF_ENDPOINT

===== apps/Deployment datasets-server/prod-datasets-server-search ======
--- /tmp/argocd-diff1212569074/prod-datasets-server-search-live.yaml	2023-10-19 16:10:26.554242441 +0000
+++ /tmp/argocd-diff1212569074/prod-datasets-server-search	2023-10-19 16:10:26.554242441 +0000
@@ -441,6 +441,8 @@
               optional: false
         - name: COMMON_BLOCKED_DATASETS
           value: Graphcore/gqa,Graphcore/gqa-lxmert,Graphcore/vqa,Graphcore/vqa-lxmert,echarlaix/gqa-lxmert,echarlaix/vqa,echarlaix/vqa-lxmert,KakologArchives/KakologArchives,open-llm-leaderboard/*
+        - name: COMMON_DATASET_SCRIPTS_ALLOW_LIST
+          value: '{{ALL_DATASETS_WITH_NO_NAMESPACE}}'
         - name: COMMON_HF_ENDPOINT
           value: https://huggingface.co
         - name: HF_ENDPOINT

===== apps/Deployment datasets-server/prod-datasets-server-sse-api ======
--- /tmp/argocd-diff2963792036/prod-datasets-server-sse-api-live.yaml	2023-10-19 16:10:26.574242756 +0000
+++ /tmp/argocd-diff2963792036/prod-datasets-server-sse-api	2023-10-19 16:10:26.570242693 +0000
@@ -277,6 +277,8 @@
               optional: false
         - name: COMMON_BLOCKED_DATASETS
           value: Graphcore/gqa,Graphcore/gqa-lxmert,Graphcore/vqa,Graphcore/vqa-lxmert,echarlaix/gqa-lxmert,echarlaix/vqa,echarlaix/vqa-lxmert,KakologArchives/KakologArchives,open-llm-leaderboard/*
+        - name: COMMON_DATASET_SCRIPTS_ALLOW_LIST
+          value: '{{ALL_DATASETS_WITH_NO_NAMESPACE}}'
         - name: COMMON_HF_ENDPOINT
           value: https://huggingface.co
         - name: HF_ENDPOINT

===== apps/Deployment datasets-server/prod-datasets-server-worker-heavy ======
--- /tmp/argocd-diff4154322428/prod-datasets-server-worker-heavy-live.yaml	2023-10-19 16:10:26.630243639 +0000
+++ /tmp/argocd-diff4154322428/prod-datasets-server-worker-heavy	2023-10-19 16:10:26.618243450 +0000
@@ -653,6 +653,8 @@
               optional: false
         - name: COMMON_BLOCKED_DATASETS
           value: Graphcore/gqa,Graphcore/gqa-lxmert,Graphcore/vqa,Graphcore/vqa-lxmert,echarlaix/gqa-lxmert,echarlaix/vqa,echarlaix/vqa-lxmert,KakologArchives/KakologArchives,open-llm-leaderboard/*
+        - name: COMMON_DATASET_SCRIPTS_ALLOW_LIST
+          value: '{{ALL_DATASETS_WITH_NO_NAMESPACE}}'
         - name: COMMON_HF_ENDPOINT
           value: https://huggingface.co
         - name: HF_ENDPOINT

===== apps/Deployment datasets-server/prod-datasets-server-worker-light ======
--- /tmp/argocd-diff3893193270/prod-datasets-server-worker-light-live.yaml	2023-10-19 16:10:26.670244269 +0000
+++ /tmp/argocd-diff3893193270/prod-datasets-server-worker-light	2023-10-19 16:10:26.662244143 +0000
@@ -656,6 +656,8 @@
               optional: false
         - name: COMMON_BLOCKED_DATASETS
           value: Graphcore/gqa,Graphcore/gqa-lxmert,Graphcore/vqa,Graphcore/vqa-lxmert,echarlaix/gqa-lxmert,echarlaix/vqa,echarlaix/vqa-lxmert,KakologArchives/KakologArchives,open-llm-leaderboard/*
+        - name: COMMON_DATASET_SCRIPTS_ALLOW_LIST
+          value: '{{ALL_DATASETS_WITH_NO_NAMESPACE}}'
         - name: COMMON_HF_ENDPOINT
           value: https://huggingface.co
         - name: HF_ENDPOINT

===== apps/Deployment datasets-server/prod-datasets-server-worker-medium ======
--- /tmp/argocd-diff893394005/prod-datasets-server-worker-medium-live.yaml	2023-10-19 16:10:26.710244899 +0000
+++ /tmp/argocd-diff893394005/prod-datasets-server-worker-medium	2023-10-19 16:10:26.702244773 +0000
@@ -653,6 +653,8 @@
               optional: false
         - name: COMMON_BLOCKED_DATASETS
           value: Graphcore/gqa,Graphcore/gqa-lxmert,Graphcore/vqa,Graphcore/vqa-lxmert,echarlaix/gqa-lxmert,echarlaix/vqa,echarlaix/vqa-lxmert,KakologArchives/KakologArchives,open-llm-leaderboard/*
+        - name: COMMON_DATASET_SCRIPTS_ALLOW_LIST
+          value: '{{ALL_DATASETS_WITH_NO_NAMESPACE}}'
         - name: COMMON_HF_ENDPOINT
           value: https://huggingface.co
         - name: HF_ENDPOINT

===== batch/CronJob datasets-server/prod-datasets-server-job-backfill ======
--- /tmp/argocd-diff1852994792/prod-datasets-server-job-backfill-live.yaml	2023-10-19 16:10:26.726245151 +0000
+++ /tmp/argocd-diff1852994792/prod-datasets-server-job-backfill	2023-10-19 16:10:26.726245151 +0000
@@ -179,6 +179,8 @@
                   optional: false
             - name: COMMON_BLOCKED_DATASETS
               value: Graphcore/gqa,Graphcore/gqa-lxmert,Graphcore/vqa,Graphcore/vqa-lxmert,echarlaix/gqa-lxmert,echarlaix/vqa,echarlaix/vqa-lxmert,KakologArchives/KakologArchives,open-llm-leaderboard/*
+            - name: COMMON_DATASET_SCRIPTS_ALLOW_LIST
+              value: '{{ALL_DATASETS_WITH_NO_NAMESPACE}}'
             - name: COMMON_HF_ENDPOINT
               value: https://huggingface.co
             - name: HF_ENDPOINT

===== batch/CronJob datasets-server/prod-datasets-server-job-cache-metrics-collector ======
--- /tmp/argocd-diff2512409610/prod-datasets-server-job-cache-metrics-collector-live.yaml	2023-10-19 16:10:26.734245278 +0000
+++ /tmp/argocd-diff2512409610/prod-datasets-server-job-cache-metrics-collector	2023-10-19 16:10:26.734245278 +0000
@@ -175,6 +175,8 @@
                   optional: false
             - name: COMMON_BLOCKED_DATASETS
               value: Graphcore/gqa,Graphcore/gqa-lxmert,Graphcore/vqa,Graphcore/vqa-lxmert,echarlaix/gqa-lxmert,echarlaix/vqa,echarlaix/vqa-lxmert,KakologArchives/KakologArchives,open-llm-leaderboard/*
+            - name: COMMON_DATASET_SCRIPTS_ALLOW_LIST
+              value: '{{ALL_DATASETS_WITH_NO_NAMESPACE}}'
             - name: COMMON_HF_ENDPOINT
               value: https://huggingface.co
             - name: HF_ENDPOINT

===== batch/CronJob datasets-server/prod-datasets-server-job-clean-duckdb-downloads ======
--- /tmp/argocd-diff640742043/prod-datasets-server-job-clean-duckdb-downloads-live.yaml	2023-10-19 16:10:26.750245530 +0000
+++ /tmp/argocd-diff640742043/prod-datasets-server-job-clean-duckdb-downloads	2023-10-19 16:10:26.746245467 +0000
@@ -226,6 +226,8 @@
                   optional: false
             - name: COMMON_BLOCKED_DATASETS
               value: Graphcore/gqa,Graphcore/gqa-lxmert,Graphcore/vqa,Graphcore/vqa-lxmert,echarlaix/gqa-lxmert,echarlaix/vqa,echarlaix/vqa-lxmert,KakologArchives/KakologArchives,open-llm-leaderboard/*
+            - name: COMMON_DATASET_SCRIPTS_ALLOW_LIST
+              value: '{{ALL_DATASETS_WITH_NO_NAMESPACE}}'
             - name: COMMON_HF_ENDPOINT
               value: https://huggingface.co
             - name: HF_ENDPOINT

===== batch/CronJob datasets-server/prod-datasets-server-job-clean-duckdb-job-runner ======
--- /tmp/argocd-diff3651768659/prod-datasets-server-job-clean-duckdb-job-runner-live.yaml	2023-10-19 16:10:26.762245719 +0000
+++ /tmp/argocd-diff3651768659/prod-datasets-server-job-clean-duckdb-job-runner	2023-10-19 16:10:26.758245656 +0000
@@ -226,6 +226,8 @@
                   optional: false
             - name: COMMON_BLOCKED_DATASETS
               value: Graphcore/gqa,Graphcore/gqa-lxmert,Graphcore/vqa,Graphcore/vqa-lxmert,echarlaix/gqa-lxmert,echarlaix/vqa,echarlaix/vqa-lxmert,KakologArchives/KakologArchives,open-llm-leaderboard/*
+            - name: COMMON_DATASET_SCRIPTS_ALLOW_LIST
+              value: '{{ALL_DATASETS_WITH_NO_NAMESPACE}}'
             - name: COMMON_HF_ENDPOINT
               value: https://huggingface.co
             - name: HF_ENDPOINT

===== batch/CronJob datasets-server/prod-datasets-server-job-clean-hf-datasets-cache ======
--- /tmp/argocd-diff3068723558/prod-datasets-server-job-clean-hf-datasets-cache-live.yaml	2023-10-19 16:10:26.774245908 +0000
+++ /tmp/argocd-diff3068723558/prod-datasets-server-job-clean-hf-datasets-cache	2023-10-19 16:10:26.770245845 +0000
@@ -184,6 +184,8 @@
           - env:
             - name: COMMON_BLOCKED_DATASETS
               value: Graphcore/gqa,Graphcore/gqa-lxmert,Graphcore/vqa,Graphcore/vqa-lxmert,echarlaix/gqa-lxmert,echarlaix/vqa,echarlaix/vqa-lxmert,KakologArchives/KakologArchives,open-llm-leaderboard/*
+            - name: COMMON_DATASET_SCRIPTS_ALLOW_LIST
+              value: '{{ALL_DATASETS_WITH_NO_NAMESPACE}}'
             - name: COMMON_HF_ENDPOINT
               value: https://huggingface.co
             - name: HF_ENDPOINT

===== batch/CronJob datasets-server/prod-datasets-server-job-clean-stats-cache ======
--- /tmp/argocd-diff3375232980/prod-datasets-server-job-clean-stats-cache-live.yaml	2023-10-19 16:10:26.790246160 +0000
+++ /tmp/argocd-diff3375232980/prod-datasets-server-job-clean-stats-cache	2023-10-19 16:10:26.786246094 +0000
@@ -184,6 +184,8 @@
           - env:
             - name: COMMON_BLOCKED_DATASETS
               value: Graphcore/gqa,Graphcore/gqa-lxmert,Graphcore/vqa,Graphcore/vqa-lxmert,echarlaix/gqa-lxmert,echarlaix/vqa,echarlaix/vqa-lxmert,KakologArchives/KakologArchives,open-llm-leaderboard/*
+            - name: COMMON_DATASET_SCRIPTS_ALLOW_LIST
+              value: '{{ALL_DATASETS_WITH_NO_NAMESPACE}}'
             - name: COMMON_HF_ENDPOINT
               value: https://huggingface.co
             - name: HF_ENDPOINT

===== batch/CronJob datasets-server/prod-datasets-server-job-post-messages ======
--- /tmp/argocd-diff78112089/prod-datasets-server-job-post-messages-live.yaml	2023-10-19 16:10:26.798246292 +0000
+++ /tmp/argocd-diff78112089/prod-datasets-server-job-post-messages	2023-10-19 16:10:26.798246292 +0000
@@ -187,6 +187,8 @@
                   optional: false
             - name: COMMON_BLOCKED_DATASETS
               value: Graphcore/gqa,Graphcore/gqa-lxmert,Graphcore/vqa,Graphcore/vqa-lxmert,echarlaix/gqa-lxmert,echarlaix/vqa,echarlaix/vqa-lxmert,KakologArchives/KakologArchives,open-llm-leaderboard/*
+            - name: COMMON_DATASET_SCRIPTS_ALLOW_LIST
+              value: '{{ALL_DATASETS_WITH_NO_NAMESPACE}}'
             - name: COMMON_HF_ENDPOINT
               value: https://huggingface.co
             - name: HF_ENDPOINT

===== batch/CronJob datasets-server/prod-datasets-server-job-queue-metrics-collector ======
--- /tmp/argocd-diff3854248629/prod-datasets-server-job-queue-metrics-collector-live.yaml	2023-10-19 16:10:26.810246490 +0000
+++ /tmp/argocd-diff3854248629/prod-datasets-server-job-queue-metrics-collector	2023-10-19 16:10:26.810246490 +0000
@@ -177,6 +177,8 @@
                   optional: false
             - name: COMMON_BLOCKED_DATASETS
               value: Graphcore/gqa,Graphcore/gqa-lxmert,Graphcore/vqa,Graphcore/vqa-lxmert,echarlaix/gqa-lxmert,echarlaix/vqa,echarlaix/vqa-lxmert,KakologArchives/KakologArchives,open-llm-leaderboard/*
+            - name: COMMON_DATASET_SCRIPTS_ALLOW_LIST
+              value: '{{ALL_DATASETS_WITH_NO_NAMESPACE}}'
             - name: COMMON_HF_ENDPOINT
               value: https://huggingface.co
             - name: HF_ENDPOINT

App: datasets-server-staging
YAML generation: Success 🟢
App sync status: Synced ✅

===== apps/Deployment datasets-server/staging-datasets-server-admin ======
--- /tmp/argocd-diff2830271031/staging-datasets-server-admin-live.yaml	2023-10-19 16:10:27.422256592 +0000
+++ /tmp/argocd-diff2830271031/staging-datasets-server-admin	2023-10-19 16:10:27.414256460 +0000
@@ -563,6 +563,8 @@
               optional: false
         - name: COMMON_BLOCKED_DATASETS
           value: Graphcore/gqa,Graphcore/gqa-lxmert,Graphcore/vqa,Graphcore/vqa-lxmert,echarlaix/gqa-lxmert,echarlaix/vqa,echarlaix/vqa-lxmert,KakologArchives/KakologArchives,open-llm-leaderboard/*
+        - name: COMMON_DATASET_SCRIPTS_ALLOW_LIST
+          value: '{{ALL_DATASETS_WITH_NO_NAMESPACE}}'
         - name: COMMON_HF_ENDPOINT
           value: https://huggingface.co
         - name: HF_ENDPOINT

===== apps/Deployment datasets-server/staging-datasets-server-api ======
--- /tmp/argocd-diff547647550/staging-datasets-server-api-live.yaml	2023-10-19 16:10:27.438256856 +0000
+++ /tmp/argocd-diff547647550/staging-datasets-server-api	2023-10-19 16:10:27.438256856 +0000
@@ -316,6 +316,8 @@
               optional: false
         - name: COMMON_BLOCKED_DATASETS
           value: Graphcore/gqa,Graphcore/gqa-lxmert,Graphcore/vqa,Graphcore/vqa-lxmert,echarlaix/gqa-lxmert,echarlaix/vqa,echarlaix/vqa-lxmert,KakologArchives/KakologArchives,open-llm-leaderboard/*
+        - name: COMMON_DATASET_SCRIPTS_ALLOW_LIST
+          value: '{{ALL_DATASETS_WITH_NO_NAMESPACE}}'
         - name: COMMON_HF_ENDPOINT
           value: https://huggingface.co
         - name: HF_ENDPOINT

===== apps/Deployment datasets-server/staging-datasets-server-rows ======
--- /tmp/argocd-diff3900340283/staging-datasets-server-rows-live.yaml	2023-10-19 16:10:27.474257450 +0000
+++ /tmp/argocd-diff3900340283/staging-datasets-server-rows	2023-10-19 16:10:27.470257384 +0000
@@ -457,6 +457,8 @@
               optional: false
         - name: COMMON_BLOCKED_DATASETS
           value: Graphcore/gqa,Graphcore/gqa-lxmert,Graphcore/vqa,Graphcore/vqa-lxmert,echarlaix/gqa-lxmert,echarlaix/vqa,echarlaix/vqa-lxmert,KakologArchives/KakologArchives,open-llm-leaderboard/*
+        - name: COMMON_DATASET_SCRIPTS_ALLOW_LIST
+          value: '{{ALL_DATASETS_WITH_NO_NAMESPACE}}'
         - name: COMMON_HF_ENDPOINT
           value: https://huggingface.co
         - name: HF_ENDPOINT

===== apps/Deployment datasets-server/staging-datasets-server-search ======
--- /tmp/argocd-diff1975720603/staging-datasets-server-search-live.yaml	2023-10-19 16:10:27.498257846 +0000
+++ /tmp/argocd-diff1975720603/staging-datasets-server-search	2023-10-19 16:10:27.494257780 +0000
@@ -450,6 +450,8 @@
               optional: false
         - name: COMMON_BLOCKED_DATASETS
           value: Graphcore/gqa,Graphcore/gqa-lxmert,Graphcore/vqa,Graphcore/vqa-lxmert,echarlaix/gqa-lxmert,echarlaix/vqa,echarlaix/vqa-lxmert,KakologArchives/KakologArchives,open-llm-leaderboard/*
+        - name: COMMON_DATASET_SCRIPTS_ALLOW_LIST
+          value: '{{ALL_DATASETS_WITH_NO_NAMESPACE}}'
         - name: COMMON_HF_ENDPOINT
           value: https://huggingface.co
         - name: HF_ENDPOINT

===== apps/Deployment datasets-server/staging-datasets-server-sse-api ======
--- /tmp/argocd-diff3911189508/staging-datasets-server-sse-api-live.yaml	2023-10-19 16:10:27.514258111 +0000
+++ /tmp/argocd-diff3911189508/staging-datasets-server-sse-api	2023-10-19 16:10:27.510258045 +0000
@@ -284,6 +284,8 @@
               optional: false
         - name: COMMON_BLOCKED_DATASETS
           value: Graphcore/gqa,Graphcore/gqa-lxmert,Graphcore/vqa,Graphcore/vqa-lxmert,echarlaix/gqa-lxmert,echarlaix/vqa,echarlaix/vqa-lxmert,KakologArchives/KakologArchives,open-llm-leaderboard/*
+        - name: COMMON_DATASET_SCRIPTS_ALLOW_LIST
+          value: '{{ALL_DATASETS_WITH_NO_NAMESPACE}}'
         - name: COMMON_HF_ENDPOINT
           value: https://huggingface.co
         - name: HF_ENDPOINT

===== apps/Deployment datasets-server/staging-datasets-server-worker-all ======
--- /tmp/argocd-diff3167191088/staging-datasets-server-worker-all-live.yaml	2023-10-19 16:10:27.570259035 +0000
+++ /tmp/argocd-diff3167191088/staging-datasets-server-worker-all	2023-10-19 16:10:27.562258903 +0000
@@ -669,6 +669,8 @@
               optional: false
         - name: COMMON_BLOCKED_DATASETS
           value: Graphcore/gqa,Graphcore/gqa-lxmert,Graphcore/vqa,Graphcore/vqa-lxmert,echarlaix/gqa-lxmert,echarlaix/vqa,echarlaix/vqa-lxmert,KakologArchives/KakologArchives,open-llm-leaderboard/*
+        - name: COMMON_DATASET_SCRIPTS_ALLOW_LIST
+          value: '{{ALL_DATASETS_WITH_NO_NAMESPACE}}'
         - name: COMMON_HF_ENDPOINT
           value: https://huggingface.co
         - name: HF_ENDPOINT

===== apps/Deployment datasets-server/staging-datasets-server-worker-light ======
--- /tmp/argocd-diff3075085335/staging-datasets-server-worker-light-live.yaml	2023-10-19 16:10:27.610259695 +0000
+++ /tmp/argocd-diff3075085335/staging-datasets-server-worker-light	2023-10-19 16:10:27.602259563 +0000
@@ -669,6 +669,8 @@
               optional: false
         - name: COMMON_BLOCKED_DATASETS
           value: Graphcore/gqa,Graphcore/gqa-lxmert,Graphcore/vqa,Graphcore/vqa-lxmert,echarlaix/gqa-lxmert,echarlaix/vqa,echarlaix/vqa-lxmert,KakologArchives/KakologArchives,open-llm-leaderboard/*
+        - name: COMMON_DATASET_SCRIPTS_ALLOW_LIST
+          value: '{{ALL_DATASETS_WITH_NO_NAMESPACE}}'
         - name: COMMON_HF_ENDPOINT
           value: https://huggingface.co
         - name: HF_ENDPOINT

===== batch/CronJob datasets-server/staging-datasets-server-job-cache-metrics-collector ======
--- /tmp/argocd-diff3525433922/staging-datasets-server-job-cache-metrics-collector-live.yaml	2023-10-19 16:10:27.626259959 +0000
+++ /tmp/argocd-diff3525433922/staging-datasets-server-job-cache-metrics-collector	2023-10-19 16:10:27.622259893 +0000
@@ -174,6 +174,8 @@
                   optional: false
             - name: COMMON_BLOCKED_DATASETS
               value: Graphcore/gqa,Graphcore/gqa-lxmert,Graphcore/vqa,Graphcore/vqa-lxmert,echarlaix/gqa-lxmert,echarlaix/vqa,echarlaix/vqa-lxmert,KakologArchives/KakologArchives,open-llm-leaderboard/*
+            - name: COMMON_DATASET_SCRIPTS_ALLOW_LIST
+              value: '{{ALL_DATASETS_WITH_NO_NAMESPACE}}'
             - name: COMMON_HF_ENDPOINT
               value: https://huggingface.co
             - name: HF_ENDPOINT

===== batch/CronJob datasets-server/staging-datasets-server-job-clean-duckdb-downloads ======
--- /tmp/argocd-diff1837666755/staging-datasets-server-job-clean-duckdb-downloads-live.yaml	2023-10-19 16:10:27.638260157 +0000
+++ /tmp/argocd-diff1837666755/staging-datasets-server-job-clean-duckdb-downloads	2023-10-19 16:10:27.634260091 +0000
@@ -225,6 +225,8 @@
                   optional: false
             - name: COMMON_BLOCKED_DATASETS
               value: Graphcore/gqa,Graphcore/gqa-lxmert,Graphcore/vqa,Graphcore/vqa-lxmert,echarlaix/gqa-lxmert,echarlaix/vqa,echarlaix/vqa-lxmert,KakologArchives/KakologArchives,open-llm-leaderboard/*
+            - name: COMMON_DATASET_SCRIPTS_ALLOW_LIST
+              value: '{{ALL_DATASETS_WITH_NO_NAMESPACE}}'
             - name: COMMON_HF_ENDPOINT
               value: https://huggingface.co
             - name: HF_ENDPOINT

===== batch/CronJob datasets-server/staging-datasets-server-job-clean-duckdb-job-runner ======
--- /tmp/argocd-diff2974282701/staging-datasets-server-job-clean-duckdb-job-runner-live.yaml	2023-10-19 16:10:27.650260355 +0000
+++ /tmp/argocd-diff2974282701/staging-datasets-server-job-clean-duckdb-job-runner	2023-10-19 16:10:27.646260289 +0000
@@ -225,6 +225,8 @@
                   optional: false
             - name: COMMON_BLOCKED_DATASETS
               value: Graphcore/gqa,Graphcore/gqa-lxmert,Graphcore/vqa,Graphcore/vqa-lxmert,echarlaix/gqa-lxmert,echarlaix/vqa,echarlaix/vqa-lxmert,KakologArchives/KakologArchives,open-llm-leaderboard/*
+            - name: COMMON_DATASET_SCRIPTS_ALLOW_LIST
+              value: '{{ALL_DATASETS_WITH_NO_NAMESPACE}}'
             - name: COMMON_HF_ENDPOINT
               value: https://huggingface.co
             - name: HF_ENDPOINT

===== batch/CronJob datasets-server/staging-datasets-server-job-clean-hf-datasets-cache ======
--- /tmp/argocd-diff353808233/staging-datasets-server-job-clean-hf-datasets-cache-live.yaml	2023-10-19 16:10:27.662260554 +0000
+++ /tmp/argocd-diff353808233/staging-datasets-server-job-clean-hf-datasets-cache	2023-10-19 16:10:27.658260487 +0000
@@ -183,6 +183,8 @@
           - env:
             - name: COMMON_BLOCKED_DATASETS
               value: Graphcore/gqa,Graphcore/gqa-lxmert,Graphcore/vqa,Graphcore/vqa-lxmert,echarlaix/gqa-lxmert,echarlaix/vqa,echarlaix/vqa-lxmert,KakologArchives/KakologArchives,open-llm-leaderboard/*
+            - name: COMMON_DATASET_SCRIPTS_ALLOW_LIST
+              value: '{{ALL_DATASETS_WITH_NO_NAMESPACE}}'
             - name: COMMON_HF_ENDPOINT
               value: https://huggingface.co
             - name: HF_ENDPOINT

===== batch/CronJob datasets-server/staging-datasets-server-job-clean-stats-cache ======
--- /tmp/argocd-diff2315264591/staging-datasets-server-job-clean-stats-cache-live.yaml	2023-10-19 16:10:27.674260752 +0000
+++ /tmp/argocd-diff2315264591/staging-datasets-server-job-clean-stats-cache	2023-10-19 16:10:27.674260752 +0000
@@ -183,6 +183,8 @@
           - env:
             - name: COMMON_BLOCKED_DATASETS
               value: Graphcore/gqa,Graphcore/gqa-lxmert,Graphcore/vqa,Graphcore/vqa-lxmert,echarlaix/gqa-lxmert,echarlaix/vqa,echarlaix/vqa-lxmert,KakologArchives/KakologArchives,open-llm-leaderboard/*
+            - name: COMMON_DATASET_SCRIPTS_ALLOW_LIST
+              value: '{{ALL_DATASETS_WITH_NO_NAMESPACE}}'
             - name: COMMON_HF_ENDPOINT
               value: https://huggingface.co
             - name: HF_ENDPOINT

===== batch/CronJob datasets-server/staging-datasets-server-job-post-messages ======
--- /tmp/argocd-diff140318677/staging-datasets-server-job-post-messages-live.yaml	2023-10-19 16:10:27.686260950 +0000
+++ /tmp/argocd-diff140318677/staging-datasets-server-job-post-messages	2023-10-19 16:10:27.682260884 +0000
@@ -186,6 +186,8 @@
                   optional: false
             - name: COMMON_BLOCKED_DATASETS
               value: Graphcore/gqa,Graphcore/gqa-lxmert,Graphcore/vqa,Graphcore/vqa-lxmert,echarlaix/gqa-lxmert,echarlaix/vqa,echarlaix/vqa-lxmert,KakologArchives/KakologArchives,open-llm-leaderboard/*
+            - name: COMMON_DATASET_SCRIPTS_ALLOW_LIST
+              value: '{{ALL_DATASETS_WITH_NO_NAMESPACE}}'
             - name: COMMON_HF_ENDPOINT
               value: https://huggingface.co
             - name: HF_ENDPOINT

===== batch/CronJob datasets-server/staging-datasets-server-job-queue-metrics-collector ======
--- /tmp/argocd-diff979756918/staging-datasets-server-job-queue-metrics-collector-live.yaml	2023-10-19 16:10:27.694261082 +0000
+++ /tmp/argocd-diff979756918/staging-datasets-server-job-queue-metrics-collector	2023-10-19 16:10:27.694261082 +0000
@@ -175,6 +175,8 @@
                   optional: false
             - name: COMMON_BLOCKED_DATASETS
               value: Graphcore/gqa,Graphcore/gqa-lxmert,Graphcore/vqa,Graphcore/vqa-lxmert,echarlaix/gqa-lxmert,echarlaix/vqa,echarlaix/vqa-lxmert,KakologArchives/KakologArchives,open-llm-leaderboard/*
+            - name: COMMON_DATASET_SCRIPTS_ALLOW_LIST
+              value: '{{ALL_DATASETS_WITH_NO_NAMESPACE}}'
             - name: COMMON_HF_ENDPOINT
               value: https://huggingface.co
             - name: HF_ENDPOINT

Legend Status
✅ The app is synced in ArgoCD, and diffs you see are solely from this PR.
⚠️ The app is out-of-sync in ArgoCD, and the diffs you see include those changes plus any from this PR.
🛑 There was an error generating the ArgoCD diffs due to changes in this PR.

@lhoestq lhoestq merged commit 90d6495 into main Oct 19, 2023
21 of 22 checks passed
@lhoestq lhoestq deleted the disable-dataset-scripts branch October 19, 2023 16:33
@lhoestq lhoestq mentioned this pull request Jan 3, 2024
5 tasks