Rename to dataset viewer part1 (#2663)
* Datasets Server -> (the) dataset viewer (API)

* more renaming + change repo name

* datasets-server -> dataset-viewer where it has no side-effect
severo authored Apr 5, 2024
1 parent cbf56be commit cc97f89
Showing 66 changed files with 322 additions and 390 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/_e2e_tests.yml
@@ -67,7 +67,7 @@ jobs:
CLOUDFRONT_KEY_PAIR_ID: "K3814DK2QUJ71H"
CLOUDFRONT_PRIVATE_KEY: ${{ secrets.CLOUDFRONT_PRIVATE_KEY }}
HF_HUB_ENABLE_HF_TRANSFER: "1"
-run: docker compose -f docker-compose-datasets-server.yml up -d --wait --wait-timeout 20
+run: docker compose -f docker-compose-dataset-viewer.yml up -d --wait --wait-timeout 20
working-directory: ./tools
- name: Install poetry
run: pipx install poetry==${{ env.poetry-version }}
4 changes: 2 additions & 2 deletions .github/workflows/e2e.yml
@@ -16,7 +16,7 @@ on:
- ".github/workflows/_quality-python.yml"
- ".github/workflows/e2e.yml"
- "tools/Python.mk"
-- "tools/docker-compose-datasets-server.yml"
+- "tools/docker-compose-dataset-viewer.yml"
pull_request:
paths:
- "e2e/**"
@@ -27,7 +27,7 @@ on:
- ".github/workflows/_quality-python.yml"
- ".github/workflows/e2e.yml"
- "tools/Python.mk"
-- "tools/docker-compose-datasets-server.yml"
+- "tools/docker-compose-dataset-viewer.yml"
jobs:
quality:
uses: ./.github/workflows/_quality-python.yml
2 changes: 1 addition & 1 deletion .github/workflows/stale.yml
@@ -10,7 +10,7 @@ on:
jobs:
close_stale_issues:
name: Close Stale Issues
-if: github.repository == 'huggingface/datasets-server'
+if: github.repository == 'huggingface/dataset-viewer'
runs-on: ubuntu-latest
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
2 changes: 1 addition & 1 deletion AUTHORS
@@ -1,4 +1,4 @@
-# This is the list of HuggingFace Datasets Server authors for copyright purposes.
+# This is the list of HuggingFace dataset viewer authors for copyright purposes.
#
# This does not necessarily list everyone who has contributed code, since in
# some cases, their employer may be the copyright holder. To see the full list
12 changes: 6 additions & 6 deletions CONTRIBUTING.md
@@ -1,8 +1,8 @@
-# How to contribute to the Datasets Server?
+# How to contribute to the dataset viewer?

[![Contributor Covenant](https://img.shields.io/badge/Contributor%20Covenant-2.0-4baaaa.svg)](CODE_OF_CONDUCT.md)

-The Datasets Server is an open source project, so all contributions and suggestions are welcome.
+The dataset viewer is an open source project, so all contributions and suggestions are welcome.

You can contribute in many different ways: giving ideas, answering questions, reporting bugs, proposing enhancements,
improving the documentation, fixing bugs...
@@ -28,14 +28,14 @@ If you would like to work on any of the open Issues:

## How to create a Pull Request?

-1. Fork the [repository](https://github.com/huggingface/datasets-server) by clicking on the 'Fork' button on the repository's page. This creates a copy of the code under your GitHub user account.
+1. Fork the [repository](https://github.com/huggingface/dataset-viewer) by clicking on the 'Fork' button on the repository's page. This creates a copy of the code under your GitHub user account.

2. Clone your fork to your local disk, and add the base repository as a remote:

```bash
git clone [email protected]:<your Github handle>/datasets-server.git
cd datasets-server
git remote add upstream https://github.com/huggingface/datasets-server.git
git clone [email protected]:<your Github handle>/dataset-viewer.git
cd dataset-viewer
git remote add upstream https://github.com/huggingface/dataset-viewer.git
```

3. Create a new branch to hold your development changes:
4 changes: 2 additions & 2 deletions DEVELOPER_GUIDE.md
@@ -7,8 +7,8 @@ This document is intended for developers who want to install, test or contribute
To start working on the project:

```bash
git clone [email protected]:huggingface/datasets-server.git
cd datasets-server
git clone [email protected]:huggingface/dataset-viewer.git
cd dataset-viewer
```

Install docker (see https://docs.docker.com/engine/install/ubuntu/#install-using-the-repository and https://docs.docker.com/engine/install/linux-postinstall/)
8 changes: 4 additions & 4 deletions Makefile
@@ -15,10 +15,10 @@ dev-start: export COMPOSE_PROJECT_NAME := dev-datasets-server
dev-stop: export COMPOSE_PROJECT_NAME := dev-datasets-server

# makefile variables per target
-start: DOCKER_COMPOSE := ./tools/docker-compose-datasets-server.yml
-stop: DOCKER_COMPOSE := ./tools/docker-compose-datasets-server.yml
-dev-start: DOCKER_COMPOSE := ./tools/docker-compose-dev-datasets-server.yml
-dev-stop: DOCKER_COMPOSE := ./tools/docker-compose-dev-datasets-server.yml
+start: DOCKER_COMPOSE := ./tools/docker-compose-dataset-viewer.yml
+stop: DOCKER_COMPOSE := ./tools/docker-compose-dataset-viewer.yml
+dev-start: DOCKER_COMPOSE := ./tools/docker-compose-dev-dataset-viewer.yml
+dev-stop: DOCKER_COMPOSE := ./tools/docker-compose-dev-dataset-viewer.yml

include tools/Docker.mk

12 changes: 6 additions & 6 deletions README.md
@@ -1,16 +1,16 @@
-# Datasets server
+# Dataset viewer

> Integrate into your apps over 10,000 datasets via simple HTTP requests, with pre-processed responses and scalability built-in.
Documentation: https://huggingface.co/docs/datasets-server

## Ask for a new feature 🎁

-The datasets server pre-processes the [Hugging Face Hub datasets](https://huggingface.co/datasets) to make them ready to use in your apps using the API: list of the splits, first rows.
+The dataset viewer pre-processes the [Hugging Face Hub datasets](https://huggingface.co/datasets) to make them ready to use in your apps using the API: list of the splits, first rows.

-We plan to [add more features](https://github.com/huggingface/datasets-server/issues?q=is%3Aissue+is%3Aopen+label%3A%22feature+request%22) to the server. Please comment there and upvote your favorite requests.
+We plan to [add more features](https://github.com/huggingface/dataset-viewer/issues?q=is%3Aissue+is%3Aopen+label%3A%22feature+request%22) to the server. Please comment there and upvote your favorite requests.

-If you think about a new feature, please [open a new issue](https://github.com/huggingface/datasets-server/issues/new).
+If you think about a new feature, please [open a new issue](https://github.com/huggingface/dataset-viewer/issues/new).

## Contribute 🤝

@@ -20,8 +20,8 @@ To install the server and start contributing to the code, see [DEVELOPER_GUIDE.m

## Community 🤗

-You can star and watch this [GitHub repository](https://github.com/huggingface/datasets-server) to follow the updates.
+You can star and watch this [GitHub repository](https://github.com/huggingface/dataset-viewer) to follow the updates.

You can ask for help or answer questions on the [Forum](https://discuss.huggingface.co/c/datasets/10) and [Discord](https://discord.com/channels/879548962464493619/1019883044724822016).

-You can also report bugs and propose enhancements on the code, or the documentation, in the [GitHub issues](https://github.com/huggingface/datasets-server/issues).
+You can also report bugs and propose enhancements on the code, or the documentation, in the [GitHub issues](https://github.com/huggingface/dataset-viewer/issues).
4 changes: 2 additions & 2 deletions chart/Chart.yaml
@@ -3,7 +3,7 @@

apiVersion: v2
name: datasets-server
-description: A Helm chart for the datasets-server application
+description: A Helm chart for the dataset-viewer application

# A chart can be either an 'application' or a 'library' chart.
#
@@ -25,7 +25,7 @@ version: 2.0.0
# follow Semantic Versioning. They should reflect the version the application is using.
# It is recommended to use it with quotes.
#
-# See https://github.com/huggingface/datasets-server/releases
+# See https://github.com/huggingface/dataset-viewer/releases
appVersion: "0.22.2"

icon: https://huggingface.co/front/assets/huggingface_logo-noborder.svg
10 changes: 5 additions & 5 deletions chart/README.md
@@ -1,15 +1,15 @@
-# datasets-server Helm chart
+# Dataset viewer Helm chart

-The `datasets-server` Helm [chart](https://helm.sh/docs/topics/charts/) describes the Kubernetes resources of the datasets-server application.
+The dataset viewer Helm [chart](https://helm.sh/docs/topics/charts/) describes the Kubernetes resources of the dataset viewer application.

If you have access to the internal HF notion, see https://www.notion.so/huggingface2/Infrastructure-b4fd07f015e04a84a41ec6472c8a0ff5.

-The cloud infrastructure for the datasets-server uses:
+The cloud infrastructure for the dataset viewer uses:

-- Docker Hub to store the docker images of the datasets-server services.
+- Docker Hub to store the docker images of the dataset viewer services.
- Amazon EKS for the Kubernetes clusters.

-Note that this Helm chart is used to manage the deployment of the `datasets-server` services to the cloud infrastructure (AWS) using Kubernetes. The infrastructure in itself is not created here, but in https://github.com/huggingface/infra/ using terraform. If you need to create or modify some resources, contact the infra team.
+Note that this Helm chart is used to manage the deployment of the dataset viewer services to the cloud infrastructure (AWS) using Kubernetes. The infrastructure in itself is not created here, but in https://github.com/huggingface/infra/ using terraform. If you need to create or modify some resources, contact the infra team.

## Deploy

2 changes: 1 addition & 1 deletion chart/nginx-templates/default.conf.template
@@ -28,7 +28,7 @@ server {
set $cached_assets_storage_root ${CACHED_ASSETS_STORAGE_ROOT};

location /openapi.json {
-return 307 https://raw.githubusercontent.com/huggingface/datasets-server/main/${OPENAPI_FILE};
+return 307 https://raw.githubusercontent.com/huggingface/dataset-viewer/main/${OPENAPI_FILE};
}

location /assets/ {
6 changes: 3 additions & 3 deletions chart/templates/_common/_helpers.tpl
@@ -162,7 +162,7 @@ Return the api ingress anotation
{{- end -}}

{{/*
-Datasets Server base url
+The dataset viewer API base url
*/}}
{{- define "datasetsServer.ingress.hostname" -}}
{{ .Values.global.huggingface.ingress.subdomains.datasetsServer }}.{{ .Values.global.huggingface.ingress.domain }}
@@ -195,7 +195,7 @@ The cached-assets base URL

{{/*
The parquet-metadata/ subpath in the EFS
-- in a subdirectory named as the chart (datasets-server/), and below it,
+- in a subdirectory named as the chart (dataset-viewer/), and below it,
- in a subdirectory named as the Release, so that Releases will not share the same dir
*/}}
{{- define "parquetMetadata.subpath" -}}
@@ -204,7 +204,7 @@ The parquet-metadata/ subpath in the EFS

{{/*
The duckdb-index/ subpath in EFS
-- in a subdirectory named as the chart (datasets-server/), and below it,
+- in a subdirectory named as the chart (dataset-viewer/), and below it,
- in a subdirectory named as the Release, so that Releases will not share the same dir
*/}}
{{- define "duckDBIndex.subpath" -}}
2 changes: 1 addition & 1 deletion chart/templates/_env/_envDiscussions.tpl
@@ -14,7 +14,7 @@
{{- else }}
value: {{ .Values.secrets.appParquetConverterHfToken.value }}
{{- end }}
-# ^ we use the same token (datasets-server-bot) for discussions and for uploading parquet files
+# ^ we use the same token (dataset viewer bot) for discussions and for uploading parquet files
- name: DISCUSSIONS_PARQUET_REVISION
value: {{ .Values.parquetAndInfo.targetRevision | quote }}
{{- end -}}
2 changes: 1 addition & 1 deletion chart/values.yaml
@@ -289,7 +289,7 @@ hfDatasetsCache:
cacheDirectory: "/tmp/hf-datasets-cache"

discussions:
-# name of the Hub user associated with the Datasets Server bot app
+# name of the Hub user associated with the dataset viewer bot app
botAssociatedUserName: "parquet-converter"

# --- jobs (pre-install/upgrade hooks) ---
4 changes: 2 additions & 2 deletions docs/README.md
@@ -48,7 +48,7 @@ The documentation is available at http://localhost:3000/.
To build the documentation, launch:

```bash
-BUILD_DIR=/tmp/doc-datasets-server/ make build
+BUILD_DIR=/tmp/doc-dataset-viewer/ make build
```

You can adapt the `BUILD_DIR` environment variable to set any temporary folder that you prefer. This command will create it and generate
@@ -69,7 +69,7 @@ will see a bot add a comment to a link where the documentation with your changes
Accepted files are Markdown (.md or .mdx).

Create a file with its extension and put it in the source directory. You can then link it to the toc-tree by putting
-the filename without the extension in the [`_toctree.yml`](https://github.com/huggingface/datasets-server/blob/main/docs/source/_toctree.yml) file.
+the filename without the extension in the [`_toctree.yml`](https://github.com/huggingface/dataset-viewer/blob/main/docs/source/_toctree.yml) file.

## Adding an image

4 changes: 2 additions & 2 deletions docs/pyproject.toml
@@ -1,7 +1,7 @@
[tool.poetry]
authors = ["Sylvain Lesage <[email protected]>"]
-description = "Documentation for datasets-server"
-name = "datasets-server-doc"
+description = "Documentation for dataset-viewer"
+name = "dataset-viewer-doc"
version = "0.1.0"

[tool.poetry.dependencies]
4 changes: 2 additions & 2 deletions docs/source/_toctree.yml
@@ -1,7 +1,7 @@
- title: Get Started
sections:
- local: index
-title: 🤗 Datasets server
+title: 🤗 Dataset viewer
- local: quick_start
title: Quickstart
- local: analyze_data
@@ -30,7 +30,7 @@
title: Explore dataset statistics
- local: croissant
title: Get Croissant metadata
-- title: Query datasets from Datasets Server
+- title: Query datasets from dataset viewer API
sections:
- local: parquet_process
title: Overview
2 changes: 1 addition & 1 deletion docs/source/croissant.md
@@ -1,6 +1,6 @@
# Get Croissant metadata

-Datasets Server automatically generates the metadata in [Croissant](https://github.com/mlcommons/croissant) format (JSON-LD) for every dataset on the Hugging Face Hub. It lists the dataset's name, description, URL, and the distribution of the dataset as Parquet files, including the columns' metadata. The Croissant metadata is available for all the datasets that can be [converted to Parquet format](./parquet#conversion-to-parquet).
+The dataset viewer automatically generates the metadata in [Croissant](https://github.com/mlcommons/croissant) format (JSON-LD) for every dataset on the Hugging Face Hub. It lists the dataset's name, description, URL, and the distribution of the dataset as Parquet files, including the columns' metadata. The Croissant metadata is available for all the datasets that can be [converted to Parquet format](./parquet#conversion-to-parquet).

## What is Croissant?

2 changes: 1 addition & 1 deletion docs/source/data_types.md
@@ -1,6 +1,6 @@
# Data types

-Datasets supported by Datasets Server have a tabular format, meaning a data point is represented in a row and its features are contained in columns. Using the `/first-rows` endpoint allows you to preview the first 100 rows of a dataset and information about each feature. Within the `features` key, you'll notice it returns a `_type` field. This value describes the data type of the column, and it is also known as a dataset's [`Features`](https://huggingface.co/docs/datasets/about_dataset_features).
+Datasets supported by the dataset viewer have a tabular format, meaning a data point is represented in a row and its features are contained in columns. Using the `/first-rows` endpoint allows you to preview the first 100 rows of a dataset and information about each feature. Within the `features` key, you'll notice it returns a `_type` field. This value describes the data type of the column, and it is also known as a dataset's [`Features`](https://huggingface.co/docs/datasets/about_dataset_features).

There are several different data `Features` for representing different data formats such as [`Audio`](https://huggingface.co/docs/datasets/v2.5.2/en/package_reference/main_classes#datasets.Audio) and [`Image`](https://huggingface.co/docs/datasets/v2.5.2/en/package_reference/main_classes#datasets.Image) for speech and image data respectively. Knowing a dataset feature gives you a better understanding of the data type you're working with, and how you can preprocess it.

6 changes: 3 additions & 3 deletions docs/source/filter.md
@@ -1,14 +1,14 @@
# Filter rows in a dataset

-Datasets Server provides a `/filter` endpoint for filtering rows in a dataset.
+The dataset viewer provides a `/filter` endpoint for filtering rows in a dataset.

<Tip warning={true}>
Currently, only <a href="./parquet">datasets with Parquet exports</a>
-are supported so Datasets Server can index the contents and run the filter query without
+are supported so the dataset viewer can index the contents and run the filter query without
downloading the whole dataset.
</Tip>

-This guide shows you how to use Datasets Server's `/filter` endpoint to filter rows based on a query string.
+This guide shows you how to use the dataset viewer's `/filter` endpoint to filter rows based on a query string.
Feel free to also try it out with [ReDoc](https://redocly.github.io/redoc/?url=https://datasets-server.huggingface.co/openapi.json#operation/filterRows).

The `/filter` endpoint accepts the following query parameters:
4 changes: 2 additions & 2 deletions docs/source/first_rows.md
@@ -1,10 +1,10 @@
# Preview a dataset

-Datasets Server provides a `/first-rows` endpoint for visualizing the first 100 rows of a dataset. This'll give you a good idea of the data types and example data contained in a dataset.
+The dataset viewer provides a `/first-rows` endpoint for visualizing the first 100 rows of a dataset. This'll give you a good idea of the data types and example data contained in a dataset.

![dataset-viewer](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/dataset-viewer.png)

-This guide shows you how to use Datasets Server's `/first-rows` endpoint to preview a dataset. Feel free to also try it out with [Postman](https://www.postman.com/huggingface/workspace/hugging-face-apis/request/23242779-32d6a8be-b800-446a-8cee-f6b5ca1710df), [RapidAPI](https://rapidapi.com/hugging-face-hugging-face-default/api/hugging-face-datasets-api), or [ReDoc](https://redocly.github.io/redoc/?url=https://datasets-server.huggingface.co/openapi.json#operation/listFirstRows).
+This guide shows you how to use the dataset viewer's `/first-rows` endpoint to preview a dataset. Feel free to also try it out with [Postman](https://www.postman.com/huggingface/workspace/hugging-face-apis/request/23242779-32d6a8be-b800-446a-8cee-f6b5ca1710df), [RapidAPI](https://rapidapi.com/hugging-face-hugging-face-default/api/hugging-face-datasets-api), or [ReDoc](https://redocly.github.io/redoc/?url=https://datasets-server.huggingface.co/openapi.json#operation/listFirstRows).

The `/first-rows` endpoint accepts three query parameters:

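The diff above is a mechanical swap of `datasets-server` for `dataset-viewer` across 66 files. After a rename of this breadth, a quick way to catch stragglers is to grep the tree for the old name. A minimal sketch of the idea, run against a hypothetical throwaway directory rather than the real repository (file names and contents here are illustrative):

```shell
set -eu

# Hypothetical miniature tree: one file already renamed, one still stale
tmp=$(mktemp -d)
printf 'compose file: docker-compose-dataset-viewer.yml\n' > "$tmp/renamed.yml"
printf 'repo: huggingface/datasets-server\n' > "$tmp/stale.yml"

# List files that still mention the old name; a finished rename leaves this empty
stale=$(grep -rl 'datasets-server' "$tmp" || true)
echo "still referencing the old name: $stale"

rm -rf "$tmp"
```

In the actual repository the equivalent check would be `grep -rl 'datasets-server' .` from the root. As the "part1" in the commit title suggests, some occurrences (the Helm chart `name`, the `datasets-server.huggingface.co` endpoints in documentation links) are deliberately left for a later step, so a non-empty result is not necessarily an error here.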