Handling dataset redirects #2688

Closed
PeterAJansen opened this issue Apr 9, 2024 · 4 comments
Labels
feature request (Request for a new feature), P1 (Not as needed as P0, but still important/wanted)

Comments

@PeterAJansen

Currently datasets-server doesn't appear to handle dataset redirects, and gives an error.

For example:

  • Querying datasets endpoints with the dataset squad works

  • Querying datasets-server endpoints with the dataset squad gives an "unknown error"

  • Querying datasets-server endpoints with a random dataset name (e.g. xyz2309348) gives an expected response (e.g. dataset unknown/doesn't exist)

  • Querying datasets-server endpoints with the redirected name (rajpurkar/squad) works.

@severo severo added the feature request and P1 labels Apr 9, 2024
@severo
Collaborator

severo commented Apr 9, 2024

When a repository has been renamed one or more times, the dataset viewer only knows its latest name. For example, squad was renamed to rajpurkar/squad. The previous entries for squad in the database were deleted, and new ones were created for rajpurkar/squad. As we don't check whether the repo was renamed, asking for dataset=squad returns an error.

To support this, when the dataset is not found in the database, we should ask the Hub for the current name of the repo (in the example: rajpurkar/squad) and then check again whether a dataset with that name exists in the database.

We can get this info by looking at the id field in hfh.dataset_info() (see https://huggingface.co/api/datasets/squad)
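
The fallback described above could look roughly like the sketch below. It is only an illustration, not the dataset viewer's actual code: `resolve_dataset`, `fetch_hub_info`, and the `in_database` callable are hypothetical names, and the sketch reads the Hub API URL mentioned above directly with the standard library instead of calling hfh.dataset_info(). It assumes the Hub response carries the current repository name in its id field and that an unknown dataset yields a 404.

```python
import json
import urllib.error
import urllib.parse
import urllib.request
from typing import Callable, Optional

HUB_API = "https://huggingface.co/api/datasets/"


def fetch_hub_info(name: str) -> Optional[dict]:
    """Fetch dataset metadata from the Hub API; return None on a 404.

    Assumption: other HTTP errors are unexpected here, so they are re-raised.
    """
    url = HUB_API + urllib.parse.quote(name, safe="/")
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            return json.load(resp)
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return None
        raise


def resolve_dataset(
    name: str,
    in_database: Callable[[str], bool],
    fetch_info: Callable[[str], Optional[dict]] = fetch_hub_info,
) -> Optional[str]:
    """Return the name under which the dataset is stored, or None."""
    # Fast path: the requested name is already known.
    if in_database(name):
        return name
    # Not found: ask the Hub for the current name of the repository.
    info = fetch_info(name)
    if info is None:
        return None
    current = info.get("id")
    # Look again in the database, this time with the current name.
    if current and current != name and in_database(current):
        return current
    return None
```

With a database that only contains rajpurkar/squad, `resolve_dataset("squad", ...)` would return `rajpurkar/squad` as long as the Hub reports that id for the old name. Passing `fetch_info` as a parameter keeps the Hub call replaceable, for instance by hfh.dataset_info() or by a stub in tests.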

@severo
Collaborator

severo commented Aug 21, 2024

also:

Querying datasets-server endpoints with the dataset squad gives an "unknown error"

^ this is a bug

@severo
Collaborator

severo commented Aug 22, 2024

With #3035, it now returns 404 with {"error":"The dataset has been renamed. Please use the current dataset name."}. At least, this bug is fixed.

https://datasets-server.huggingface.co/splits?dataset=mnist or https://datasets-server.huggingface.co/splits?dataset=squad
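
For API clients, a small check like the one below could tell this response apart from an ordinary "dataset not found" 404. It is a sketch under assumptions: `is_renamed_error` is a hypothetical helper, and matching on the exact message quoted above is brittle if the wording ever changes.

```python
import json

# Error message quoted in this thread; assumed to be stable.
RENAMED_ERROR = "The dataset has been renamed. Please use the current dataset name."


def is_renamed_error(status: int, body: str) -> bool:
    """Return True if a response looks like the 'dataset renamed' 404."""
    if status != 404:
        return False
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        # Not JSON, so it cannot be the error described above.
        return False
    return isinstance(payload, dict) and payload.get("error") == RENAMED_ERROR
```

A client that gets `True` here can look up the current name on the Hub and retry with it, since the API itself does not redirect.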

@severo
Collaborator

severo commented Aug 22, 2024

Let's close. We don't plan to support repository redirection in this API.

@severo severo closed this as completed Aug 22, 2024
@severo severo closed this as not planned Aug 22, 2024