Release dataset on HF (with gated access) #24

Open
NielsRogge opened this issue Sep 24, 2024 · 3 comments

Comments

@NielsRogge

Hi @yixchen,

Niels here from the open-source team at Hugging Face. I discovered your work through ECCV; it was featured on daily papers here: https://huggingface.co/papers/2401.09340 (feel free to claim it with your HF account). I work together with AK on improving the visibility of researchers' work on the hub.

It'd be great to make the dataset available on the 🤗 hub. We can add tags so that people find it when filtering https://huggingface.co/datasets, and people can then load it with:

from datasets import load_dataset

dataset = load_dataset("your-hf-username-or-organization/sceneverse")

See here for a guide: https://huggingface.co/docs/datasets/loading.

There's also the dataset viewer, which lets people see the first few rows in the browser: https://huggingface.co/docs/hub/en/datasets-viewer.

This would make the dataset more accessible and discoverable. We can then also link the dataset to the paper page.

Gating mechanism

We support gating (similar to how this works for llama/mistral models), so that you can manually review who can access the dataset: https://huggingface.co/docs/hub/en/datasets-gated. See FineVideo (a video dataset created by HF) as an example: https://huggingface.co/datasets/HuggingFaceFV/finevideo.
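On the user side, a gated dataset is typically loaded with an access token from an account whose request has been approved. The snippet below is only a sketch: the helper name `load_gated_dataset`, the `HF_TOKEN` environment variable lookup, and the `"train"` split are assumptions for illustration, and the repository ID is the placeholder used above, not a real repository.

```python
import os


def load_gated_dataset(repo_id: str, split: str = "train"):
    """Load a gated dataset using a token read from the environment.

    Hypothetical helper: the split name and the HF_TOKEN lookup are
    assumptions, not something this thread specifies.
    """
    token = os.environ.get("HF_TOKEN")
    if not token:
        # Fail early with a clear message instead of an HTTP error later.
        raise RuntimeError(
            "Set HF_TOKEN to a token from an account that was granted access."
        )

    # Imported lazily so the helper can be defined without `datasets` installed.
    from datasets import load_dataset

    return load_dataset(repo_id, split=split, token=token)


# Example call (placeholder repository ID from the comment above):
# dataset = load_gated_dataset("your-hf-username-or-organization/sceneverse")
```

The `token` argument assumes a reasonably recent `datasets` release; older versions used a different parameter name for authentication.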

Let us know if you need any help.

Cheers,

Niels
ML Engineer @ HF 🤗

@Buzz-Beater
Contributor

Sure! Thanks for reaching out, we will look into it :-)

@Buzz-Beater
Contributor

Hi, just curious: if we still want to keep the full data access license agreement on HF Datasets (it is currently handled through the Google Form), how should we set that up properly when uploading to HF?

@NielsRogge
Author

Hi, this is explained here: https://huggingface.co/docs/hub/en/datasets-gated.

Usually people start by uploading the data as a private dataset repository, then make it public once gated access is enabled.
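A rough sketch of that workflow with `huggingface_hub` might look like the following. It rests on several assumptions: the helper names, the prompt text, and the form fields are invented for illustration, and the `gated` and `private` arguments of `update_repo_settings` require a recent `huggingface_hub` release. The `extra_gated_prompt` and `extra_gated_fields` card metadata keys come from the gated datasets guide linked above and can stand in for the questions in an external form.

```python
import json


def gated_card_front_matter(prompt: str, fields: dict[str, str]) -> str:
    """Render dataset-card YAML front matter for a gated access form.

    `fields` maps a form label to a field type such as "text" or "checkbox".
    """
    lines = [
        "---",
        # json.dumps yields a double-quoted string, which is also valid YAML.
        f"extra_gated_prompt: {json.dumps(prompt)}",
        "extra_gated_fields:",
    ]
    for label, field_type in fields.items():
        lines.append(f"  {json.dumps(label)}: {field_type}")
    lines.append("---")
    return "\n".join(lines) + "\n"


def publish_gated_dataset(repo_id: str, folder: str, card: str) -> None:
    """Upload privately, enable manual gating, then make the repo public."""
    # Imported lazily so the card helper works without huggingface_hub.
    from huggingface_hub import HfApi

    api = HfApi()
    # 1. Start with a private dataset repository.
    api.create_repo(repo_id, repo_type="dataset", private=True, exist_ok=True)
    # 2. Upload the data, then the card (this overwrites any README.md
    #    that was already present in the uploaded folder).
    api.upload_folder(folder_path=folder, repo_id=repo_id, repo_type="dataset")
    api.upload_file(
        path_or_fileobj=card.encode("utf-8"),
        path_in_repo="README.md",
        repo_id=repo_id,
        repo_type="dataset",
    )
    # 3. Enable manual review of access requests before going public.
    api.update_repo_settings(repo_id, gated="manual", repo_type="dataset")
    # 4. Only now flip the repository to public.
    api.update_repo_settings(repo_id, private=False, repo_type="dataset")


# Hypothetical prompt and fields, standing in for an existing access form.
card = gated_card_front_matter(
    "Please read and accept the dataset license before requesting access.",
    {"Affiliation": "text", "I agree to the license terms": "checkbox"},
)
```

Gating can also be switched on from the repository settings page in the browser, so the script is optional; the ordering (private first, public only after gating) is the part that matters.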
