Replies: 1 comment
-
Hi! You can use Git tags to mark the releases (versions) of a dataset:

    import huggingface_hub

    # version 1.0
    dataset.push_to_hub(
        "user_name/dataset_name",
    )
    huggingface_hub.create_tag("user_name/dataset_name", tag="1.0", repo_type="dataset")

    # version 2.0
    dataset.push_to_hub(
        "user_name/dataset_name",
    )
    huggingface_hub.create_tag("user_name/dataset_name", tag="2.0", repo_type="dataset")

And then reference them when loading as follows:

    from datasets import load_dataset

    # version 1.0
    load_dataset("user_name/dataset_name", revision="1.0")
    # version 2.0
    load_dataset("user_name/dataset_name", revision="2.0")
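If you do this for every release, the push-then-tag pair can be wrapped in a small helper. This is only a sketch: `push_and_tag` and its `create_tag` override are names made up for illustration (they are not part of `datasets` or `huggingface_hub`), and it assumes the tag should point at the head of the `main` branch, which is what `create_tag` targets when no `revision` is passed.

```python
def push_and_tag(dataset, repo_id, tag, create_tag=None):
    """Push `dataset` to the Hub, then tag the resulting commit as `tag`.

    `push_and_tag` is a hypothetical helper, not a library function.
    `create_tag` is an optional override so the helper can be tried
    offline with a stub; by default the real Hub call is used.
    """
    if create_tag is None:
        # Imported lazily so the Hub client is only needed for real pushes.
        from huggingface_hub import create_tag

    # Upload the data; this creates new commit(s) on the default branch.
    dataset.push_to_hub(repo_id)

    # Tag the current head of the default branch with the release name.
    create_tag(repo_id, tag=tag, repo_type="dataset")
    return tag
```

Calling `push_and_tag(dataset, "user_name/dataset_name", "1.0")` would then stand in for the two calls per version shown above, and the tagged release can still be loaded with `revision="1.0"`.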
-
Currently, when uploading a dataset, one would call:
So I thought one could create a new version with:
But then I get the following error:
How do I then create a new version?
I've tried cloning, tagging the last commit, pushing it, then calling push_to_hub again. However, I now see duplicate shards, see below:
How do I then get only one version in the commit of v1?