Error loading the Swahili dataset from Common Voice using Hugging Face datasets #4171

Tonyloyt · 2022-04-14T11:31:48Z

Hi guys,
I'm having trouble loading the Swahili dataset from Common Voice. I'm using Google Colab. The Swahili data has already been uploaded to the Hugging Face datasets package, but I can't load it.

My approach:
!pip install datasets==2.0.0
from datasets import load_dataset
common_voice_train = load_dataset("common_voice", "sw", split="train")

The resulting error:

ValueError: BuilderConfig sw not found. Available: ['ab', 'ar', 'as', 'br', 'ca', 'cnh', 'cs', 'cv', 'cy', 'de', 'dv', 'el', 'en', 'eo', 'es', 'et', 'eu', 'fa', 'fi', 'fr', 'fy-NL', 'ga-IE', 'hi', 'hsb', 'hu', 'ia', 'id', 'it', 'ja', 'ka', 'kab', 'ky', 'lg', 'lt', 'lv', 'mn', 'mt', 'nl', 'or', 'pa-IN', 'pl', 'pt', 'rm-sursilv', 'rm-vallader', 'ro', 'ru', 'rw', 'sah', 'sl', 'sv-SE', 'ta', 'th', 'tr', 'tt', 'uk', 'vi', 'vot', 'zh-CN', 'zh-HK', 'zh-TW']

Could anyone help me solve this, or suggest another way to load the Swahili Common Voice data without downloading it manually?

Thank you

mariosasko (Collaborator) · 2022-04-14T15:26:18Z

Hi! This version of the common_voice script is deprecated. To see the deprecation message, update your installation of datasets to the newest version with:

!pip install -U datasets

Instead, use the common_voice scripts under the mozilla-foundation namespace. Using these, you can download the Swahili subset as follows:

from datasets import load_dataset
common_voice_train = load_dataset("mozilla-foundation/common_voice_8_0", "sw", split="train")
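
The ValueError in the question is raised because "sw" is not among the config names of the old common_voice script. As a rough sketch of how to check this before loading, the snippet below looks a language code up in a list of config names and only then calls load_dataset. The helper names resolve_config and load_split are hypothetical and not part of datasets; get_dataset_config_names and load_dataset are the library functions assumed here.

```python
from typing import Iterable, Optional

# Config names copied from the ValueError in the question (old "common_voice" script).
OLD_COMMON_VOICE_CONFIGS = [
    'ab', 'ar', 'as', 'br', 'ca', 'cnh', 'cs', 'cv', 'cy', 'de', 'dv', 'el',
    'en', 'eo', 'es', 'et', 'eu', 'fa', 'fi', 'fr', 'fy-NL', 'ga-IE', 'hi',
    'hsb', 'hu', 'ia', 'id', 'it', 'ja', 'ka', 'kab', 'ky', 'lg', 'lt', 'lv',
    'mn', 'mt', 'nl', 'or', 'pa-IN', 'pl', 'pt', 'rm-sursilv', 'rm-vallader',
    'ro', 'ru', 'rw', 'sah', 'sl', 'sv-SE', 'ta', 'th', 'tr', 'tt', 'uk',
    'vi', 'vot', 'zh-CN', 'zh-HK', 'zh-TW',
]


def resolve_config(requested: str, available: Iterable[str]) -> Optional[str]:
    """Return the matching config name (case-insensitive), or None if absent."""
    by_lower = {name.lower(): name for name in available}
    return by_lower.get(requested.lower())


def load_split(repo_id: str, language: str, split: str = "train"):
    """Hypothetical helper: look up the config first, then load that split."""
    # Imported lazily so resolve_config above works without `datasets` installed.
    from datasets import get_dataset_config_names, load_dataset

    config = resolve_config(language, get_dataset_config_names(repo_id))
    if config is None:
        raise ValueError(f"{language!r} is not a config of {repo_id!r}")
    return load_dataset(repo_id, config, split=split)


# The old script has no Swahili config, which matches the error above.
print(resolve_config("sw", OLD_COMMON_VOICE_CONFIGS))
```

With this in place, load_split("mozilla-foundation/common_voice_8_0", "sw") would be the equivalent of the call above. Note that this needs network access, and the repository may additionally require accepting its terms on the Hub and passing an access token; that last point is an assumption not covered in this thread.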

1 reply

Tonyloyt Apr 15, 2022
Author

Thank you, it worked.
