Source of language datasets #14

DonaldTsang · 2019-11-20T01:40:49Z

Where is the source text dataset for the n-grams of those 100 languages? I would like to see whether it differs from the UDHR-based data used in wooorm/franc#78, and whether it is more accurate.

Animenosekai · 2020-03-30T09:21:34Z

@DonaldTsang I don't really know, since I'm not the developer, but isn't it in _languageData.js?

_Sent with GitHawk_

Animenosekai · 2020-03-30T09:22:47Z

@DonaldTsang (inside the lib folder)

Animenosekai · 2020-03-30T09:24:19Z

@DonaldTsang It's odd, though: not every language is in there, and the entries that are there aren't written in the actual language (for example, the "fr" entry isn't French text, and I can't tell what it says).

Animenosekai · 2020-03-30T09:25:00Z

@DonaldTsang The developer primarily used Unicode checks to determine the language, though.

DonaldTsang · 2020-03-30T16:22:58Z

@Animenosekai If it only uses Unicode checking, that would be great, since that would be very useful for my goal of making language detection easier (which I hope to reimplement in Python).
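
For reference, a script-only check of the kind discussed here can be sketched in Python. This is a minimal sketch, not the logic in guessLanguage.js: it assumes the script can be read from the first word of each character's Unicode name (via the standard `unicodedata` module), and the `SINGLE_LANGUAGE_SCRIPTS` table and the `script_counts` / `guess_by_script` names are hypothetical, chosen for illustration.

```python
import unicodedata
from collections import Counter
from typing import Optional

# Hypothetical table: scripts that (mostly) map to a single language.
# Latin, Cyrillic, Arabic, etc. are shared by many languages and are
# deliberately left out, so text in those scripts falls through to None.
SINGLE_LANGUAGE_SCRIPTS = {
    "HANGUL": "ko",
    "THAI": "th",
    "GREEK": "el",
    "ARMENIAN": "hy",
    "GEORGIAN": "ka",
}


def script_counts(text: str) -> Counter:
    """Count letters by script, using the first word of each Unicode name."""
    counts: Counter = Counter()
    for ch in text:
        if ch.isalpha():
            # e.g. "LATIN SMALL LETTER A" -> "LATIN", "HANGUL SYLLABLE GA" -> "HANGUL"
            name = unicodedata.name(ch, "")
            if name:
                counts[name.split()[0]] += 1
    return counts


def guess_by_script(text: str) -> Optional[str]:
    """Return a language code if the script alone decides it, else None."""
    counts = script_counts(text)
    if not counts:
        return None
    # Kana only occurs in Japanese, even when Han characters dominate.
    if counts["HIRAGANA"] or counts["KATAKANA"]:
        return "ja"
    script, _ = counts.most_common(1)[0]
    if script == "CJK":
        return "zh"  # assumption: Han text without kana is treated as Chinese
    return SINGLE_LANGUAGE_SCRIPTS.get(script)
```

Latin- or Cyrillic-script text returns `None` here, which is why a script check alone cannot tell, say, French from English; those cases need something like n-gram statistics.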

DonaldTsang · 2020-03-30T16:23:56Z

_languageData.js looks like n-gram data.
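
If the file does hold n-gram profiles, one standard way to use such data is the Cavnar-Trenkle "out-of-place" rank distance, sketched below in Python. This is a minimal sketch under stated assumptions: it uses character trigrams and a 300-entry profile, and I have not checked which n-gram size or scoring _languageData.js is actually built for. The function names are hypothetical.

```python
import re
from collections import Counter
from typing import Dict, List


def trigrams(text: str) -> Counter:
    """Count character trigrams of lowercased, space-padded words."""
    counts: Counter = Counter()
    for word in re.findall(r"\w+", text.lower()):
        padded = f" {word} "
        for i in range(len(padded) - 2):
            counts[padded[i:i + 3]] += 1
    return counts


def profile(text: str, size: int = 300) -> List[str]:
    """Most frequent trigrams in rank order (ties keep first-seen order)."""
    return [gram for gram, _ in trigrams(text).most_common(size)]


def out_of_place(doc: List[str], lang: List[str]) -> int:
    """Sum of rank differences; a trigram missing from lang costs len(lang)."""
    ranks = {gram: i for i, gram in enumerate(lang)}
    penalty = len(lang)
    return sum(
        abs(i - ranks[g]) if g in ranks else penalty for i, g in enumerate(doc)
    )


def guess_by_ngrams(text: str, profiles: Dict[str, List[str]]) -> str:
    """Pick the language whose profile is closest to the text's profile."""
    doc = profile(text)
    return min(profiles, key=lambda code: out_of_place(doc, profiles[code]))


if __name__ == "__main__":
    # Toy one-sentence profiles, far too small for real use.
    profiles = {
        "en": profile("the quick brown fox jumps over the lazy dog and the cat"),
        "fr": profile("le renard brun saute par dessus le chien paresseux et le chat"),
    }
    print(guess_by_ngrams("the dog and the fox", profiles))  # prints "en"
```

Real profiles are built from much larger corpora (the UDHR, in franc's case), so the data source matters as much as the scoring.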

Animenosekai · 2020-03-30T16:30:54Z

I don't think it uses only Unicode checking, but why don't you open guessLanguage.js? It should contain everything you want to know.
