Hyphenation data missing in non-English editions #853

Open
platinorum opened this issue Oct 5, 2024 · 4 comments

Comments

@platinorum

There is no data for hyphenation in the output file.
This is possibly related to an old issue (#159).

Example:
"apple" on wiktionary vs. "apple" on kaikki

@kristian-clausal
Collaborator

It's there. Open raw data and search for "hyphenation": it's present for the noun and the verb.
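A quick way to run this check on a downloaded raw data file, as a minimal sketch. It assumes the file is JSON Lines (one JSON object per line). The helper names `contains_key` and `entries_with_key` are hypothetical, and the sample lines are made up. The search is recursive, so it does not depend on where the key sits inside an entry.

```python
import json
from typing import Any, Iterable, Iterator


def contains_key(obj: Any, key: str) -> bool:
    """Return True if `key` appears as a dict key anywhere inside a nested JSON value."""
    if isinstance(obj, dict):
        return key in obj or any(contains_key(v, key) for v in obj.values())
    if isinstance(obj, list):
        return any(contains_key(v, key) for v in obj)
    return False


def entries_with_key(lines: Iterable[str], key: str) -> Iterator[dict]:
    """Yield each parsed entry (one JSON object per line) that contains `key`."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        entry = json.loads(line)
        if contains_key(entry, key):
            yield entry


# Made-up sample lines for illustration only; real entries have many more
# fields, and the exact shape of the hyphenation value is an assumption.
sample = [
    '{"word": "apple", "pos": "noun", "hyphenation": ["ap", "ple"]}',
    '{"word": "apple", "pos": "verb", "sounds": [{"hyphenation": "ap-ple"}]}',
    '{"word": "pear", "pos": "noun"}',
]
hits = [(e["word"], e["pos"]) for e in entries_with_key(sample, "hyphenation")]
print(hits)
```

For a real file, pass an open file handle instead of `sample`, for example `entries_with_key(open(path, encoding="utf-8"), "hyphenation")`, where `path` is wherever the raw data was saved.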

@platinorum
Author

> It's there. Open raw data and search for "hyphenation": it's present for the noun and the verb.

You are right, it is in the English edition; sorry for not testing properly. I checked the French, German, Spanish, and Polish editions, and it seems to be missing from all of them.

@platinorum changed the title from "Hyphenation data missing again" to "Hyphenation data missing in non-English version" on Oct 10, 2024
@platinorum changed the title from "Hyphenation data missing in non-English version" to "Hyphenation data missing in non-English editions" on Oct 10, 2024
@xxyzz
Collaborator

xxyzz commented Oct 11, 2024

The es edition has a "syllabic" field in its "sounds" lists; the de edition's "Worttrennung" section is currently not extracted; the fr and pl editions don't seem to have this kind of data.

Hyphenation data are added to es and de editions: #863, #864
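
For the es edition, reading that field could look like the sketch below. It assumes only what is described above: an entry with a "sounds" list whose items may carry a "syllabic" key. The helper name `syllabic_values`, the sample entry, and the value format are all hypothetical.

```python
from typing import Any


def syllabic_values(entry: dict[str, Any]) -> list[Any]:
    """Collect the "syllabic" value from each item of the entry's "sounds" list."""
    values = []
    for sound in entry.get("sounds", []):
        # Only dict items can carry the key; anything else is skipped.
        if isinstance(sound, dict) and "syllabic" in sound:
            values.append(sound["syllabic"])
    return values


# Made-up entry for illustration; the real value format is an assumption.
entry = {"word": "manzana", "sounds": [{"ipa": "..."}, {"syllabic": "man-za-na"}]}
print(syllabic_values(entry))
```

An entry without a "sounds" list simply yields an empty list, so the same helper can be run over every entry of an edition to see how much coverage there is.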

@kristian-clausal
Collaborator

Yeah. Because the editions are each so different, data like hyphenation needs to be specially programmed into their respective extractors. In some languages, having separate fields for hyphenation makes no sense because hyphenation follows predictable rules or syllable boundaries. Please keep in mind that English hyphenation data applies only to writing, specifically to how you are supposed to divide words at line boundaries; it's not actual phonetic or 'real' language data.
