Are Wiktionary's appendixes included in wiktextract, or can they be extracted? #91

Open
yolpsoftware opened this issue Nov 9, 2021 · 1 comment
Labels: enhancement (New feature or request)

Comments

@yolpsoftware

I would like to use the data in some of the appendixes, like

https://en.wiktionary.org/wiki/Appendix:Animals
https://en.wiktionary.org/wiki/Appendix:Human_bones

Do I need to take them directly from Wiktionary, or is there a way to get them through wiktextract?

@tatuylonen tatuylonen added the enhancement New feature or request label Nov 18, 2021
@tatuylonen
Owner

The appendixes are not currently extracted. I agree some appendixes contain useful information that would be nice to extract.

It would be very easy to just extract the Appendix namespace into a .tar file, similar to how the Module and Template namespaces are now extracted (available at https://kaikki.org/dictionary/rawdata.html). The appendixes could also be parsed into a more structured form, but there is no consistent format across them. This would require reviewing each appendix and deciding whether to extract it and how. Several different formats would need to be supported, though probably several appendixes would share the same format.
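
For illustration, here is a minimal sketch of the "Appendix namespace into a .tar file" idea in Python. It is not wiktextract code: `iter_pages` and `write_appendix_tar` are hypothetical helper names, and the sketch assumes the input is a standard MediaWiki XML export in which appendix pages can be recognized by an `Appendix:` title prefix. It only stores raw wikitext and does no structured parsing.

```python
import io
import tarfile
import xml.etree.ElementTree as ET


def iter_pages(dump_file):
    """Yield (title, wikitext) for every <page> in a MediaWiki XML export."""
    title, text = None, ""
    for _event, elem in ET.iterparse(dump_file, events=("end",)):
        tag = elem.tag.rsplit("}", 1)[-1]  # drop the XML namespace prefix
        if tag == "title":
            title = elem.text
        elif tag == "text":
            text = elem.text or ""
        elif tag == "page":
            yield title, text
            elem.clear()  # release the finished page; dumps are large


def write_appendix_tar(pages, tar_file, prefix="Appendix:"):
    """Store each page whose title starts with `prefix` as one tar member.

    Hypothetical helper: returns the number of pages written.
    """
    count = 0
    with tarfile.open(fileobj=tar_file, mode="w") as tar:
        for title, text in pages:
            if not title or not title.startswith(prefix):
                continue
            data = text.encode("utf-8")
            # Subpage titles contain "/", which would create directories.
            info = tarfile.TarInfo(name=title.replace("/", "_") + ".txt")
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
            count += 1
    return count
```

Both helpers take open binary file objects, so a call might look like `write_appendix_tar(iter_pages(dump), out)` with `dump` and `out` opened by the caller. Filtering on the title prefix rather than a numeric namespace ID is a deliberate simplification, since it avoids hard-coding an ID that has not been checked here.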
