Are Wiktionary's appendixes included in wiktextract, or can they be extracted? #91

yolpsoftware · 2021-11-09T13:24:40Z

I would like to use the data in some of the appendixes, like

https://en.wiktionary.org/wiki/Appendix:Animals
https://en.wiktionary.org/wiki/Appendix:Human_bones

Do I need to use them directly from Wiktionary, or is there a wiktextract way?

tatuylonen · 2021-11-18T20:31:20Z

The appendixes are not currently extracted. I agree some appendixes contain useful information that would be nice to extract.

It would be very easy to just extract the Appendix namespace into a .tar file, similar to how the Module and Template namespaces are now extracted (available at https://kaikki.org/dictionary/rawdata.html). The appendixes could also be parsed into a more structured form, but there is no consistent format across them. This would require reviewing each appendix and deciding whether to extract it and how. Several different formats would need to be supported, though probably several appendixes would share the same format.
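
As a rough illustration of the "just extract the namespace into a .tar file" idea, here is a minimal sketch that uses only the Python standard library and does not touch wiktextract's own code. It assumes a standard MediaWiki XML export as input and that the Appendix namespace has id 100 on en.wiktionary; the function names (`iter_appendix_pages`, `write_tar`) and the member naming scheme are hypothetical, not anything wiktextract provides.

```python
import io
import tarfile
import xml.etree.ElementTree as ET

# Assumption: the Appendix namespace id on en.wiktionary is 100.
APPENDIX_NS = "100"


def local(tag):
    """Strip the XML namespace prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]


def iter_appendix_pages(xml_stream, ns=APPENDIX_NS):
    """Yield (title, wikitext) for each page in the given namespace."""
    for _event, elem in ET.iterparse(xml_stream, events=("end",)):
        if local(elem.tag) != "page":
            continue
        fields = {local(child.tag): child for child in elem}
        title = fields["title"].text if "title" in fields else None
        page_ns = fields["ns"].text if "ns" in fields else None
        if title and page_ns == ns and "revision" in fields:
            text = ""
            for child in fields["revision"]:
                if local(child.tag) == "text":
                    text = child.text or ""
            yield title, text
        elem.clear()  # drop the page's children; dumps are large


def write_tar(pages, tar_path):
    """Write each page as a .txt member of an uncompressed tar archive."""
    count = 0
    with tarfile.open(tar_path, "w") as tar:
        for title, text in pages:
            data = text.encode("utf-8")
            # Hypothetical naming scheme: page title with "/" replaced.
            info = tarfile.TarInfo(name=title.replace("/", "_") + ".txt")
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
            count += 1
    return count


# Hypothetical usage with a bz2-compressed dump (file names are placeholders):
#   import bz2
#   with bz2.open("dump.xml.bz2", "rb") as stream:
#       write_tar(iter_appendix_pages(stream), "appendix.tar")
```

This only dumps the raw wikitext of each appendix page. Turning a page such as Appendix:Animals into structured data would still need per-format parsing, as described above.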

tatuylonen added the enhancement (New feature or request) label Nov 18, 2021
