This project was started specifically to tag the Smith & Van Dyck Arabic Bible translation with Strong’s numbers. However, the process can be applied to virtually any Bible; it was recently applied to a Zokam (Burmese) Bible.
- The Bible text is provided in a Verse Per Line (VPL) format.
- The text is then normalized and stemmed: diacritics and punctuation marks are removed, and each word is reduced to a basic form (a stem, which is not necessarily the root of the word).
- A publicly available tool, Berkeley Aligner, is used to map the text to Strong’s numbers. The process is not perfect, but it is around 75% accurate (this can vary between languages).
- As a last step, the mapped text is converted to OSIS format, from which a SWORD module can be created. SWORD modules are used by many Bible study tools.
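
The normalization step can be illustrated with a minimal Python sketch. This is not the project's actual code: the function name `normalize` and the exact rules (dropping Unicode combining marks and the Arabic tatweel, replacing punctuation with spaces) are assumptions chosen for illustration. Stemming is language-specific and is left out.

```python
import unicodedata

TATWEEL = "\u0640"  # Arabic elongation character, carries no meaning


def normalize(text: str) -> str:
    """Hypothetical normalizer: strip diacritics and punctuation.

    Note: NFD decomposition also splits letters such as the Arabic
    alef-with-hamza into a bare alef plus a combining mark, so those
    letters end up unified with plain alef. That may or may not be
    what a given language needs.
    """
    kept = []
    for ch in unicodedata.normalize("NFD", text):
        category = unicodedata.category(ch)
        if category.startswith("M") or ch == TATWEEL:
            continue  # combining marks (e.g. Arabic harakat) are dropped
        if category.startswith("P"):
            kept.append(" ")  # punctuation becomes a word separator
        else:
            kept.append(ch)
    # Collapse runs of whitespace left behind by removed punctuation.
    return " ".join("".join(kept).split())


if __name__ == "__main__":
    print(normalize("بِسْمِ اللَّهِ،"))
    print(normalize("Café, déjà vu!"))
```

Replacing punctuation with a space rather than deleting it keeps words apart when the source text has no space after a comma or full stop, which matters because the aligner works on whitespace-separated tokens.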
Refer to From-VPL-Bible-to-tagged-SWORD-module in this repository for details.