yomidevs/kaikki-to-yomitan

Latest commit: d1a53ea · Jun 3, 2024

Converts Wiktionary data from https://kaikki.org/ to Yomitan-compatible dictionaries. Converted dictionaries can be found in the Releases section.

Instructions

(The examples below convert German (de) to English (en).)

Basic Run

  1. Create a .env file based on .env.example.

  2. If your language is not in languages.json, add it.

  3. Run ./auto.sh German English.

  4. The generated dictionaries should appear in data/language/de/en.

Contributing

Instead of a language name, you can also write ? to run for all languages.

  • ./auto.sh ? English will run for any language to English.
  • ./auto.sh German ? will run for German to any language.
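To make the wildcard concrete, here is a minimal sketch of how a ? argument could expand into language pairs. It is not taken from auto.sh: the list_pairs function and the known_languages array are hypothetical, and the three languages are placeholders for whatever languages.json contains. Whether the real script also processes same-language pairs is not stated here, so the sketch simply lists every combination.

```shell
#!/usr/bin/env bash
# Illustrative list only (assumption); the project keeps its languages in languages.json.
known_languages=("German" "English" "French")

# Hypothetical helper: print every source/target pair selected by two
# arguments, where "?" stands for all known languages.
list_pairs() {
  local source_arg="$1" target_arg="$2"
  local sources=("$source_arg") targets=("$target_arg")
  local src tgt
  if [ "$source_arg" = "?" ]; then sources=("${known_languages[@]}"); fi
  if [ "$target_arg" = "?" ]; then targets=("${known_languages[@]}"); fi
  for src in "${sources[@]}"; do
    for tgt in "${targets[@]}"; do
      printf '%s -> %s\n' "$src" "$tgt"
    done
  done
}

# Pairs covered by the equivalent of "./auto.sh German ?" under this sketch.
list_pairs German "?"
```

Quoting the ? (as in "?") keeps the shell from treating it as a filename pattern if a single-character file name happens to exist in the working directory.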

The auto.sh script can also be run with flags:

  • k: keep files (by default, the script deletes the downloaded files after running).
  • d: redownload (by default, the script skips the download if the file already exists).
  • t: force_tidy (run the tidy script again, even if its output already exists; useful when the tidy script has been updated).
  • y: force_ymt (run the yomitan script again, even if its output already exists; useful when the yomitan script has been updated).
  • F: force, equivalent to force_tidy + force_ymt.

Most often, you will want to run ./auto.sh German English kty to recreate the dictionaries, then load them in yomitan and test them.
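The flag string is a run of single letters, so kty means keep + force_tidy + force_ymt. The following is a minimal sketch of how such a string could be decoded. It is an assumption for illustration, not the actual parsing code in auto.sh; the parse_flags function and its output format are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: decode a flag string such as "kty" into named settings.
# The letters mirror the list above; the real auto.sh may parse them differently.
parse_flags() {
  local flags="$1"
  local keep=false redownload=false force_tidy=false force_ymt=false
  local i flag
  # Walk the string one character at a time.
  for ((i = 0; i < ${#flags}; i++)); do
    flag="${flags:i:1}"
    case "$flag" in
      k) keep=true ;;
      d) redownload=true ;;
      t) force_tidy=true ;;
      y) force_ymt=true ;;
      F) force_tidy=true; force_ymt=true ;;  # F = t + y
      *) echo "unknown flag: $flag" >&2; return 1 ;;
    esac
  done
  echo "keep=$keep redownload=$redownload force_tidy=$force_tidy force_ymt=$force_ymt"
}

parse_flags kty
```

Under this sketch, kF and kty decode to the same settings, which matches the description of F as force_tidy + force_ymt.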

After a run, data/language/de/en should contain files listing the tags that were skipped for IPA and for terms. Adding some of those skipped tags to tag_bank_ipa.json or tag_bank_term.json is an easy way to improve the conversion for your language pair.

Tests

Test inputs are in data/test/kaikki. Each line is a line from the corresponding kaikki file (from data/kaikki, after downloading).

To fix something in the conversion of a word, add its line from data/kaikki to the corresponding test file in data/test/kaikki. Then run npm run test-write to add it to the expected test output, and commit the changes (e.g. with the message add baseline test for "word"). From then on, whenever you modify tidy-up or make-yomitan, you can run npm run test-write again to see how your changes affect the output.
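Copying a word's line by hand is awkward because kaikki lines are long JSON objects. The sketch below shows one possible way to do it with grep. The add_test_line function is hypothetical, and the file paths are passed in as arguments because the exact file names depend on your language pair. It assumes each line stores the headword as "word": "..." with a space after the colon; a match may also come from a line where the word only appears as a related term, so check the appended line before committing.

```shell
#!/usr/bin/env bash
# Hypothetical helper: append the first line that mentions a word from a
# kaikki JSON-lines file to a test input file.
add_test_line() {
  local word="$1" kaikki_file="$2" test_file="$3"
  local line
  # -F treats the pattern as a fixed string; -m 1 stops after the first match.
  # Assumption: the headword is serialized as "word": "<word>".
  line="$(grep -F -m 1 "\"word\": \"$word\"" "$kaikki_file")" || {
    echo "no line found for: $word" >&2
    return 1
  }
  printf '%s\n' "$line" >> "$test_file"
}
```

A call would take the form add_test_line Haus followed by the path of the downloaded file in data/kaikki and the path of the matching test file in data/test/kaikki, after which npm run test-write records the new expected output.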

If you are making a change that shouldn't change the output, just run npm run test to check if anything broke.