This repo is comprised of a simple script that generates a character to pinyin and jyutping mapping
in JSON from the Unihan database, as well as a Chrome extension, located under chrome_extension
,
which uses the results of the script to replace or supplement Chinese characters with hanzi. The
Chrome extension may be installed by downloading the hanzi_to_pinyin.crx
file.
The generation script is generate_mappings.py
and runs on Python 3.7 or later. It generates both
pinyin and jyutping output files. The output format is JSON, and the respective outputs are placed
into hanzi_to_pinyin.json
and hanzi_to_jyutping.json
.
Data is taken from the Unihan database. It is available for download at http://www.unicode.org/Public/UCD/latest/ucd/Unihan.zip
Place the Unihan_Readings.txt
from the Unihan database file in the base directory of your local
copy of this repo.
It's as simple as loading your JSON file into memory and doing substitutions for each word.
An example that reads input from stdin and outputs into stdout is located at hanzi_to_pinyin.py
.