CedictXML

Description

CedictXML is a simple Python tool that converts an original CC-CEDICT file into an XML dictionary file in the logical XDXF format, which can be used with dictionary software that supports this format.
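The core of the conversion is parsing CC-CEDICT entry lines, which follow the pattern `Traditional Simplified [pin1 yin1] /definition 1/definition 2/`. The sketch below is a minimal parser for that line format, assuming one entry per line and that definitions never contain a literal `/`. The `Entry` and `parse_line` names are hypothetical and are not taken from cedictxml.py.

```python
import re
from typing import List, NamedTuple, Optional

# One CC-CEDICT entry: Traditional Simplified [pin1 yin1] /def 1/def 2/
CEDICT_LINE = re.compile(r"^(\S+) (\S+) \[([^\]]*)\] /(.*)/\s*$")


class Entry(NamedTuple):
    traditional: str
    simplified: str
    pinyin: str
    definitions: List[str]


def parse_line(line: str) -> Optional[Entry]:
    """Parse one CC-CEDICT line; return None for comments or malformed lines."""
    if line.startswith("#"):
        return None  # header/comment line
    match = CEDICT_LINE.match(line)
    if match is None:
        return None  # does not follow the entry pattern
    trad, simp, pinyin, defs = match.groups()
    # The captured text sits between the first and last slash,
    # so splitting on "/" yields the individual definitions.
    return Entry(trad, simp, pinyin, defs.split("/"))
```

For example, `parse_line("中國 中国 [Zhong1 guo2] /China/")` yields an `Entry` whose `definitions` list holds the single string `"China"`.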

Screenshot

Screenshot of XDXF CC-CEDICT open on GoldenDict 1.5

Dependencies

Usage

Assuming Python 3 and all dependencies are installed, and that cedictxml.py and pinyin.py are in the same folder, run the script in a folder containing a CC-CEDICT file named “cedict_ts.u8” to convert it into an XDXF file with the default filename, saved in that same folder. Optionally, use one or more of the arguments below.
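A minimal sketch of reading the input file, assuming that header lines begin with `#` and that the publishing date (which feeds into the default output filename) appears on a header line of the form `#! date=...`. The `read_cedict` helper and the `DATE_PREFIX` constant are hypothetical and not part of cedictxml.py.

```python
from typing import Iterable, List, Optional, Tuple

# Assumed header format for the publishing date, e.g. "#! date=2024-01-01T00:00:00Z"
DATE_PREFIX = "#! date="


def read_cedict(lines: Iterable[str]) -> Tuple[Optional[str], List[str]]:
    """Split CC-CEDICT text into (publishing date, entry lines)."""
    date = None
    entries = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith(DATE_PREFIX):
            date = line[len(DATE_PREFIX):].strip()
        elif line.startswith("#") or not line.strip():
            continue  # skip other header comments and blank lines
        else:
            entries.append(line)
    return date, entries
```

Because it accepts any iterable of lines, a file opened with `open("cedict_ts.u8", encoding="utf-8")` inside a `with` block can be passed to it directly.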

Arguments

  • -i or --input-file The name (and location, if not in the current folder) of the original CC-CEDICT file to convert. The default is a file named cedict_ts.u8 in the current folder. E.g.:
python cedictxml.py -i NameOfCCedictFile.u8
  • -o or --output-file The name (and location, if not in the current folder) of the resulting XDXF file. By default this is "CC-CEDICT_" followed by the dictionary version (publishing date-converter version), followed by “.xdxf”. E.g.:
python cedictxml.py -o ~/Dictionaries/XDXFFileName.xdxf
  • -d or --download Automatically download the most recent release of CC-CEDICT and convert it into XDXF. Naturally this argument cannot be used with -i. E.g.:
python cedictxml.py -d
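The three arguments can be modelled with Python's standard `argparse` module, using a mutually exclusive group so that `-i` and `-d` cannot be combined. This is an illustrative sketch, not the script's actual code; `CONVERTER_VERSION`, `default_output_name` and `build_parser` are hypothetical names.

```python
import argparse

CONVERTER_VERSION = "0.0"  # hypothetical placeholder, not the real converter version


def default_output_name(publishing_date: str, converter_version: str = CONVERTER_VERSION) -> str:
    # "CC-CEDICT_" + dictionary version (publishing date-converter version) + ".xdxf"
    return f"CC-CEDICT_{publishing_date}-{converter_version}.xdxf"


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Convert CC-CEDICT to XDXF.")
    # -i and -d are alternative input sources, so argparse rejects using both
    source = parser.add_mutually_exclusive_group()
    source.add_argument("-i", "--input-file", default="cedict_ts.u8",
                        help="CC-CEDICT file to convert (default: cedict_ts.u8)")
    source.add_argument("-d", "--download", action="store_true",
                        help="download the latest CC-CEDICT release instead")
    # None means: derive the name later with default_output_name()
    parser.add_argument("-o", "--output-file", default=None,
                        help="XDXF file to write")
    return parser
```

Leaving `--output-file` as `None` by default lets the filename be computed only after the input has been read, since the publishing date comes from the dictionary itself.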

To Do

See TODO

Change Log

See CHANGELOG

Dictionary Files

A recent CC-CEDICT dictionary in XDXF format can be found here.

Limitations

The XDXF format doesn't recognize transcriptions (such as pinyin). The pinyin transcription is displayed on every entry as pronunciation, but entries are not searchable by pinyin. Hopefully this will be implemented in a future version of XDXF.

The XDXF format doesn't recognize different writing systems (such as traditional and simplified Chinese), so both versions are always displayed, with no indication of which is which and no option to display only one. Both versions are searchable, however. Hopefully this will be implemented in a future version of XDXF.
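To illustrate these two limitations, the sketch below builds a single article with `xml.etree.ElementTree`: both scripts become ordinary `<k>` keys with nothing marking which one is traditional, and pinyin can only go into a `<tr>` transcription element rather than a key. The element names and nesting are assumptions based on the XDXF logical format, not output copied from CedictXML, and `make_article` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET
from typing import List


def make_article(traditional: str, simplified: str, pinyin: str,
                 definitions: List[str]) -> ET.Element:
    """Build one XDXF-style <ar> article (element layout assumed)."""
    ar = ET.Element("ar")
    # Both scripts become plain keys; XDXF has no way to label which is which.
    # Identical forms are collapsed here to avoid a duplicate key.
    keys = [traditional] if traditional == simplified else [traditional, simplified]
    for key in keys:
        ET.SubElement(ar, "k").text = key
    # Pinyin is only a displayed transcription, never a search key
    ET.SubElement(ar, "tr").text = pinyin
    for text in definitions:
        definition = ET.SubElement(ar, "def")
        ET.SubElement(definition, "deftext").text = text
    return ar
```

Collapsing identical traditional and simplified forms into one key is a choice made only in this sketch; the actual converter may emit both.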

In the CC-CEDICT format, words within an expression are not delimited (every syllable is separated by a space), so each entry is treated as a single word.

Credits and Licenses

CC-CEDICT is licensed under a Creative Commons Attribution-Share Alike 3.0 License and is maintained by MDBG.

pinyin.py (which I'm including in this repository, as the project doesn't seem active anymore) is licensed under a BSD 3-clause license, as part of the pycedict library. It is NOT in the public domain.

CedictXML is not licensed or copyrighted; it is released into the public domain.