Zpar-ChineseWordSegmentation

This is an example script for Chinese word segmentation using ZPar.

Please follow the steps below:
(1) Download the source code of ZPar to the directory ZPar.
(2) Run the command cmake . in the directory ZPar/CMake to generate the Makefile.
(3) Compile with make segmentor in the directory ZPar/Make.
After compiling, a directory ZPar/dist/segmentor will be created, containing two files: train and segmentor. The file train trains a segmentation model, and the file segmentor segments new text using a trained model.
(4) To train a model, type:
ZPar/dist/segmentor/train <train-file> <model-file> <number of iterations>
For example: ZPar/dist/segmentor/train pku.train model 1
(5) To apply an existing model to segment new text, type:
ZPar/dist/segmentor/segmentor <model> <input-file> <output-file>
For example: ZPar/dist/segmentor/segmentor model pku.test pku.test.output
(6) If a manually specified segmentation of the input file is available in a reference file, you can evaluate the quality of the output by typing:
python ZPar/doc/doc/seg_files/evaluate.py <output-file> <reference-file>
For example: python ZPar/doc/doc/seg_files/evaluate.py pku.test.output pku.test.reference
The script evaluate.py performs automatic evaluation and reports the precision, recall, and f-score.
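
As a rough illustration of what precision, recall, and f-score mean for word segmentation, the sketch below compares word boundaries in an output line against a reference line. It is not the code of evaluate.py. The function names word_spans and score are hypothetical, and the sketch assumes that words in both files are separated by whitespace and that the two files are aligned line by line.

```python
def word_spans(line):
    """Map a whitespace-segmented line to a set of (start, end) character spans.

    Offsets are counted over the text with the separators removed, so two
    segmentations of the same sentence can be compared directly.
    """
    spans, pos = set(), 0
    for word in line.split():
        spans.add((pos, pos + len(word)))
        pos += len(word)
    return spans


def score(output_lines, reference_lines):
    """Return (precision, recall, f_score) over paired lines.

    A word counts as correct only if both of its boundaries match a word
    in the reference (assumption: this is the usual word-level metric).
    """
    correct = n_out = n_ref = 0
    for out, ref in zip(output_lines, reference_lines):
        out_spans, ref_spans = word_spans(out), word_spans(ref)
        correct += len(out_spans & ref_spans)
        n_out += len(out_spans)
        n_ref += len(ref_spans)
    precision = correct / n_out if n_out else 0.0
    recall = correct / n_ref if n_ref else 0.0
    f_score = 2 * precision * recall / (precision + recall) if correct else 0.0
    return precision, recall, f_score
```

Under these assumptions, the lists could be read with open("pku.test.output", encoding="utf-8") and open("pku.test.reference", encoding="utf-8"), then passed to score. The encoding is also an assumption and depends on how the data files were prepared.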

The performance of the system after one training iteration may not be optimal. To choose the model that gives the highest f-score on your development data, run ./test.sh in the directory ZPar/doc/doc/seg_files. The script trains the segmentor for 30 iterations and, after the ith iteration, stores the model file as model.i. You can then compare the f-scores of all 30 iterations and choose model.k, the one with the best f-score, as the final model. The file test.sh contains a variable called segmentor, which you need to set to the relative path of ZPar/dist/segmentor.
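
The selection step described above can be sketched as follows, assuming the f-score of each iteration on the development data has already been collected into a dictionary. The helper names best_iteration and model_name are hypothetical and not part of ZPar or test.sh, and the scores in the usage note are made up for illustration.

```python
def best_iteration(f_scores):
    """Return (k, f) for the iteration with the highest f-score.

    f_scores maps the iteration number i to the f-score of model.i on the
    development data. Ties go to the earliest iteration (a design choice
    of this sketch, not something test.sh is known to do).
    """
    if not f_scores:
        raise ValueError("no f-scores given")
    k = min(f_scores, key=lambda i: (-f_scores[i], i))
    return k, f_scores[k]


def model_name(k, prefix="model"):
    """Build the model file name for iteration k, e.g. model.7."""
    return f"{prefix}.{k}"
```

For example, with the illustrative scores {1: 0.90, 2: 0.93, 3: 0.93}, best_iteration selects iteration 2, and model_name(2) gives model.2 as the file to keep as the final model.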

Files in this repository:
README.md
evaluate.py
pku.dev
pku.dev.reference
pku.test
pku.test.reference
pku.train
test.sh
