Skip to content

xiuheying/Zpar-ChineseWordSegmentation

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

31 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Zpar-ChineseWordSegmentation

This is an example script to implement chinese word segmentation by running the code of ZPar

Please follow the below steps:
(1) Download the Source code of ZPar to the directory ZPar.
(2) Run command cmake . in the directory ZPar/CMake to generate Makefile
(3) Compile using make segmentor in the directory ZPar/Make
       After compiling,a directory ZPar/dist/segmentor will be created,in which there are two files:train and segmentor. The file train is used to train a segmentation model,and the file segmentor is used to segment new texts using a trained segmentation model.
(4) To train a model,type:
       ZPar/dist/segmentor/train <train-file> <model-file> <number of iterations>
       For example,ZPar/dist/segmentor/train pku.train model 1
(5) To apply an existing model to segment new texts,type:
       ZPar/dist/segmentor/segmentor <model> <input-file> <output-file>
       For example,ZPar/dist/segmentor/segmentor model pku.test pku.test.output
(6) Suppose a maunally specified segmentation of the input file has been given in a reference file,you can evaluate the quality of the outputs by typing:
       python ZPar/doc/doc/seg_files/evaluate.py <output-file> <reference-file>
       For example,python ZPar/doc/doc/seg_files/evaluate.py pku.test.output pku.test.reference
       The file evaluate.py performs automatic evaluation .You can find the precision,recall,and f-score here


       The performance of the system after one training iteration may not be optimal. You can choose the model that gives the highest f-score on your development test data by running ./test.sh in the directory ZPar/doc/doc/seg_files,which can automatically train the segmentor for 30 iterations, and after the ith iteration, stores the model file to model.i. You can compare the f-score of all 30 iterations and choose model.k, which gives the best f-score, as the final model.In this file, there is a variable called segmentor. You need to set this variable to the relative directory of ZPar/dist/segmentor.

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published