The implementation of TCM-BERT in our paper:
Liang Yao, Zhe Jin, Chengsheng Mao, Yin Zhang and Yuan Luo. "Traditional Chinese Medicine Clinical Records Classification with BERT and Domain Specific Corpora." Journal of the American Medical Informatics Association (JAMIA). Volume 26, Issue 12, December 2019, Pages 1632–1636, https://doi.org/10.1093/jamia/ocz164
The repository is modified from pytorch-pretrained-BERT and tested on Python 3.5+. Install the dependencies with:

```sh
pip install -r requirements.txt
```
The copyright holder of the dataset is the China Knowledge Centre for Engineering Sciences and Technology (CKCEST). The dataset is for research use only; any commercial use, sale, or other monetization is prohibited.

Training, validation, and test records are in ./TCMdata/train.txt, ./TCMdata/val.txt, and ./TCMdata/test.txt.

Six example external unlabeled clinical records are in ./TCMdata/domain_corpus.txt. Due to CKCEST policy, we cannot provide all 46,205 records, but we do provide our fine-tuned models.
The fine-tuned model from the second step is here. The final fine-tuned text classifier is here.
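For reference, `simple_lm_finetuning.py` in pytorch-pretrained-BERT expects its training corpus to contain one sentence per line, with a blank line separating documents (the blank lines are needed to sample next-sentence pairs). A schematic example with placeholder records, not real data:

```
第一份病历的第一句。
第一份病历的第二句。

第二份病历的第一句。
第二份病历的第二句。
```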
Fine-tune the BERT language model on the unlabeled domain corpus (this produces the second-step model linked above):

```sh
python3 simple_lm_finetuning.py \
  --train_corpus ./TCMdata/domain_corpus.txt \
  --bert_model bert-base-chinese \
  --do_lower_case \
  --output_dir finetuned_lm_domain_corpus/ \
  --do_train
```
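To sanity-check the result, a minimal sketch for loading the fine-tuned language model back in (assuming the output directory contains the saved `pytorch_model.bin` and `bert_config.json`, which is what pytorch-pretrained-BERT's `from_pretrained` looks for in a local directory):

```python
from pytorch_pretrained_bert import BertModel

# Load the domain-fine-tuned weights; this is the model that
# --finetuned_model_dir points run_classifier.py at in the next step.
model = BertModel.from_pretrained("finetuned_lm_domain_corpus/")
model.eval()
print(model.config)
```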
Fine-tune the final text classifier, initializing from the domain-adapted language model:

```sh
python3 run_classifier.py \
  --do_eval \
  --do_predict \
  --data_dir ./TCMdata \
  --bert_model bert-base-chinese \
  --max_seq_length 400 \
  --train_batch_size 32 \
  --learning_rate 2e-5 \
  --num_train_epochs 3.0 \
  --output_dir ./output \
  --gradient_accumulation_steps 16 \
  --task_name demo \
  --do_train \
  --finetuned_model_dir ./finetuned_lm_domain_corpus
```
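Note that, as in pytorch-pretrained-BERT, `run_classifier.py` divides `--train_batch_size` by `--gradient_accumulation_steps`, so the settings above process 2 records per forward pass while keeping an effective batch size of 32. After training, a minimal inference sketch, assuming the pytorch-pretrained-BERT API this repository is built on (the label list is a hypothetical placeholder; use the labels the `demo` task defines in `run_classifier.py`):

```python
import torch
from pytorch_pretrained_bert import BertForSequenceClassification, BertTokenizer

labels = ["label_0", "label_1"]  # hypothetical placeholder for the TCM syndrome labels
tokenizer = BertTokenizer.from_pretrained("bert-base-chinese", do_lower_case=True)
# ./output must contain the saved pytorch_model.bin and bert_config.json.
model = BertForSequenceClassification.from_pretrained("./output", num_labels=len(labels))
model.eval()

record = "..."  # a clinical record (Chinese text)
# [CLS] + at most 398 record tokens + [SEP] stays within --max_seq_length 400.
tokens = ["[CLS]"] + tokenizer.tokenize(record)[:398] + ["[SEP]"]
input_ids = torch.tensor([tokenizer.convert_tokens_to_ids(tokens)])
with torch.no_grad():
    logits = model(input_ids)  # without labels, the model returns logits
print(labels[logits.argmax(dim=-1).item()])
```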
To reproduce the TCM-BERT results with the provided models:

- Download the fine-tuned language model.
- Uncompress the zip file in the current folder.
- Run the final text classifier fine-tuning.
- The results of BERT (without domain-corpus fine-tuning) can be reproduced by running the final text classifier fine-tuning without `--finetuned_model_dir ./finetuned_lm_domain_corpus`, as shown below.
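Concretely, the BERT baseline run is the same classifier command with the final flag dropped:

```sh
python3 run_classifier.py \
  --do_eval \
  --do_predict \
  --data_dir ./TCMdata \
  --bert_model bert-base-chinese \
  --max_seq_length 400 \
  --train_batch_size 32 \
  --learning_rate 2e-5 \
  --num_train_epochs 3.0 \
  --output_dir ./output \
  --gradient_accumulation_steps 16 \
  --task_name demo \
  --do_train
```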