An unofficial PyTorch implementation of Soft-Masked BERT, built on the Hugging Face transformers library. Issues and PRs are welcome.
Paper: Spelling Error Correction with Soft-Masked BERT (Zhang et al., ACL 2020)
Training data: Baidu Cloud (password: l545)
Pretrained BERT model: Chinese-BERT-wwm
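
The pretrained weights can be loaded through transformers; a minimal sketch, assuming the `hfl/chinese-bert-wwm` checkpoint on the Hugging Face hub (substitute a local path if you download the weights yourself):

```python
from transformers import BertModel, BertTokenizer

# 'hfl/chinese-bert-wwm' is the hub id for Chinese-BERT-wwm (an assumption;
# the repo may expect a locally downloaded copy instead).
tokenizer = BertTokenizer.from_pretrained('hfl/chinese-bert-wwm')
bert = BertModel.from_pretrained('hfl/chinese-bert-wwm')
```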
You can add custom data following this format:
{'text': '但是我不能去参加,因为我有一点事情阿!', 'mistakes': [{'wrong': '阿', 'correct': '啊', 'loc': '18'}]}
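
For illustration, here is a minimal sketch of turning one such record into a corrected sentence plus per-character 0/1 detection labels. The helper below is hypothetical, not the repo's actual preprocessing; it assumes `loc` is a 1-based character index, which matches the example above (character 18 of `text` is '阿'):

```python
# A record in the format shown above.
sample = {
    'text': '但是我不能去参加,因为我有一点事情阿!',
    'mistakes': [{'wrong': '阿', 'correct': '啊', 'loc': '18'}],
}

def build_targets(record):
    """Return the corrected sentence and a 0/1 error label per character.

    Assumes 'loc' is a 1-based index into 'text' (hypothetical helper,
    adjust if the repo counts differently).
    """
    chars = list(record['text'])
    labels = [0] * len(chars)
    for m in record['mistakes']:
        idx = int(m['loc']) - 1           # assumed 1-based
        assert chars[idx] == m['wrong'], 'loc does not point at wrong char'
        chars[idx] = m['correct']
        labels[idx] = 1
    return ''.join(chars), labels

corrected, labels = build_targets(sample)
print(corrected)  # 但是我不能去参加,因为我有一点事情啊!
print(labels)     # 1 at the corrected position, 0 elsewhere
```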
You can train the model or run inference with train.py:
python train.py
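
For reference, below is a minimal sketch of the Soft-Masked BERT architecture from the paper, written with transformers. It is a simplified illustration, not the repo's train.py: it assumes the `hfl/chinese-bert-wwm` checkpoint and soft-masks only the word-piece embeddings, letting BERT add position/segment embeddings itself.

```python
import torch
import torch.nn as nn
from transformers import BertModel, BertTokenizer

class SoftMaskedBert(nn.Module):
    """Sketch of Soft-Masked BERT: a BiGRU detection network predicts a
    per-token error probability p_i, each input embedding is blended with
    the [MASK] embedding as e'_i = p_i * e_mask + (1 - p_i) * e_i, and a
    BERT correction network predicts the right character at each position."""

    def __init__(self, bert_name='hfl/chinese-bert-wwm'):
        super().__init__()
        self.bert = BertModel.from_pretrained(bert_name)
        hidden = self.bert.config.hidden_size
        self.detector = nn.GRU(hidden, hidden // 2,
                               batch_first=True, bidirectional=True)
        self.detect_head = nn.Linear(hidden, 1)
        self.correct_head = nn.Linear(hidden, self.bert.config.vocab_size)

    def forward(self, input_ids, attention_mask, mask_token_id):
        # Word-piece embeddings only; BERT adds position/segment
        # embeddings itself when given inputs_embeds.
        embeds = self.bert.embeddings.word_embeddings(input_ids)

        # Detection network: error probability p_i for every token.
        rnn_out, _ = self.detector(embeds)
        p = torch.sigmoid(self.detect_head(rnn_out))  # (B, T, 1)

        # Soft masking: blend each embedding with the [MASK] embedding.
        mask_embed = self.bert.embeddings.word_embeddings(
            torch.tensor([mask_token_id], device=input_ids.device))
        soft_embeds = p * mask_embed + (1 - p) * embeds

        # Correction network, with the paper's residual connection from
        # the input embeddings to the final hidden states.
        out = self.bert(inputs_embeds=soft_embeds,
                        attention_mask=attention_mask)
        logits = self.correct_head(out.last_hidden_state + embeds)
        return p.squeeze(-1), logits

tokenizer = BertTokenizer.from_pretrained('hfl/chinese-bert-wwm')
model = SoftMaskedBert()
enc = tokenizer('但是我不能去参加,因为我有一点事情阿!', return_tensors='pt')
p, logits = model(enc['input_ids'], enc['attention_mask'],
                  tokenizer.mask_token_id)
# The heads are randomly initialized here; train before expecting
# sensible corrections.
print(tokenizer.convert_ids_to_tokens(logits.argmax(-1)[0].tolist()))
```

During training, the paper optimizes a weighted sum of the detection loss (binary cross-entropy over p) and the correction loss (cross-entropy over the vocabulary).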