Code and data for the TASLP paper "RODA: Reverse Operation based Data Augmentation for Solving Math Word Problems"


RODA: Reverse Operation based Data Augmentation for Solving Math Word Problems

Liu Q., Guan W., Li S., Cheng F., Kawahara D. and Kurohashi S.

This paper has been accepted for publication in IEEE/ACM Transactions on Audio, Speech, and Language Processing (TASLP).

We propose a novel data augmentation method that reverses the mathematical logic of math word problems to produce new, high-quality problems and to introduce new knowledge points that benefit the learning of mathematical reasoning. We apply the augmented data to two state-of-the-art math word problem solvers and compare the results against a strong data augmentation baseline.
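As a rough intuition for what "reversing" a problem can mean, the toy sketch below takes a one-step problem `left op right = result`, treats `left` as the new unknown, and recovers it by applying the inverse operation to the known result. This is only an illustration of the general idea, not the augmentation code in this repo; the names `OPS`, `INVERSE_FOR_LEFT`, and `reverse_left` are hypothetical.

```python
import operator

# Toy illustration only (not the repo's augmentation code).
OPS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
}

# For `left op right = result`, the operation that recovers `left`
# from `result` and `right`.
INVERSE_FOR_LEFT = {"+": "-", "-": "+", "*": "/", "/": "*"}


def reverse_left(op, right, result):
    """Given `left op right = result`, return `left`."""
    return OPS[INVERSE_FOR_LEFT[op]](result, right)


# Original problem: x = 3 * 4.
original_answer = OPS["*"](3.0, 4.0)

# Reversed problem: the old answer is now known, and the 3 is unknown.
recovered = reverse_left("*", 4.0, original_answer)
print(original_answer, recovered)
```

In the reversed problem the original answer becomes a given quantity and one of the original quantities becomes the question, so the solver sees a different equation over the same scenario.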

Data

All data used in this paper can be found in the /data folder. Augment.json holds all questions augmented from Math23K and is used for the 5-fold cross-validation evaluation. PreprocessedQuestion_enumeratefiltered(split)2.json holds the augmented data for the standard split of Math23K. checkmerge.json holds both the original and the augmented data for the training set.

Each entry is stored in the following format:

{'id': '1', 
'origin_id': '946', 
'target_template': ['x', '=', 'temp_c', '*', '(', 'temp_b', '-', 'temp_a', ')', '/', 'temp_a'], 
'target_norm_post_template': ['x', '=', 'temp_c', 'temp_b', 'temp_a', '-', '*', 'temp_a', '/'], 
'num_list': [1.5, 4.0, 12.0], 
'text': '甲数 除以 乙数 的 商是 temp_a , 则 甲数 是 乙 的 temp_b 倍 , 原来 甲数 temp_c , 如果 甲数 增加 =?', 
'answer': 20.0}
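The fields of an entry can be checked against each other. The sketch below evaluates `target_norm_post_template` (a postfix expression) with the values in `num_list` and compares the result with `answer`. It assumes that `temp_a`, `temp_b`, `temp_c`, ... map to `num_list[0]`, `num_list[1]`, `num_list[2]`, ... in order, and that any other token is a numeric literal; both are assumptions that happen to hold for the sample entry above, not documented behavior. The helper names `bind_numbers` and `eval_postfix` are hypothetical.

```python
import operator

OPS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
}


def bind_numbers(num_list):
    # Assumption: temp_a -> num_list[0], temp_b -> num_list[1], and so on.
    return {f"temp_{chr(ord('a') + i)}": v for i, v in enumerate(num_list)}


def eval_postfix(tokens, num_list):
    """Evaluate a postfix token list using a stack."""
    values = bind_numbers(num_list)
    stack = []
    for tok in tokens:
        if tok in OPS:
            right = stack.pop()
            left = stack.pop()
            stack.append(OPS[tok](left, right))
        elif tok in values:
            stack.append(values[tok])
        else:
            # Assumption: any remaining token is a numeric constant.
            stack.append(float(tok))
    if len(stack) != 1:
        raise ValueError("malformed postfix expression")
    return stack[0]


entry = {
    "target_norm_post_template": ["x", "=", "temp_c", "temp_b", "temp_a",
                                  "-", "*", "temp_a", "/"],
    "num_list": [1.5, 4.0, 12.0],
    "answer": 20.0,
}

# Drop the leading 'x', '=' before evaluating the right-hand side.
expr = entry["target_norm_post_template"][2:]
result = eval_postfix(expr, entry["num_list"])
print(result, abs(result - entry["answer"]) < 1e-6)
```

For the sample entry this computes 12.0 * (4.0 - 1.5) / 1.5, which matches the stored answer of 20.0.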

Code

Data Augmentation

The augmented data can be generated with the following command:

python preprocessEnumerate.py --data_path='./data/train23k_processed.json' --out_path='./data/PreprocessedQuestion_enumeratefilteredtrain2.json'

The original data used for augmentation can be found at https://github.com/SumbeeLei/Math_EN

Reproduction

To reproduce the 5-fold cross-validation results, run:

python run_check_merge.py

To reproduce the results on the standard train/test split, run:

python run_check_merge.py --test_dir='data/Math_23K_test.json'

The original data used for these experiments can be found at https://github.com/ShichaoSun/math_seq2tree

Citation

If you find this repo useful, please cite the following paper:

@misc{liu2020reverse,
      title={Reverse Operation based Data Augmentation for Solving Math Word Problems}, 
      author={Qianying Liu and Wenyu Guan and Sujian Li and Fei Cheng and Daisuke Kawahara and Sadao Kurohashi},
      year={2020},
      eprint={2010.01556},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
