- If you use a large batch size (e.g. `batch_size > 100`), you'd better set `avg_batch_loss=True` to get a stable training process. For a small batch size, `avg_batch_loss=False` will converge faster and sometimes gives better performance (e.g. on CoNLL 2003 NER). See the config sketch after this list.
- You can get better performance on the CoNLL 2003 English dataset if you use 100-d pretrained word vectors instead of 50-d pretrained word vectors.
- If you want to write a script to tune hyperparameters, you can use `main_parse.py` to set hyperparameters via command line arguments (a minimal sweep script is sketched after this list).
- Model performance is sensitive to `lr`, which needs to be carefully tuned for different structures:
  - Word level LSTM models (e.g. char LSTM + word LSTM + CRF) prefer an `lr` around 0.015.
  - Word level CNN models (e.g. char LSTM + word CNN + CRF) prefer an `lr` around 0.005, with more training iterations.
- You can refer to the COLING paper "Design Challenges and Misconceptions in Neural Sequence Labeling" for more hyperparameter settings.
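
To make the batch-size and `lr` hints above concrete, here is a minimal sketch of the relevant lines of a key=value training config. Only `batch_size`, `avg_batch_loss`, and `lr` are taken from the hints above; the values shown are just one reasonable combination for a small-batch word LSTM model, not recommended defaults.

```
batch_size=10
avg_batch_loss=False
lr=0.015
```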
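
If you do script the tuning, a small driver like the one below can sweep `lr` values through `main_parse.py`. This is only a sketch: the flag names (`--train`, `--dev`, `--test`, `--lr`) and the data paths are assumptions for illustration, so check `python main_parse.py --help` for the arguments your copy actually accepts.

```python
import subprocess

# Hypothetical lr sweep: ~0.015 tends to suit word LSTM models,
# ~0.005 (with more iterations) word CNN models.
for lr in [0.005, 0.010, 0.015]:
    cmd = [
        "python", "main_parse.py",
        "--train", "data/train.bmes",  # hypothetical paths
        "--dev", "data/dev.bmes",
        "--test", "data/test.bmes",
        "--lr", str(lr),
    ]
    print("Running:", " ".join(cmd))
    subprocess.run(cmd, check=True)
```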