- If you use a large batch size (e.g. `batch_size > 100`), you'd better set `avg_batch_loss=True` to get a stable training process. For a small batch size, `avg_batch_loss=False` will converge faster and sometimes gives better performance (e.g. on CoNLL 2003 NER). See the config sketch after this list.
- You can get better performance on the CoNLL 2003 English dataset if you use 100-d pretrained word vectors instead of 50-d pretrained word vectors.
- If you want to write a script to tune hyperparameters, you can use `main_parse.py` to set hyperparameters via command line arguments (a minimal sweep script is sketched after this list).
- Model performance is sensitive to `lr`, which needs to be carefully tuned for different structures:
  - Word level LSTM models (e.g. char LSTM + word LSTM + CRF) prefer an `lr` around 0.015.
  - Word level CNN models (e.g. char LSTM + word CNN + CRF) prefer an `lr` around 0.005, with more training iterations.
- You can refer to the COLING paper "Design Challenges and Misconceptions in Neural Sequence Labeling" for more hyperparameter settings.
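
To make the batch-size and `lr` hints above concrete, here is a minimal sketch of the relevant lines of a key=value training config. Only `batch_size`, `avg_batch_loss`, and `lr` are taken from the hints above; the values shown are just one reasonable combination for a small-batch word LSTM model, not recommended defaults.

```
batch_size=10
avg_batch_loss=False
lr=0.015
```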
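
If you do script the tuning, a small driver like the one below can sweep `lr` values through `main_parse.py`. This is only a sketch: the flag names (`--train`, `--dev`, `--test`, `--lr`) and the data paths are assumptions for illustration, so check `python main_parse.py --help` for the arguments your copy actually accepts.

```python
import subprocess

# Hypothetical lr sweep: ~0.015 tends to suit word LSTM models,
# ~0.005 (with more iterations) word CNN models.
for lr in [0.005, 0.010, 0.015]:
    cmd = [
        "python", "main_parse.py",
        "--train", "data/train.bmes",  # hypothetical paths
        "--dev", "data/dev.bmes",
        "--test", "data/test.bmes",
        "--lr", str(lr),
    ]
    print("Running:", " ".join(cmd))
    subprocess.run(cmd, check=True)
```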