Paper, Tags: #nlp, #architectures
Our work is the first to apply a BI-LSTM-CRF model to NLP benchmark sequence tagging datasets. The model can use both past and future input features, and it can also use sentence-level tag information thanks to a CRF layer. It is robust and depends less on word embeddings than previous approaches.
An RNN maintains a memory based on history information. LSTM networks are the same as RNNs, except that the hidden layer updates are replaced by purpose-built memory cells. They're better at finding and exploiting long-range dependencies in the data.
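A minimal sketch (assuming PyTorch) of one LSTM step, showing how the plain RNN hidden update is replaced by gated memory-cell operations; the weight and gate names are illustrative, not from the paper:

```python
import torch

def lstm_step(x_t, h_prev, c_prev, W_x, W_h, b):
    # One linear map produces all four gate pre-activations at once.
    gates = x_t @ W_x + h_prev @ W_h + b
    i, f, o, g = gates.chunk(4, dim=-1)
    i, f, o = torch.sigmoid(i), torch.sigmoid(f), torch.sigmoid(o)
    g = torch.tanh(g)
    c_t = f * c_prev + i * g       # memory cell carries long-range information forward
    h_t = o * torch.tanh(c_t)      # hidden state is a gated view of the cell
    return h_t, c_t
```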
BI-LSTM networks are used to have access to both past and future input features. They are trained using backpropagation through time.
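A minimal sketch, assuming PyTorch, of a bidirectional LSTM encoder: each position sees both past (forward direction) and future (backward direction) input features, and backpropagation through time is handled by autograd when `.backward()` is called; the sizes are made up for illustration:

```python
import torch
import torch.nn as nn

embed_dim, hidden_dim, num_tags = 100, 128, 9   # illustrative sizes, not from the paper
bilstm = nn.LSTM(embed_dim, hidden_dim, bidirectional=True, batch_first=True)
proj = nn.Linear(2 * hidden_dim, num_tags)      # forward + backward states are concatenated

x = torch.randn(1, 6, embed_dim)                # dummy 6-token sentence of word embeddings
h, _ = bilstm(x)                                # h: (1, 6, 2 * hidden_dim)
emissions = proj(h)                             # per-position tag scores, (1, 6, num_tags)
```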
CRF networks focus on the sentence level instead of individual positions, and their inputs and outputs are directly connected (as opposed to LSTM networks, where memory cells sit between them).
The combined network efficiently uses past input features via an LSTM layer and sentence-level tag information via a CRF layer. The CRF layer has a state transition matrix (position-independent) as parameters, so past and future tags can be used to predict the current tag, much like past and future input features are used via a BI-LSTM network.
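A minimal sketch of the sentence-level score such a CRF layer assigns to a tag sequence: per-position emission scores from the network plus a position-independent transition matrix A, where A[i, j] scores moving from tag i to tag j; shapes and names here are assumptions for illustration:

```python
import torch

def sentence_score(emissions, tags, transitions):
    # emissions: (seq_len, num_tags), tags: (seq_len,), transitions: (num_tags, num_tags)
    score = emissions[0, tags[0]]
    for t in range(1, tags.size(0)):
        score = score + transitions[tags[t - 1], tags[t]] + emissions[t, tags[t]]
    return score
```

Training would maximize this score for the gold tag sequence against the log-sum-exp over all possible tag sequences (computed with the forward algorithm), and decoding would pick the best-scoring sequence with Viterbi over the same emission and transition scores.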
The BI-LSTM-CRF model can use the future input features as well (via the bidirectional LSTM), which boosts tagging accuracy.