# Bidirectional LSTM-CRF Models for Sequence Tagging, Huang et al., 2015

[Paper](https://arxiv.org/abs/1508.01991), Tags: #nlp, #architectures

Our work is the first to apply a BI-LSTM-CRF model to NLP benchmark sequence tagging datasets. The model can use both past and future input features, and it can also use sentence-level tag information thanks to a CRF layer. It is robust and has less dependence on word embeddings than previous approaches.

## Models

### LSTM networks

An RNN maintains a memory based on historical information. LSTM networks are the same as RNNs, except that the hidden layer updates are replaced by purpose-built memory cells. This makes them better at finding and exploiting long-range dependencies in the data.
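For reference, one standard formulation of the memory-cell updates (a Graves-style LSTM with peephole connections; the notation below is generic rather than quoted from the paper):

$$
\begin{aligned}
i_t &= \sigma(W_{xi} x_t + W_{hi} h_{t-1} + W_{ci} c_{t-1} + b_i) \\
f_t &= \sigma(W_{xf} x_t + W_{hf} h_{t-1} + W_{cf} c_{t-1} + b_f) \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tanh(W_{xc} x_t + W_{hc} h_{t-1} + b_c) \\
o_t &= \sigma(W_{xo} x_t + W_{ho} h_{t-1} + W_{co} c_{t-1} + b_o) \\
h_t &= o_t \odot \tanh(c_t)
\end{aligned}
$$

Here $i$, $f$, $o$, and $c$ are the input gate, forget gate, output gate, and cell activation, and $h_t$ is the hidden state emitted at each step.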

### BI-LSTM networks

BI-LSTM networks give each position access to both past and future input features. We train BI-LSTM networks using backpropagation through time (BPTT).
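A minimal sketch of this in PyTorch (the paper prescribes no framework; all dimensions and names here are illustrative):

```python
import torch
import torch.nn as nn

emb_dim, hidden_dim, num_tags = 100, 128, 9  # hypothetical sizes

# bidirectional=True runs one LSTM left-to-right and another right-to-left,
# concatenating their hidden states so every position sees past and future.
bilstm = nn.LSTM(emb_dim, hidden_dim, bidirectional=True, batch_first=True)

x = torch.randn(1, 12, emb_dim)   # one sentence of 12 word embeddings
h, _ = bilstm(x)                  # h: (1, 12, 2 * hidden_dim)
tag_scores = nn.Linear(2 * hidden_dim, num_tags)(h)  # per-position scores
```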

### CRF networks

They focus on the sentence level instead of individual positions, and their inputs and outputs are directly connected.
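A minimal sketch of that sentence-level view (hypothetical helper, not from the paper): the score of a whole tag sequence is the sum of per-position emission scores and tag-to-tag transition scores, so each tag is judged in the context of its neighbors:

```python
import torch

def sequence_score(emissions, tags, transitions):
    # emissions: (seq_len, num_tags) per-position tag scores
    # tags: list of tag indices, one per position
    # transitions: (num_tags, num_tags), transitions[i, j] = score of i -> j
    score = emissions[0, tags[0]]
    for t in range(1, len(tags)):
        score = score + transitions[tags[t - 1], tags[t]] + emissions[t, tags[t]]
    return score
```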

### LSTM-CRF networks

The network efficiently uses past input features via an LSTM layer and sentence-level tag information via a CRF layer. The CRF layer has a state transition matrix (position-independent) as parameters, so past and future tags can be used to predict the current tag, similar to how a BI-LSTM network uses past and future input features.
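At prediction time, this transition matrix lets us decode the best tag sequence jointly rather than picking each tag independently. A minimal Viterbi-decoding sketch under the same assumed shapes (not the paper's implementation):

```python
import torch

def viterbi_decode(emissions, transitions):
    # emissions: (seq_len, num_tags); transitions[i, j] = score of tag i -> j
    seq_len, num_tags = emissions.shape
    score = emissions[0]          # best score of any path ending in each tag
    backpointers = []
    for t in range(1, seq_len):
        # total[i, j] = score[i] + transition i -> j + emission of tag j at t
        total = score.unsqueeze(1) + transitions + emissions[t].unsqueeze(0)
        score, best_prev = total.max(dim=0)
        backpointers.append(best_prev)
    best_tag = int(score.argmax())        # best final tag, then walk back
    path = [best_tag]
    for bp in reversed(backpointers):
        best_tag = int(bp[best_tag])
        path.append(best_tag)
    return list(reversed(path))
```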

### BI-LSTM-CRF networks

In addition to everything above, this model can use future input features as well, boosting tagging accuracy.
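Putting the pieces together, a minimal sketch of the whole architecture (hypothetical names, PyTorch assumed): embeddings feed a BI-LSTM, a linear layer turns its outputs into per-position emission scores, and a learned transition matrix supplies the CRF's sentence-level component:

```python
import torch
import torch.nn as nn

class BiLSTMCRF(nn.Module):
    # Sketch only: embeddings -> BI-LSTM -> emission scores, plus a
    # position-independent tag-transition matrix for the CRF layer.
    def __init__(self, vocab_size, num_tags, emb_dim=100, hidden_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.bilstm = nn.LSTM(emb_dim, hidden_dim,
                              bidirectional=True, batch_first=True)
        self.emit = nn.Linear(2 * hidden_dim, num_tags)
        self.transitions = nn.Parameter(torch.zeros(num_tags, num_tags))

    def forward(self, token_ids):
        # token_ids: (batch, seq_len) -> emissions: (batch, seq_len, num_tags)
        h, _ = self.bilstm(self.embed(token_ids))
        return self.emit(h)   # feed into CRF scoring / Viterbi decoding
```

Training would maximize the CRF log-likelihood of the gold tag sequence (its sequence score minus the log-sum-exp over all sequences); decoding uses the Viterbi sketch above.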