source: NLP || Christopher Manning || Stanford, L43 ~ L46
Information Extraction (IE):
- Find and understand limited relevant parts of texts.
- Gather information from many pieces of text.
- Produce a structured representation of the relevant information.
The goals of IE:
- Organize information so that it is useful to people.
- Put information in a semantically precise form that allows further inferences to be made by computer algorithms.
Apple Mail knows there is a "date" in a message, so it recommends creating a new calendar event.
It's easy: just use regular expressions and name lists.
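A minimal sketch of what "regular expressions and name lists" can look like; the date pattern and the name list below are made up for illustration:

```python
import re

# Toy date pattern: matches e.g. "10/03/2024" or "Oct 3" (illustrative only).
DATE_PATTERN = re.compile(
    r"\b\d{1,2}/\d{1,2}/\d{2,4}\b"
    r"|\b(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)[a-z]* \d{1,2}\b"
)
KNOWN_PEOPLE = {"Christopher Manning"}  # a fixed name list

text = "Meet Christopher Manning on Oct 3 or 10/03/2024."
print(DATE_PATTERN.findall(text))              # ['Oct 3', '10/03/2024']
print([p for p in KNOWN_PEOPLE if p in text])  # ['Christopher Manning']
```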
Google knows your query is a location when you search for it.
NER: find the entities and classify what they are (person / date / location / organization)!
If you have a good NER system, you'll do a good job in question answering (questions are always asking who did what, where, and when).
ORG: organization, PER: person, O: outside any entity.
(In the example figure, the left column is the token (word) and the right column is the NER label (entity type).)
You may think NER is a token (word) classification task, but we are interested in entities, so the standard evaluation is per entity, not per token.
We cannot directly use per-token classification metrics for the NER task; they lead to weird situations.
So our y labels should contain the entity boundaries.
Ground truth: ORG over tokens 1-4 ("First Bank of Chicago"); prediction: ORG over tokens 2-4 ("Bank of Chicago").
Because "First" is not included in the prediction, the gold entity counts as a false negative, and because "Bank of Chicago" is not an exact match, the predicted entity counts as a false positive!
So under this strict scoring, selecting nothing would have been better.
Otherwise you need to pick other metrics that give partial credit.
So in common NER tasks we use the entity-level F1 score to measure performance (a partial-credit scorer like the MUC scorer can be complex and not straightforward).
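A minimal sketch of strict entity-level F1, representing each entity as a (type, start, end) tuple so that only exact boundary-and-type matches count; the "First Bank of Chicago" example above becomes both a false negative and a false positive:

```python
# Each entity is (type, start_token, end_token); only exact matches count.
def entity_f1(gold, pred):
    gold, pred = set(gold), set(pred)
    tp = len(gold & pred)                     # exact matches only
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Gold "First Bank of Chicago" (tokens 1-4) vs predicted "Bank of Chicago" (2-4):
# the gold span is missed (false negative) AND the prediction is wrong (false positive).
print(entity_f1([("ORG", 1, 4)], [("ORG", 2, 4)]))  # 0.0
# Predicting nothing scores no worse here, and avoids the extra false positive.
print(entity_f1([("ORG", 1, 4)], []))               # 0.0
```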
If your entities are too hard to capture with rules (unlike easy things such as dates, times, or fixed name lists), then you can take an ML approach.
Labelling
There is a problem with the IO encoding labelling method.
Sue is one person's name, and Mengqiu Huang is another person's name.
But this labelling tells the machine there is a single person called Sue Mengqiu Huang.
So we have another labelling method.
B: beginning, I: inside, O: outside.
Labelling this way separates adjacent entities of the same class, but it comes at a bit of a cost.
If we have C entity classes, we'll have C + 1 labels with IO encoding but 2C + 1 labels with IOB encoding.
In this case, with IO encoding prediction runs faster, but it is less accurate.
This course will use IO encoding: different entities of the same class being adjacent is very rare, and IO encoding runs faster.
An interesting thing: in practice, even if we use IOB encoding, the system still gets this wrong, because adjacency is too rare for the model to learn. So you can find IOB-encoded systems that still output B-PER, I-PER, I-PER for Sue Mengqiu Huang (even though it's two people).
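A small sketch of the two encodings on this example (the surrounding tokens are made up):

```python
# IO vs IOB tags for the same token sequence.
tokens   = ["Sue",   "Mengqiu", "Huang", "flew", "to", "Boston"]
io_tags  = ["PER",   "PER",     "PER",   "O",    "O",  "LOC"]    # the two people merge into one
iob_tags = ["B-PER", "B-PER",   "I-PER", "O",    "O",  "B-LOC"]  # the second B-PER starts a new person
```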
POS tagging (part-of-speech tagging)
Word shape is a kind of regular-expression-like abstraction of a token; it's a powerful feature.
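A minimal sketch of one common word-shape variant: map uppercase letters to X, lowercase to x, digits to d, and collapse repeated runs (the collapsing rule differs across systems):

```python
import re

def word_shape(token: str) -> str:
    shape = re.sub(r"[A-Z]", "X", token)    # uppercase -> X
    shape = re.sub(r"[a-z]", "x", shape)    # lowercase -> x
    shape = re.sub(r"[0-9]", "d", shape)    # digits    -> d
    return re.sub(r"(.)\1+", r"\1", shape)  # collapse repeated runs

print(word_shape("Varicella-zoster"))  # Xx-x
print(word_shape("CPA1"))              # Xd
print(word_shape("mRNA"))              # xX
```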
Maximum Entropy Markov Models (MEMMs), also called Conditional Markov Models.
Word segmentation -> splitting running text into words (needed for languages like Chinese that have no spaces).
Text segmentation -> which part is a question and which part is an answer? (Q and A as labels)
DT: determiner; NNP: proper noun, singular; VBD: verb, past tense.
A larger space of sequences is usually explored via search.
Inference in sequence models:
Make a decision at each point based on conditional evidence (from the observations and the previous decisions).
But we can't condition on all of the sequence data (it would be too big), so we use local (smaller) context to make each decision.
In the figure above, we have a greedy-search inference model (always pick the best decision based on a small window of features and previous decisions).
The greedy search approach works well in practice, but it does not make optimal decisions.
Sometimes a decision depends on a previous word whose signal was weak at the previous decision point, and greedy search can never go back to fix it.
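A minimal sketch of greedy left-to-right decoding, assuming a hypothetical local scorer score(prev_label, obs) that returns a dict of log scores per label:

```python
def greedy_decode(score, observations):
    # score(prev_label, obs) -> {label: log score} is a hypothetical local scorer.
    prev, path = "<s>", []
    for obs in observations:
        scores = score(prev, obs)
        prev = max(scores, key=scores.get)  # commit to the locally best label
        path.append(prev)                   # ...and never revisit it
    return path
```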
Then we have another search method: beam search.
Instead of keeping only the top-1 most likely hypothesis at each position, we keep the top k most likely hypotheses.
In practice, k = 3-5 helps a lot (but not in every case).
Beam search is still not globally optimal, but it is a useful approximate search method.
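A minimal sketch of beam search with the same hypothetical scorer interface as the greedy sketch above (k = 1 reduces to greedy decoding):

```python
def beam_search(score, observations, k=3):
    # Each beam entry is (partial label sequence, cumulative log score).
    beam = [([], 0.0)]
    for obs in observations:
        candidates = []
        for seq, total in beam:
            prev = seq[-1] if seq else "<s>"
            for label, s in score(prev, obs).items():
                candidates.append((seq + [label], total + s))
        # Keep only the k best partial sequences at each position.
        beam = sorted(candidates, key=lambda c: c[1], reverse=True)[:k]
    return beam[0][0]  # best sequence found: approximate, not guaranteed optimal
```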
With dynamic programming (the Viterbi algorithm), we can actually find the sequence of states that has the globally highest score in the model.
https://ithelp.ithome.com.tw/articles/10208587
Inference of CTC https://www.ycc.idv.tw/crnn-ctc.html
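A minimal sketch of that global decoding as a Viterbi dynamic program, again assuming the hypothetical scorer interface used in the sketches above:

```python
def viterbi(score, observations, labels):
    # best[label] = (best total log score ending in label, the path achieving it)
    first = score("<s>", observations[0])
    best = {lab: (first[lab], [lab]) for lab in labels}
    for obs in observations[1:]:
        new_best = {}
        for lab in labels:
            # Pick the best previous label to transition from, over full prefixes.
            prev = max(labels, key=lambda p: best[p][0] + score(p, obs)[lab])
            total, path = best[prev]
            new_best[lab] = (total + score(prev, obs)[lab], path + [lab])
        best = new_best
    return max(best.values(), key=lambda v: v[0])[1]  # globally best label sequence
```

Note this is only exact when the scorer conditions on just the previous label (a Markov assumption); with wider conditioning the search space blows up, which is why approximate beam search gets used instead.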
Nowadays, we use attention-based neural networks to capture long-distance word interactions, which the methods above cannot handle very well.
POS tagging: part-of-speech tagging. Chunking: grouping words into short phrases (short verb phrases, short noun phrases, etc.); it can be done on top of POS tagging.
Task:
- Identify AML news (document classification).
- Extract the AML name list from the news (NER).
Language: Chinese
Dataset: news URL, contain_aml, aml_name_list
Models used:
Identify AML news (AML classifier):
- jieba + CountVectorizer / TF-IDF + Multinomial Naive Bayes (sketched after the scores below)
- BERT Chinese embedding + BiLSTM_CRF
- BERT Chinese embedding + BiLSTM_CRF + rule-based name list
BERT Chinese + BiLSTM_CRF: F1 score 0.728; keyword list alone: 0.746
Combined approach: F1 score 0.92
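A minimal sketch of the jieba + TF-IDF + Multinomial Naive Bayes baseline; texts and labels here are placeholder data standing in for the article bodies and the contain_aml column:

```python
import jieba
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

texts = ["某公司涉嫌洗錢遭起訴", "今日天氣晴朗"]  # placeholder documents
labels = [1, 0]                                    # placeholder contain_aml labels

# jieba segments the Chinese text; TF-IDF vectorizes; Naive Bayes classifies.
model = make_pipeline(
    TfidfVectorizer(analyzer=jieba.lcut),  # callable analyzer: raw doc -> token list
    MultinomialNB(),
)
model.fit(texts, labels)
predictions = model.predict(texts)
```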
AML NER model:
- BERT Chinese at the sentence level
- BIO encoding (among the variants BIO, BIOES, IOB, BILOU, ...)
- Post-processing (tabu list; sketched below)
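A minimal sketch of the tabu-list post-processing idea: drop predicted names that appear on a hand-built block list of known false positives (the list contents here are hypothetical):

```python
# Hypothetical tabu list: strings the NER model tends to mislabel as names.
TABU_LIST = {"記者", "發言人"}  # e.g. role words like "reporter", "spokesperson"

def filter_names(predicted_names):
    # Keep only predictions that are not on the tabu list.
    return [name for name in predicted_names if name not in TABU_LIST]
```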
Highlights:
- Chinese document data cleaning
- Hybrid rule-based and ML-based approach
https://github.com/YLTsai0609/bert_ner
https://github.com/GitYCC/bert-minimal-tutorial/blob/master/notebooks/chinese_ner.ipynb