POS tagging #34

loaga · 2021-03-20T08:25:41Z

I've tried the following example as input:

      這些語辭都含有高調音

這些(Neqa)　語辭(Na)　都(D)　含有(VJ)　高(VH)　調音(VA)

With a customized dictionary, it was able to tag 高調音 as Na.

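from ckiptagger import construct_dictionary

word_to_weight = {
    "高調音": 1,
    "土地公": 1,
    "土地婆": 1,
    "公有": 2,
    "": 1,  # invalid entry (empty word), as in the README example
    "來亂的": "啦",  # invalid entry (non-numeric weight), as in the README example
    "緯來體育台": 1,
}
# Build the dictionary object that is passed to ws() below.
dictionary = construct_dictionary(word_to_weight)

# ws is a ckiptagger WS instance and sentence_list is a list of input strings, both created earlier.
word_sentence_list = ws(sentence_list, recommend_dictionary=dictionary)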

Is there any code or paper that describes how the data (token_list.npy, vector_list.npy, model_pos, etc.) were trained or created?

Thanks.


emfomy · 2021-03-22T02:00:54Z

Both embeddings are trained using the Word2Vec model from gensim.

The details of the corpus are here: https://github.com/ckiplab/ckiptagger/wiki/Corpora

loaga · 2021-03-22T09:44:22Z

Thanks!

On March 21, 2021 at 10:01 PM Mu Yang wrote: Both embeddings are trained using the Word2Vec model from gensim. Here is the detail of the corpus: https://github.com/ckiplab/ckiptagger/wiki/Corpora

loaga · 2021-03-23T00:13:25Z

On the page https://github.com/ckiplab/ckiptagger/wiki/Chinese-README, I followed the POS tagging link (./data/model_ner/pos_list.txt -> 詞性列表，請見 Wiki / Technical Report no. 93-05; in English, "POS list, see Wiki / Technical Report no. 93-05").

It mentions that there is an electronic dictionary that includes each word's part of speech (詞性). How can I get access to it?

Thanks.
