POS tagging #34

Open
loaga opened this issue Mar 20, 2021 · 3 comments

loaga commented Mar 20, 2021

I've tried the following example as input:

      這些語辭都含有高調音

這些(Neqa) 語辭(Na) 都(D) 含有(VJ) 高(VH) 調音(VA)

With a customized dictionary, it was able to tag 高調音 as Na.

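from ckiptagger import construct_dictionary

# Words to recommend to the segmenter, mapped to their weights.
word_to_weight = {
    "高調音": 1,
    "土地公": 1,
    "土地婆": 1,
    "公有": 2,
    "": 1,
    "來亂的": "啦",
    "緯來體育台": 1,
}
# Build the dictionary object that is passed to ws() below.
dictionary = construct_dictionary(word_to_weight)

# ws is an already loaded ckiptagger.WS instance; sentence_list is a list of strings.
word_sentence_list = ws(sentence_list, recommend_dictionary=dictionary)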
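For context, the whole flow from dictionary to tagged output might look roughly like the sketch below. It assumes the model files have been downloaded to `./data`. The helper names `build_tagger`, `tag_sentences`, and `format_tagged` are made up for illustration and are not part of ckiptagger. Only `WS`, `POS`, and `construct_dictionary` come from the library.

```python
def build_tagger(data_dir="./data"):
    """Load the word segmenter and POS tagger (needs ckiptagger and its model data)."""
    # Imported lazily so format_tagged() below can be used without ckiptagger installed.
    from ckiptagger import WS, POS
    return WS(data_dir), POS(data_dir)


def tag_sentences(ws, pos, sentence_list, word_to_weight=None):
    """Segment and POS-tag sentences, optionally with a recommend dictionary."""
    from ckiptagger import construct_dictionary
    if word_to_weight:
        dictionary = construct_dictionary(word_to_weight)
        word_sentence_list = ws(sentence_list, recommend_dictionary=dictionary)
    else:
        word_sentence_list = ws(sentence_list)
    pos_sentence_list = pos(word_sentence_list)
    return word_sentence_list, pos_sentence_list


def format_tagged(words, tags):
    """Render one sentence as 'word(TAG) word(TAG) ...'."""
    return " ".join(f"{w}({t})" for w, t in zip(words, tags))


# Hypothetical usage (requires the downloaded model data):
#   ws, pos = build_tagger("./data")
#   words, tags = tag_sentences(ws, pos, ["這些語辭都含有高調音"], {"高調音": 1})
#   print(format_tagged(words[0], tags[0]))
```

Calling `tag_sentences` once with and once without `word_to_weight` makes it easy to compare how the recommended words change the segmentation and the resulting tags.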

Is there any code or paper describing how the data (token_list.npy, vector_list.npy, model_pos, etc.) were trained or created?

Thanks.

emfomy (Member) commented Mar 22, 2021

Both embeddings are trained using the Word2Vec model from gensim.

Here are the details of the corpus.
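As a rough illustration of the Word2Vec training mentioned above, the sketch below trains an embedding with gensim and writes a vocabulary array and a vector array. It is not the project's actual training script. The hyperparameters, the helper names `read_segmented_corpus` and `train_and_export`, and the assumed layout of `token_list.npy` and `vector_list.npy` (parallel arrays of tokens and vectors) are all assumptions. It targets the gensim 4 API (`vector_size`, `index_to_key`).

```python
def read_segmented_corpus(lines):
    """Turn whitespace-segmented lines into the list of token lists Word2Vec expects."""
    return [line.split() for line in lines if line.strip()]


def train_and_export(sentences, out_dir, vector_size=300):
    """Train Word2Vec and save parallel token/vector arrays (needs gensim >= 4 and numpy)."""
    # Imported lazily so read_segmented_corpus() works without these packages.
    import os
    import numpy as np
    from gensim.models import Word2Vec

    # vector_size, window and min_count are illustrative values, not the project's settings.
    model = Word2Vec(sentences=sentences, vector_size=vector_size,
                     window=5, min_count=1, workers=1)
    os.makedirs(out_dir, exist_ok=True)
    # Assumed layout: token_list.npy holds the vocabulary, vector_list.npy the matching rows.
    np.save(os.path.join(out_dir, "token_list.npy"), np.array(model.wv.index_to_key))
    np.save(os.path.join(out_dir, "vector_list.npy"), model.wv.vectors)
    return model
```

The input is expected to be already segmented text, one sentence per line with tokens separated by whitespace.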

loaga (Author) commented Mar 22, 2021 via email

loaga (Author) commented Mar 23, 2021

On https://github.com/ckiplab/ckiptagger/wiki/Chinese-README, I followed the POS tagging link: ./data/model_ner/pos_list.txt -> POS tag list (詞性列表), see Wiki / Technical Report no. 93-05.

The report mentions an electronic dictionary that includes each word's part of speech (詞性). How can I get access to it?

Thanks.
