Tagset for GOST tagger #89
Probably CLAWS6 or CLAWS8, but need to look at more tags. Note that YCOL and YDSH are punctuation tags and that CLAWS7 is CLAWS6 minus punctuation tags.
In addition to the POS tags, GOST also produces semantic tags drawn from the 200+ basic semantic tags of the UCREL Semantic Analysis System (USAS, http://ucrel.lancs.ac.uk/usas/), as well as identifiers from the GO ontology. The GOST service uses a list-valued semtags property for these. Because we have two tag sets for the same property, we need to define this in the metadata a bit differently from existing tag set definitions, where we just give a single URI; for example, for the value of posTagSet on Token we can use a URI pointing to a tag set discriminator in the vocabulary. Now that we have both USAS types and GO categories in the semtags property, we need to be able to say that in the metadata.
So in the metadata we can say:

    {
      "contains": {
        "http://vocab.lappsgrid.org/Token": {
          "semanticTags": [ "tags-sem-bio-go", "tags-sem-basic-asus" ]
        }
      }
    }

For the full names I am proposing one of the following:
I think I prefer the last one because the number of different sets of semantic tags may become large.
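For the full names we are now leaning towards not creating subdirectories for http://vocab.lappsgrid.org/ns/tagset/sem, so we would get something like

name             url
tags-sem-asus    http://vocab.lappsgrid.org/ns/tagset/sem#asus
tags-sem-bio-go  http://vocab.lappsgrid.org/ns/tagset/sem#bio-go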
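As a rough illustration of the proposal, here is a minimal Python sketch of how a list of short discriminators in the metadata could be resolved to full tag set URLs. The `TAGSET_URLS` registry and the helper names `build_metadata` and `resolve_semantic_tagsets` are hypothetical and only for illustration; the two name/URL pairs and the `semanticTags` key are the ones mentioned in this thread, and nothing here reflects an existing LAPPS Grid API.

```python
# Hypothetical registry: short discriminator name -> full tag set URL.
# The two entries are the name/URL pairs proposed in this thread.
TAGSET_URLS = {
    "tags-sem-asus": "http://vocab.lappsgrid.org/ns/tagset/sem#asus",
    "tags-sem-bio-go": "http://vocab.lappsgrid.org/ns/tagset/sem#bio-go",
}

TOKEN = "http://vocab.lappsgrid.org/Token"


def build_metadata(discriminators):
    """Build a 'contains' metadata dict with a list-valued semanticTags entry."""
    return {"contains": {TOKEN: {"semanticTags": list(discriminators)}}}


def resolve_semantic_tagsets(metadata):
    """Map each discriminator listed under Token to its full URL."""
    names = metadata["contains"][TOKEN]["semanticTags"]
    unknown = [name for name in names if name not in TAGSET_URLS]
    if unknown:
        # Fail loudly rather than silently dropping an unregistered tag set.
        raise KeyError(f"unknown tag set discriminators: {unknown}")
    return [TAGSET_URLS[name] for name in names]


metadata = build_metadata(["tags-sem-bio-go", "tags-sem-asus"])
urls = resolve_semantic_tagsets(metadata)
```

Keeping the discriminators in a list, rather than a single URI as with posTagSet, lets one property declare any number of tag sets, and a flat registry avoids needing a subdirectory per tag set.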
Keith and I discussed what to do about semantic tags (not entirely related to the below, I think). We decided on a new view (layer) called SemanticTag, which could also be used for sense tags etc.
What is asus?
On May 2, 2019, at 11:50 AM, marcverhagen wrote:
For the full names we are now leaning towards not creating subdirectories for http://vocab.lappsgrid.org/ns/tagset/sem, so we would get something like
name             url
tags-sem-asus    http://vocab.lappsgrid.org/ns/tagset/sem#asus
tags-sem-bio-go  http://vocab.lappsgrid.org/ns/tagset/sem#bio-go
Nancy Ide
The asus discriminator refers to the 200+ semantic tags used by the UCREL Semantic Analysis System (USAS), which appear in the GOST output.
I am adding CLAWS tag sets to the vocabulary (so far just CLAWS5 and CLAWS7), but it is not clear which one the GOST tagger is using. It is clearly not CLAWS5, as shown in the table below for the tags that appear in the GOST output for MASC3-0203 (which, by the way, only gives 23 tokens), but it isn't CLAWS7 either.
Must look into the other CLAWS tag sets.
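One way to narrow this down would be to compare the tags observed in the GOST output against each candidate inventory and list what is left uncovered. The sketch below assumes the inventories are available as plain sets of tag strings; the two toy inventories are hypothetical stand-ins, not real CLAWS tag sets, and the function names are made up for illustration.

```python
def uncovered_tags(observed, inventory):
    """Return the observed tags that the candidate inventory does not contain."""
    return sorted(set(observed) - set(inventory))


def rank_candidates(observed, candidates):
    """Order candidate tag sets by how few observed tags they fail to cover."""
    report = {
        name: uncovered_tags(observed, inventory)
        for name, inventory in candidates.items()
    }
    # Fewest uncovered tags first; ties are broken alphabetically by name.
    return sorted(report.items(), key=lambda item: (len(item[1]), item[0]))


# Toy inventories for illustration only (NOT the real CLAWS tag sets).
candidates = {
    "toy-without-punct": {"NN1", "VVZ"},
    "toy-with-punct": {"NN1", "VVZ", "YCOL", "YDSH"},
}

# Hypothetical tags as they might be collected from tagger output.
observed = ["NN1", "YCOL", "VVZ", "YDSH", "NN1"]

ranking = rank_candidates(observed, candidates)
```

A candidate with an empty uncovered list is consistent with the output seen so far, though with only 23 tokens that is weak evidence; punctuation tags such as YCOL and YDSH are useful discriminators between otherwise similar inventories.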