Semantic tags #92
Comments
I added the semtag feature on tokens to get something online quickly so @nancyide could test the GOST service for a paper she is working on. I am not convinced Token is a good place for them, but that is what GOST does so that is what I went with for the demo. Nancy has since asked that I annotate items tagged with values from the Gene Ontology as "NamedEntities" with @category=Gene, however that is far from ideal as the GO contains labels for many things other than gene names. For example, "learning" (a biological process) gets labelled as "gene" which is obviously not correct. Perhaps for things like GO tags or WordNet synsets we could repurpose the unused
Hmmmm, I like that reuse of something we do not use. The name might not be so great though, because the result may come from some mechanism that is not a lookup. And we would basically use it as a Tag type without calling it Tag.
Is a SemanticTag just one tag or does it contain a list? (This is one of the issues from #95.) Assume an Annotation ("endoplasmic reticulum") that has two semantic tags, both GO tags (GO:0044464.3.N and GO:0043229.2.N). Here is an example with one tag per SemanticTag. First the metadata (recall that we decided against the likes of
Now the tags as one tag per SemanticTag:
Now the tags as multiple tags per SemanticTag, borrowing from Keith's comment above:
We can quarrel about what the best feature name is ( The second is more compact. The advantage of the first is the flexibility it affords if the tag is not just a single label but has all kinds of other features (confidence, sub-label). It would also be easier to deal with the issue below.
UPDATE. There is an added benefit to multiple tags per SemanticTag: the tags are grouped automatically. For example, GOST assigns a bunch of GO categories to a region, and when you have multiple SemanticTag instances for that list you need a mechanism to show the grouping. Keith prefers using multiple tags since it would be closer to what GOST does, and GOST so far is the only semantic tagger we have wrapped.
There is the issue of what happens when tags are not just a neat label, but come with other information (confidence score, subtypes, etcetera). We would now need to maintain some lists or maps:
Also see the comment "Do we need a Token#semtags property?" below for some related discussion, including on containers.
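To make the trade-off between the two encodings a bit more concrete, here is a rough sketch in plain Python dicts. The feature names (`tag`, `tags`, `confidences`), the offsets, and the confidence values are all assumptions made up for illustration; none of them come from the vocabulary or from GOST output.

```python
# Hypothetical feature names ("tag", "tags", "confidences") and offsets;
# the confidence values are placeholders, not GOST output.
one_tag_each = [
    {"id": "st0", "@type": "SemanticTag", "start": 0, "end": 21,
     "features": {"tag": "GO:0044464.3.N", "confidence": 0.5}},
    {"id": "st1", "@type": "SemanticTag", "start": 0, "end": 21,
     "features": {"tag": "GO:0043229.2.N", "confidence": 0.5}},
]


def group_by_span(annotations):
    """Merge single-tag annotations on the same span into one multi-tag annotation."""
    grouped = {}
    for ann in annotations:
        span = (ann["start"], ann["end"])
        merged = grouped.setdefault(span, {
            "@type": "SemanticTag", "start": ann["start"], "end": ann["end"],
            "features": {"tags": [], "confidences": []},
        })
        # Parallel lists: position i in "confidences" belongs to position i in "tags".
        merged["features"]["tags"].append(ann["features"]["tag"])
        merged["features"]["confidences"].append(ann["features"].get("confidence"))
    return list(grouped.values())


multi_tag = group_by_span(one_tag_each)
print(multi_tag[0]["features"]["tags"])
```

The one-tag-per-annotation form keeps each tag's extra information next to the tag but loses the grouping; the grouped form keeps the grouping but has to keep the parallel lists aligned by hand.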
How to deal with multiple tag sets (another issue from #95). We thought there were two options. Option 1. One view and using a local
For the first we have the advantage that the tag sets are in the metadata, but the disadvantage that we make the value a list, which is different from other tag sets around the vocabulary. With the default, we do not have to make the value a list and we could omit the local tagSet feature for the default set. I find the first conceptually a bit cleaner. Note that we may want to add a dependOn feature as well, which is not actually in the vocabulary yet.
Now the annotations. Assume an Annotation ("endoplasmic reticulum") that has two semantic tags, one GO tag (GO:0044464.3.N) and one USAS tag category (C1, "substances and materials generally").
With the GO set as the default, the first semantic tag could leave out the tagSet property. Note that in either case (list of sets or single set), the local tagSet attribute is only needed if we have more than one tag set, which is actually explicit in the list case.
Option 2. Two views. Well, no examples here; this should be obvious. The advantage is simplicity of views, the disadvantage is extra views for these cases.
UPDATE. After discussion with Keith, we came up with something we like better for option 1. We like the list value for the
The prefix is only used when there are multiple tag sets and would be a suffix of the two discriminators, picking a suffix that discriminates, in this case just
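One way to read "picking a suffix that discriminates" is sketched below: take the shortest trailing run of URI segments that is unique across the tag sets listed in the metadata. The two discriminator URIs, the `:` separator, and the function names are hypothetical and only serve the illustration.

```python
import re


def discriminating_prefixes(tagset_uris):
    """Map each tag set URI to its shortest trailing segment run that is unique in the list."""
    if not tagset_uris:
        return {}
    split = [[s for s in re.split(r"[/#]", uri) if s] for uri in tagset_uris]
    longest = max(len(parts) for parts in split)
    for k in range(1, longest + 1):
        suffixes = ["/".join(parts[-k:]) for parts in split]
        if len(set(suffixes)) == len(suffixes):
            return dict(zip(tagset_uris, suffixes))
    raise ValueError("tag set URIs cannot be told apart by their segments")


def qualify(tag, tagset_uri, prefixes):
    """Prefix a tag only when more than one tag set is in play."""
    if len(prefixes) < 2:
        return tag
    return f"{prefixes[tagset_uri]}:{tag}"  # the ":" separator is an assumption


# Hypothetical discriminator URIs, not taken from the vocabulary.
GO = "http://example.org/ns/tagsets/semtag#go"
USAS = "http://example.org/ns/tagsets/semtag#usas"

prefixes = discriminating_prefixes([GO, USAS])
print(qualify("GO:0044464.3.N", GO, prefixes))
print(qualify("C1", USAS, prefixes))
```

With a single tag set in the metadata, `qualify` returns the bare tag, which matches the idea that the prefix is only needed when there is something to discriminate.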
Do we need a Token#semtags property? (Another issue from #95.) We don't, since you can easily trace the tag to the token, but do we want it? The question is whether the token should be directly aware of whether it has semantic tags. There is precedent here: a PhraseStructure annotation knows what constituents it has. It is probably nice to have the tags available in a
UPDATE. We realized this property might be a nice place to group tags. If we use multiple instances of SemanticTag to encode a list of tags as given by a tool like GOST, we lose the connection between those tags. With
Example of a list of annotations:
We could have some container types like this (List, Set, Map). We may or may not want them to be listed in the vocabulary explicitly.
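As a rough illustration of the grouping idea, assume a Token carries a `semtags` feature that lists the ids of the SemanticTag annotations belonging to it. The ids, offsets, and the `tag` feature name below are hypothetical, and the list is just plain Python rather than a proposed container type.

```python
# Hypothetical annotation list: the token groups its tags by id reference.
annotations = [
    {"id": "tok0", "@type": "Token", "start": 0, "end": 11,
     "features": {"semtags": ["st0", "st1"]}},
    {"id": "st0", "@type": "SemanticTag", "start": 0, "end": 11,
     "features": {"tag": "GO:0044464.3.N"}},
    {"id": "st1", "@type": "SemanticTag", "start": 0, "end": 11,
     "features": {"tag": "GO:0043229.2.N"}},
]


def tags_for(token, annotations):
    """Resolve the ids in a token's semtags feature to the tag labels they point at."""
    by_id = {ann["id"]: ann for ann in annotations}
    return [by_id[ref]["features"]["tag"]
            for ref in token["features"].get("semtags", [])]


print(tags_for(annotations[0], annotations))
```

The token stays small, each SemanticTag can still carry its own features, and the `semtags` list records which tags were assigned together.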
We are about to add some discriminators for semantic category sets that cannot be interpreted as named entity category sets (GO categories for example, and WordNet synsets would fall in this category as well). And we are about to add semtags as a property on Token since we use that for the GOST tagger, and that new tag would be similar to pos except that we allow a list.
With semantic tags, however, we need to deal with the situation where the tag is not on a token but on a phrase. Do we add Chunk to the vocabulary with some generic properties including semtags (also see issue #90)? One thing not to like about that is that semtags would then be defined in two spots. Do we want to consider a Semantics type alongside a Morphology type?