Adding morphology #65
Comments
I prefer the first option as that is consistent with what we do for pos and lemma. Then maybe what we need is a way to describe the contents of complex feature values. For example, maybe the |
Binding |
Having a layered annotations for An example could be: {
"annotations": [
{
"@type": "Token",
"id": "t1",
"start":0 ,
"end": 2,
"features": {
"word": "im"
}
},
{
"@type": "Morph",
"id": "m1",
"target": "t1",
"features": {
"list_of_morphemes": ["mor1", "mor2"],
"person": "m",
"case": "dative",
...
}
},
{
"@type": "Morpheme",
"id": "mor1",
"features": {
"lemma": "in"
}
},
{
"@type": "Morpheme",
"id": "mor2",
"features": {
"lemma": "dem"
}
}
]
}
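A minimal sketch of how a consumer might follow the list_of_morphemes references in the example above. It assumes the second Morpheme carries the id mor2 (the example as posted repeats mor1), and the helper name morphemes_for is hypothetical, not part of any existing API.

```python
# Annotations in the layered layout from the example above.
# Assumption: the second Morpheme is given the id "mor2".
annotations = [
    {"@type": "Token", "id": "t1", "start": 0, "end": 2,
     "features": {"word": "im"}},
    {"@type": "Morph", "id": "m1", "target": "t1",
     "features": {"list_of_morphemes": ["mor1", "mor2"], "case": "dative"}},
    {"@type": "Morpheme", "id": "mor1", "features": {"lemma": "in"}},
    {"@type": "Morpheme", "id": "mor2", "features": {"lemma": "dem"}},
]


def morphemes_for(token_id, annotations):
    """Return the lemmas of the morphemes linked to a token (hypothetical helper)."""
    by_id = {a["id"]: a for a in annotations}
    lemmas = []
    for ann in annotations:
        # Only Morph annotations that target this token are relevant.
        if ann["@type"] == "Morph" and ann.get("target") == token_id:
            for morpheme_id in ann["features"].get("list_of_morphemes", []):
                lemmas.append(by_id[morpheme_id]["features"]["lemma"])
    return lemmas


print(morphemes_for("t1", annotations))
```

The lookup goes through ids only, so it works even though the Morpheme annotations have no offsets of their own.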
Some elaboration on the previous comment...

Having a Morphology annotation type allows us to associate morphological features with any other annotation or set of annotations (because target can refer to more than one element). And a morphemes feature on Morphology allows levels of morphological analysis, since you can have Morphology annotations with all their features pointing at morphemes. As noted above, having those levels can deal with the "im == in dem" problem (the multi-word tokens from UD), as long as we are willing to live with calling "in" and "dem" morphemes, which I find moderately disturbing.

Having two annotation types agrees with WebLicht's TCF morphology tag, which has an analysis part and a segmentation part (although it is not quite clear to me how the segmentation is used), and with the DKPro type system (although there Morpheme has no description).

The mockup for the next vocabulary at http://vocab.lappsgrid.org/1.3.0-SNAPSHOT/ has Morphology as a subtype of Region. This is consistent in that we want a morphology to point to annotations via targets, but we vaguely hint that the targets should be a contiguous sequence. I see two problems there: (1) we may want a morphology to point at discontinuous annotations, and (2) some of the current uses of targets do have gaps between the individual annotations (for example the spaces between tokens in a sentence). Two potential solutions: (1) allow a Region to be discontinuous, (2) make Morphology a subtype of Annotation (like we do with Coreference, PhraseStructure and DependencyStructure, although issue #64 is suggesting to move those last two to Region).

Also in http://vocab.lappsgrid.org/1.3.0-SNAPSHOT/ we have a morph attribute on Token and the value of that attribute is a Morphology annotation. This may be a problem for two reasons:

Finally, where in the hierarchy do we want to put Morpheme? Region seems to make some sense, but in some cases there are no clear offsets that we can associate the Morpheme with, and all we can do is have a target which will point out a wider region. Maybe we can have Morpheme point back to the Morphology that it is part of?
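The concern about gaps between targets can be made concrete with a small check. This is only a sketch: the function name gaps_between and the sample tokens for "in dem Haus" are hypothetical, and it assumes each target annotation carries start and end character offsets.

```python
def gaps_between(targets):
    """Return (end, start) offset pairs where the targets leave text uncovered."""
    if not targets:
        return []
    ordered = sorted(targets, key=lambda a: a["start"])
    gaps = []
    covered_end = ordered[0]["end"]
    for ann in ordered[1:]:
        if ann["start"] > covered_end:
            gaps.append((covered_end, ann["start"]))
        # Track the furthest offset seen so overlapping targets are handled.
        covered_end = max(covered_end, ann["end"])
    return gaps


# Hypothetical tokens for the text "in dem Haus".
tokens = [
    {"id": "t1", "start": 0, "end": 2},
    {"id": "t2", "start": 3, "end": 6},
    {"id": "t3", "start": 7, "end": 11},
]

print(gaps_between(tokens))                   # whitespace between adjacent tokens
print(gaps_between([tokens[0], tokens[2]]))   # a genuinely discontinuous target set
```

Both calls report gaps, which is the point made above: a strict "contiguous sequence" reading of Region would already exclude ordinary token lists, before discontinuous morphology targets even come up.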
If I remember correctly, the In order to address things such as
The idea is that we create two tokens on Mind that this is so far only a concept and has not been (fully) implemented yet. We do have the |
Yeah, WebLicht had a similar proposed solution and we have pondered several additions to Token which are also similar to yours. We had a few minor misgivings with that which I will try to remember.
We have no place for morphology except by putting arbitrary attributes in the features dictionary.
Two options:
1. Add morphology as a property to Token, with a map as its value. The disadvantage is that we have no way of specifying what features we want to use. The advantage is that this is the simplest way and that it does not increase the number of types in the vocabulary. In addition, the token seems to be a natural place to express this.
2. Add a Morphology annotation type which has an identifier that points at a Token or any other annotation type. This will allow us to specify what the morphological features are, but at the cost of added complexity in the vocabulary. This is the approach that many others wanted us to use for parts-of-speech.
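To compare the two options side by side, here is a rough sketch of what each layout could look like and how a reader of the data might handle both. The feature name case, the top-level targets list, and the helper morphology_of are all assumptions made for illustration; the issue does not fix any of them.

```python
# Option 1: morphology is a map-valued property on the Token itself.
option_1 = [
    {"@type": "Token", "id": "t1", "start": 0, "end": 2,
     "features": {"word": "im", "morphology": {"case": "dative"}}},
]

# Option 2: a separate Morphology annotation points at the Token.
# Assumption: the pointer is a top-level "targets" list of annotation ids.
option_2 = [
    {"@type": "Token", "id": "t1", "start": 0, "end": 2,
     "features": {"word": "im"}},
    {"@type": "Morphology", "id": "m1", "targets": ["t1"],
     "features": {"case": "dative"}},
]


def morphology_of(token_id, annotations):
    """Collect morphological features for a token from either layout."""
    result = {}
    for ann in annotations:
        if ann["@type"] == "Token" and ann["id"] == token_id:
            result.update(ann["features"].get("morphology", {}))
        elif ann["@type"] == "Morphology" and token_id in ann.get("targets", []):
            result.update(ann["features"])
    return result


print(morphology_of("t1", option_1))
print(morphology_of("t1", option_2))
```

The first layout keeps everything on the token; the second costs one extra annotation type but lets the same Morphology annotation target something other than a single Token.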