11 – Lexicon
SimpleNLG German ships with a large default lexicon parsed from the German Wiktionary. It contains around 100,000 German verbs, nouns, adjectives, and adverbs, including irregular inflected forms.
You can additionally supply your own lexicon with domain-specific terms and integrate it into SimpleNLG as follows:
// Fields of the class that sets up SimpleNLG
Lexicon lexicon;
NLGFactory nlgFactory;
Realiser realiser;

// Default lexicon provided by SimpleNLG
Lexicon lexicon1 = Lexicon.getDefaultLexicon();
// Your additional lexicon - adapt the path
Lexicon lexicon2 = new XMLLexicon("./src/main/java/simplenlgger/lexicon/additional_lexicon.xml");
// Combine both lexicons and hand the combined lexicon to the factory and the realiser
lexicon = new MultipleLexicon(lexicon1, lexicon2);
nlgFactory = new NLGFactory(lexicon);
realiser = new Realiser(lexicon);
The sample entries below show how SimpleNLG's default lexicon is structured. If you add a lexicon with your own words, use the same XML tags so that it works properly. You do not have to specify every tag shown in the example, but the tags you do add must be named exactly as in the default lexicon.
<?xml version='1.0' encoding='UTF-8'?>
<lexicon>
<word>
<base>Besitz</base>
<id>1</id>
<category>noun</category>
<plural>Besitze</plural>
<genus>m</genus>
<genitive_sin>Besitzes</genitive_sin>
<genitive_pl>Besitze</genitive_pl>
<dative_sin>Besitz</dative_sin>
<dative_pl>Besitzen</dative_pl>
<akkusative_sin>Besitz</akkusative_sin>
<akkusative_pl>Besitze</akkusative_pl>
</word>
<word>
<base>widerspiegeln</base>
<id>2</id>
<category>verb</category>
<regular>True</regular>
<separable>True</separable>
<reflexive>True</reflexive>
<part1>wider</part1>
<preterite>spiegelte wider</preterite>
<participle2>widergespiegelt</participle2>
<firstPerPres>spiegele wider</firstPerPres>
<secPerPres>spiegelst wider</secPerPres>
<thirdPerPres>spiegelt wider</thirdPerPres>
</word>
<word>
<base>ausgeprägt</base>
<id>3</id>
<category>adjective</category>
<comp>ausgeprägter</comp>
<sup>ausgeprägtesten</sup>
</word>
</lexicon>
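A hand-written lexicon file can easily end up with an unbalanced tag, a missing mandatory field, or a reused ID. The following is a minimal sketch of a sanity check you could run before loading your file. It uses only the Java standard library; the class `LexiconCheck` and its `validate` method are hypothetical helpers for illustration and are not part of SimpleNLG. The check assumes that base, id, and category are mandatory for every word and that IDs must be unique within the file.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper, not part of SimpleNLG: sanity-checks a lexicon XML string.
final class LexiconCheck {

    // Fields assumed to be mandatory for every <word> entry.
    private static final String[] REQUIRED = {"base", "id", "category"};

    // Returns human-readable problems; an empty list means no problem was found.
    static List<String> validate(String xml) {
        Document doc;
        try {
            // Parsing fails if the XML is not well-formed,
            // e.g. when a <word> is closed without having been opened.
            doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            List<String> fatal = new ArrayList<>();
            fatal.add("not well-formed XML: " + e.getMessage());
            return fatal;
        }

        List<String> problems = new ArrayList<>();
        Set<String> seenIds = new HashSet<>();
        NodeList words = doc.getElementsByTagName("word");
        for (int i = 0; i < words.getLength(); i++) {
            Element word = (Element) words.item(i);
            for (String tag : REQUIRED) {
                if (word.getElementsByTagName(tag).getLength() == 0) {
                    problems.add("word #" + (i + 1) + " is missing <" + tag + ">");
                }
            }
            NodeList ids = word.getElementsByTagName("id");
            if (ids.getLength() > 0) {
                String id = ids.item(0).getTextContent().trim();
                // Set.add returns false if the id was already present.
                if (!seenIds.add(id)) {
                    problems.add("duplicate id " + id);
                }
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        String xml = "<lexicon>"
                + "<word><base>Besitz</base><id>1</id><category>noun</category></word>"
                + "<word><base>widerspiegeln</base><id>1</id><category>verb</category></word>"
                + "</lexicon>";
        System.out.println(validate(xml)); // prints [duplicate id 1]
    }
}
```

To check a file on disk, you could read it into a string first (for example with `Files.readString`) and pass the result to `validate`. The sketch only checks IDs within one file; whether an ID may collide with one in the default lexicon is not covered here.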
The lexicon can contain the following fields. Every word, regardless of category, has the fields base (the word's base form), id (a unique ID), and category (noun, verb, adjective, or adverb). Depending on the category, further optional fields can be added.
Nouns can additionally contain:
- plural: The noun's plural form in the nominative
- genus: The noun's gender (m for masculine, f for feminine, n for neuter)
- genitive_sin: The noun in genitive singular
- genitive_pl: The noun in genitive plural
- dative_sin: The noun in dative singular
- dative_pl: The noun in dative plural
- akkusative_sin: The noun in accusative singular
- akkusative_pl: The noun in accusative plural
Verbs can additionally contain:
- regular: Is the verb regular? (True or False)
- separable: Is the verb separable? (True or False)
- reflexive: Is the verb reflexive? (True or False)
- part1: If the verb is separable, the separable prefix
- preterite: Verb in preterite, 1st person ("I")
- participle2: Verb in participle II
- firstPerPres: Verb in present, 1st person ("I")
- secPerPres: Verb in present, 2nd person ("you")
- thirdPerPres: Verb in present, 3rd person ("he/she/it")
Completely irregular verbs, such as "sein" ("to be"), additionally contain:
- plFirstThirdPerPres: Verb in present, 1st and 3rd person plural
- plSecPerPres: Verb in present, 2nd person plural
Adjectives can additionally contain:
- comp: Comparative form
- sup: Superlative form
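To illustrate how the fields listed above combine into one entry, here is a small sketch that renders a `<word>` element from a map of optional fields. The class `WordEntry` and its `toXml` method are hypothetical helpers, not part of SimpleNLG. The example fills in the fields for the irregular verb "sein"; the inflected forms are standard German, but how the default lexicon actually stores this verb is an assumption, and the ID is an arbitrary placeholder.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of SimpleNLG: renders one <word> entry.
final class WordEntry {

    // Writes the three mandatory fields first, then the optional ones in insertion order.
    static String toXml(String base, int id, String category, Map<String, String> optional) {
        StringBuilder sb = new StringBuilder("<word>\n");
        appendTag(sb, "base", base);
        appendTag(sb, "id", Integer.toString(id));
        appendTag(sb, "category", category);
        for (Map.Entry<String, String> field : optional.entrySet()) {
            appendTag(sb, field.getKey(), field.getValue());
        }
        return sb.append("</word>\n").toString();
    }

    private static void appendTag(StringBuilder sb, String tag, String value) {
        sb.append('<').append(tag).append('>')
          .append(escape(value))
          .append("</").append(tag).append(">\n");
    }

    // Escapes the characters that are not allowed in XML text content.
    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the fields in the order they were added.
        Map<String, String> forms = new LinkedHashMap<>();
        forms.put("regular", "False");
        forms.put("separable", "False");
        forms.put("reflexive", "False");
        forms.put("preterite", "war");
        forms.put("participle2", "gewesen");
        forms.put("firstPerPres", "bin");
        forms.put("secPerPres", "bist");
        forms.put("thirdPerPres", "ist");
        forms.put("plFirstThirdPerPres", "sind");
        forms.put("plSecPerPres", "seid");
        // 100001 is an arbitrary placeholder; pick an id that is unused in your lexicon.
        System.out.print(toXml("sein", 100001, "verb", forms));
    }
}
```

The printed entry can be placed between `<lexicon>` and `</lexicon>` in your additional lexicon file. Building entries from a map keeps the tag names in one place, which makes it easier to keep them identical to those of the default lexicon.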