
11 – Lexicon

Kira edited this page Apr 14, 2019 · 6 revisions

SimpleNLG German comes with a large default lexicon parsed from the German Wiktionary. It contains around 100,000 German verbs, nouns, adjectives and adverbs, including irregular inflected forms.

Adding your own lexicon

You can also add your own lexicon containing domain-specific terms and integrate it into SimpleNLG as follows:

    // Default lexicon provided by SimpleNLG
    Lexicon lexicon1 = Lexicon.getDefaultLexicon();
    // Your additional lexicon - adapt the path to your own file
    Lexicon lexicon2 = new XMLLexicon("./src/main/java/simplenlgger/lexicon/additional_lexicon.xml");

    // Combine both lexicons and use the combined lexicon for the factory and the realiser
    Lexicon lexicon = new MultipleLexicon(lexicon1, lexicon2);
    NLGFactory nlgFactory = new NLGFactory(lexicon);
    Realiser realiser = new Realiser(lexicon);

Lexicon structure

Below, you can see how SimpleNLG's default lexicon is structured, using some sample entries. If you add a lexicon with your own words, use the same XML tags so that it works properly. You do not have to specify all tags shown in the example, but any tag you do add must be named exactly as in the default lexicon.

    <?xml version='1.0' encoding='UTF-8'?>
    <lexicon>
      <word>
        <base>Besitz</base>
        <id>1</id>
        <category>noun</category>
        <plural>Besitze</plural>
        <genus>m</genus>
        <genitive_sin>Besitzes</genitive_sin>
        <genitive_pl>Besitze</genitive_pl>
        <dative_sin>Besitz</dative_sin>
        <dative_pl>Besitzen</dative_pl>
        <akkusative_sin>Besitz</akkusative_sin>
        <akkusative_pl>Besitze</akkusative_pl>
      </word>
      <word>
        <base>widerspiegeln</base>
        <id>3</id>
        <category>verb</category>
        <regular>True</regular>
        <separable>True</separable>
        <reflexive>True</reflexive>
        <part1>wider</part1>
        <preterite>spiegelte wider</preterite>
        <participle2>widergespiegelt</participle2>
        <firstPerPres>spiegele wider</firstPerPres>
        <secPerPres>spiegelst wider</secPerPres>
        <thirdPerPres>spiegelt wider</thirdPerPres>
      </word>
      <word>
        <base>ausgeprägt</base>
        <id>4</id>
        <category>adjective</category>
        <comp>ausgeprägter</comp>
        <sup>ausgeprägtesten</sup>
      </word>
    </lexicon>
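To check that your own lexicon file follows this structure, it can help to read it back with a generic XML parser. The following Java sketch uses only the JDK's DOM API (javax.xml.parsers), not SimpleNLG itself, and turns each <word> element into a map from tag name to value. The class name LexiconReader and its methods are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class LexiconReader {

    // Reads every <word> element into a map from tag name (base, id, category, ...) to its text.
    public static List<Map<String, String>> readWords(String xml) {
        Element root;
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            root = builder.parse(new InputSource(new StringReader(xml))).getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse lexicon XML", e);
        }
        List<Map<String, String>> words = new ArrayList<>();
        NodeList wordNodes = root.getElementsByTagName("word");
        for (int i = 0; i < wordNodes.getLength(); i++) {
            Map<String, String> fields = new LinkedHashMap<>();
            NodeList children = wordNodes.item(i).getChildNodes();
            for (int j = 0; j < children.getLength(); j++) {
                Node child = children.item(j);
                // Skip the whitespace text nodes between the tags
                if (child.getNodeType() == Node.ELEMENT_NODE) {
                    fields.put(child.getNodeName(), child.getTextContent().trim());
                }
            }
            words.add(fields);
        }
        return words;
    }

    public static void main(String[] args) {
        String xml = "<?xml version='1.0' encoding='UTF-8'?>"
                + "<lexicon>"
                + "<word><base>Besitz</base><id>1</id><category>noun</category><genus>m</genus></word>"
                + "<word><base>widerspiegeln</base><id>3</id><category>verb</category>"
                + "<separable>True</separable><part1>wider</part1></word>"
                + "</lexicon>";
        for (Map<String, String> word : readWords(xml)) {
            System.out.println(word.get("base") + " (" + word.get("category") + "): " + word);
        }
    }
}
```

Because the maps keep whatever tags are present, a misspelled tag shows up immediately as an unexpected key.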

Lexicon values

The lexicon can contain the following fields. Every word, regardless of its category, contains the fields base (the word's base form), id (a unique ID), and category (noun, verb, adjective, or adverb). Depending on the category, the optional fields listed below can also be specified.
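The rules for the mandatory fields can be checked mechanically before loading a custom lexicon. The following Java sketch is a hypothetical validator, not part of SimpleNLG: it assumes each word has already been read into a map from tag name to value and reports missing base/id/category fields, unknown categories, and duplicate IDs.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class LexiconValidator {

    // The four categories named in the lexicon description
    private static final Set<String> CATEGORIES = Set.of("noun", "verb", "adjective", "adverb");

    // Returns a list of problems; an empty list means every entry has base, id and category,
    // each category is one of the four known values, and no id is used twice.
    public static List<String> validate(List<Map<String, String>> words) {
        List<String> problems = new ArrayList<>();
        Set<String> seenIds = new HashSet<>();
        for (Map<String, String> word : words) {
            String base = word.getOrDefault("base", "<no base>");
            for (String required : List.of("base", "id", "category")) {
                String value = word.get(required);
                if (value == null || value.isBlank()) {
                    problems.add(base + ": missing <" + required + ">");
                }
            }
            String category = word.get("category");
            if (category != null && !CATEGORIES.contains(category)) {
                problems.add(base + ": unknown category '" + category + "'");
            }
            String id = word.get("id");
            // Set.add returns false if the id was already seen
            if (id != null && !seenIds.add(id)) {
                problems.add(base + ": duplicate id " + id);
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        List<Map<String, String>> words = List.of(
                Map.of("base", "widerspiegeln", "id", "3", "category", "verb"),
                Map.of("base", "ausgeprägt", "id", "3", "category", "adjective"),
                Map.of("base", "schnell", "category", "adverb"));
        validate(words).forEach(System.out::println);
    }
}
```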

Nouns

  • plural: The noun's plural form in the nominative
  • genus: The noun's gender (m for masculine, f for feminine, n for neuter)
  • genitive_sin: The noun in genitive singular
  • genitive_pl: The noun in genitive plural
  • dative_sin: The noun in dative singular
  • dative_pl: The noun in dative plural
  • akkusative_sin: The noun in accusative singular
  • akkusative_pl: The noun in accusative plural
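Taken together, these fields cover all eight case/number combinations, with the base form serving as nominative singular. The following Java sketch makes that mapping explicit. NounForms, Case and form() are hypothetical names, not SimpleNLG's API, and the sketch simply returns null when an entry does not store a form.

```java
import java.util.Map;

public class NounForms {

    public enum Case { NOMINATIVE, GENITIVE, DATIVE, ACCUSATIVE }

    // Maps a case/number pair to the lexicon field that stores it.
    // Nominative singular is the base form itself; note the German spelling "akkusative".
    static String fieldFor(Case c, boolean plural) {
        switch (c) {
            case NOMINATIVE: return plural ? "plural" : "base";
            case GENITIVE:   return plural ? "genitive_pl" : "genitive_sin";
            case DATIVE:     return plural ? "dative_pl" : "dative_sin";
            default:         return plural ? "akkusative_pl" : "akkusative_sin";
        }
    }

    // Looks up the form; returns null when the entry does not store that field.
    static String form(Map<String, String> noun, Case c, boolean plural) {
        return noun.get(fieldFor(c, plural));
    }

    public static void main(String[] args) {
        Map<String, String> besitz = Map.of(
                "base", "Besitz", "genus", "m", "plural", "Besitze",
                "genitive_sin", "Besitzes", "dative_pl", "Besitzen");
        System.out.println(form(besitz, Case.GENITIVE, false));
        System.out.println(form(besitz, Case.DATIVE, true));
        System.out.println(form(besitz, Case.ACCUSATIVE, true)); // not stored in this entry
    }
}
```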

Verbs

  • regular: Is the verb regular? (True or False)
  • separable: Is the verb separable? (True or False)
  • reflexive: Is the verb reflexive? (True or False)
  • part1: If the verb is separable, add the separable prefix here
  • preterite: Verb in the preterite, 1st person singular ("I")
  • participle2: The verb's past participle (participle II)
  • firstPerPres: Verb in the present tense, 1st person singular ("I")
  • secPerPres: Verb in the present tense, 2nd person singular ("you")
  • thirdPerPres: Verb in the present tense, 3rd person singular ("he/she/it")
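For separable verbs, the sample entry stores the present forms with the prefix already split off ("spiegelt wider"). In a German main clause the prefix goes to the end of the clause, while in a subordinate clause it is attached to the finite verb again. The following Java sketch illustrates this using the fields above; finitePart() and subordinateForm() are hypothetical helpers, not a description of how SimpleNLG realises separable verbs internally.

```java
import java.util.Map;

public class SeparableVerbs {

    // Picks the stored present-tense field for 1st, 2nd or 3rd person singular.
    static String presentField(int person) {
        switch (person) {
            case 1: return "firstPerPres";
            case 2: return "secPerPres";
            case 3: return "thirdPerPres";
            default: throw new IllegalArgumentException("person must be 1, 2 or 3");
        }
    }

    // For a separable verb the stored form is "<finite> <part1>", e.g. "spiegelt wider".
    // Returns only the finite part, as used in a main clause.
    static String finitePart(Map<String, String> verb, int person) {
        String form = verb.get(presentField(person));
        String prefix = verb.get("part1");
        boolean separable = "True".equals(verb.get("separable"));
        if (form != null && separable && prefix != null && form.endsWith(" " + prefix)) {
            return form.substring(0, form.length() - prefix.length() - 1);
        }
        return form;
    }

    // In a subordinate clause the prefix is joined to the finite verb: "widerspiegelt".
    static String subordinateForm(Map<String, String> verb, int person) {
        String form = verb.get(presentField(person));
        String finite = finitePart(verb, person);
        // Re-attach the prefix only if finitePart actually split it off
        return (form == null || form.equals(finite)) ? form : verb.get("part1") + finite;
    }

    public static void main(String[] args) {
        Map<String, String> widerspiegeln = Map.of(
                "base", "widerspiegeln", "separable", "True", "part1", "wider",
                "thirdPerPres", "spiegelt wider");
        // Main clause: the prefix moves to the end
        System.out.println("Das Ergebnis " + finitePart(widerspiegeln, 3)
                + " die Lage " + widerspiegeln.get("part1") + ".");
        // Subordinate clause: prefix and finite verb are written together
        System.out.println("..., weil das Ergebnis die Lage " + subordinateForm(widerspiegeln, 3) + ".");
    }
}
```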

Completely irregular verbs, such as "sein" ("to be"), additionally contain:

  • plFirstThirdPerPres: Verb in the present tense, 1st and 3rd person plural ("we", "they")
  • plSecPerPres: Verb in the present tense, 2nd person plural ("you" plural)
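With these two extra fields, an entry for a completely irregular verb stores all six present-tense forms, since 1st and 3rd person plural share one form. The following Java sketch assembles the paradigm for "sein" from such an entry. PresentParadigm is a hypothetical name; for entries without the plural fields it simply yields null for the plural persons rather than guessing a form.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class PresentParadigm {

    // Builds the six present-tense forms from the lexicon fields, keyed by pronoun.
    // plFirstThirdPerPres is used for both "wir" and "sie"; missing fields yield null.
    static Map<String, String> present(Map<String, String> verb) {
        Map<String, String> forms = new LinkedHashMap<>();
        forms.put("ich", verb.get("firstPerPres"));
        forms.put("du", verb.get("secPerPres"));
        forms.put("er/sie/es", verb.get("thirdPerPres"));
        forms.put("wir", verb.get("plFirstThirdPerPres"));
        forms.put("ihr", verb.get("plSecPerPres"));
        forms.put("sie", verb.get("plFirstThirdPerPres"));
        return forms;
    }

    public static void main(String[] args) {
        Map<String, String> sein = Map.of(
                "base", "sein", "firstPerPres", "bin", "secPerPres", "bist",
                "thirdPerPres", "ist", "plFirstThirdPerPres", "sind", "plSecPerPres", "seid");
        present(sein).forEach((pronoun, form) -> System.out.println(pronoun + " " + form));
    }
}
```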

Adjectives

  • comp: Comparative form
  • sup: Superlative form
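Judging from the sample entry, sup stores the superlative as it appears after "am" ("ausgeprägtesten"), so a predicative superlative needs "am" in front of it. The following Java sketch shows the three predicative degrees built from the fields above. AdjectiveDegrees and predicative() are hypothetical names, and the sketch does not attempt attributive inflection, which depends on case, gender and article.

```java
import java.util.Map;

public class AdjectiveDegrees {

    enum Degree { POSITIVE, COMPARATIVE, SUPERLATIVE }

    // Builds the predicative form ("Der Akzent ist ...") from the lexicon fields.
    // Assumes sup holds the form that follows "am", as in the sample entry.
    static String predicative(Map<String, String> adjective, Degree degree) {
        switch (degree) {
            case COMPARATIVE: return adjective.get("comp");
            case SUPERLATIVE: return "am " + adjective.get("sup");
            default:          return adjective.get("base");
        }
    }

    public static void main(String[] args) {
        Map<String, String> ausgepraegt = Map.of(
                "base", "ausgeprägt", "comp", "ausgeprägter", "sup", "ausgeprägtesten");
        for (Degree d : Degree.values()) {
            System.out.println("Der Akzent ist " + predicative(ausgepraegt, d) + ".");
        }
    }
}
```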