From b9248e0124bbc5c20f5bf557a9a029648520f5b7 Mon Sep 17 00:00:00 2001
From: "jannik.stroetgen"
Date: Thu, 17 Sep 2015 15:12:48 +0200
Subject: [PATCH] adaptations for HeidelTime 2.0

---
 doc/readme.txt | 12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/doc/readme.txt b/doc/readme.txt
index eb0345ae..6135a127 100644
--- a/doc/readme.txt
+++ b/doc/readme.txt
@@ -198,8 +198,8 @@ set the environment variables.
 To process English, German, Dutch, Spanish, Italian, French, Chinese or Russian documents,
 the TreeTaggerWrapper can be used for pre-processing:
 * Download the TreeTagger and its tagging scripts, installation scripts, as well as
-  English, German, and Dutch (or any other) parameter files into one directory from:
-  http://www.ims.uni-stuttgart.de/projekte/corplex/TreeTagger/
+  English, German, and Dutch (and all required) parameter files into one directory from:
+  http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/
   - mkdir treetagger
   - cd treetagger
   - wget http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/data/tree-tagger-linux-3.2.tar.gz
@@ -211,6 +211,8 @@ set the environment variables.
   - wget http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/data/italian-par-linux-3.2-utf8.bin.gz
   - wget http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/data/spanish-par-linux-3.2-utf8.bin.gz
   - wget http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/data/french-par-linux-3.2-utf8.bin.gz
+  - wget http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/data/portuguese-par-linux-3.2-utf8.bin.gz
+  - wget http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/data/estonian-par-linux-3.2-utf8.bin.gz
   Attention: If you do not use Linux, please download all TreeTagger files directly from
   http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/
 * (OPTIONAL) For Chinese documents, please get the Tokenizer and TreeTagger parameter file
@@ -279,9 +281,9 @@ set the environment variables.
 You will need to enter the full path of the hunpos-1.0-linux directory in the HunPosTaggerWrapper.
-To process any of the automatically create, you can use the AllLanguagesTokenizer
-which is part of the heideltime kit. It is a simple (whitespace-based) yet generic
-tool and creaetes sentence and token annotation.
+To process any of the languages with automatically created resources, you can use
+the AllLanguagesTokenizer, which is part of the heideltime kit. It is a simple
+(whitespace-based) yet generic tool and creates sentence and token annotations.
 For sample UIMA workflows for any of the supported languages, please take a look