added some more information in the readme
jzell committed Apr 17, 2013
1 parent 5fed6e0 commit 1d2fdfa
Showing 1 changed file with 58 additions and 19 deletions.
77 changes: 58 additions & 19 deletions doc/readme.txt
@@ -2,9 +2,9 @@
# UIMA HEIDELTIME KIT #
#######################

Author: Jannik Strötgen
Author: Jannik Strötgen, Julian Zell
Date: April 17, 2013
eMail: [email protected]
eMail: [email protected], [email protected]

###################################
# 1. Papers describing HeidelTime #
@@ -46,11 +46,15 @@ simply develop resources for additional languages using HeidelTime's well-define
HeidelTime was the best system for the extraction and normalization of English temporal
expressions from documents in the TempEval-2 challenge in 2010. Furthermore, it is evaluated on
several additional corpora, as described in our paper "Multilingual Cross-domain Temporal
Tagging" (http://www.springerlink.com/content/64767752451075k8/).
Tagging" (http://www.springerlink.com/content/64767752451075k8/). In TempEval-3, HeidelTime
achieved the best results for the combination of extraction and normalization for English and
Spanish.

HeidelTime with resources for English and German is one component of our UIMA HeidelTime kit.
Furthermore, resources for Dutch were developed and kindly provided by Matje van de Camp
(Tilburg University, http://www.tilburguniversity.edu/webwijs/show/?uid=m.m.v.d.camp).
(Tilburg University, http://www.tilburguniversity.edu/webwijs/show/?uid=m.m.v.d.camp). As of
version 1.3, HeidelTime additionally contains resources for Spanish, Italian, Arabic, and
Vietnamese.

Additionally, whilst expanding the set of domains that HeidelTime can recognize temporal
expressions in, English resources for colloquial as well as scientific style documents were
@@ -73,23 +77,46 @@ contains:
and the evaluation sets and annotates the document creation time as well as token and
sentence information.

* TempEval-3 Reader: This Collection Reader reads the TempEval-3 input TimeML data of the
training and the evaluation sets and annotates the document creation time as well as some
meta information.

* TreeTaggerWrapper: This Analysis Engine produces Token, Sentence and Part-of-Speech annotations
required by HeidelTime by using the language-independent TreeTagger tool.

* HeidelTime
* StanfordPOSTaggerWrapper: This Analysis Engine produces Token, Sentence and Part-of-Speech
annotations required by HeidelTime by using the language independent Stanford POS Tagger.

* JVnTextProWrapper: This Analysis Engine produces Token, Sentence and Part-of-Speech annotations
required by HeidelTime by using the JVnTextPro tool for documents in Vietnamese.

* Annotation Translator: This Analysis Engine translates Sentence, Token, and Part-of-Speech
annotations of one type system into HeidelTime's type system.

* HeidelTime: Possible parameter values are:
- languages: english, englishcoll, englishsci, german, spanish, italian, vietnamese, arabic,
dutch
- types: news, narratives, colloquial (for use with englishcoll), scientific (englishsci)
- locale: the locale to use for date calculation. Leave it empty to use en_GB.
- Debugging: to output verbose debugging information to stderr.

* IntervalTagger: This Analysis Engine in conjunction with HeidelTime recognizes
temporal intervals in documents.

* ACE Tern Writer: This CAS Consumer creates output data as needed to run the official ACE
Tern evaluation scripts.

* TempEval-2 Writer: This CAS Consumer creates two files needed to evaluate the tasks of
extracting and evaluating temporal expressions of the TempEval-2 challenge using the
official evaluation scripts.

* Supported languages right now are: german, english, dutch, arabic, vietnamese, spanish,
italian.
* TempEval-3 Writer: This CAS Consumer writes annotated TimeML files that were read in by
the TempEval-3 Reader and processed by HeidelTime in the format required by the TempEval-3
evaluation scripts.

* Languages supported by HeidelTime as of version 1.3 are:
german, english, dutch, arabic, vietnamese, spanish, italian.
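
The HeidelTime parameter values listed above can be checked before they are
written into a descriptor. The following is only a minimal sketch in Python:
the function check_parameters and its return format are hypothetical and are
not part of HeidelTime. It assumes that "colloquial" is only meaningful with
englishcoll and "scientific" only with englishsci, which is one reading of the
parameter list above, and it applies the documented en_GB fallback for an
empty locale.

```python
# Hypothetical helper, not part of HeidelTime: validates parameter values
# against the lists given in this readme.
LANGUAGES = {
    "english", "englishcoll", "englishsci", "german", "spanish",
    "italian", "vietnamese", "arabic", "dutch",
}
TYPES = {"news", "narratives", "colloquial", "scientific"}

# Assumption: these two types are tied to their dedicated English resources.
TYPE_REQUIRES_LANGUAGE = {
    "colloquial": "englishcoll",
    "scientific": "englishsci",
}


def check_parameters(language, doc_type, locale=""):
    """Return a normalized parameter dict or raise ValueError."""
    language = language.lower()
    doc_type = doc_type.lower()
    if language not in LANGUAGES:
        raise ValueError("unknown language: " + language)
    if doc_type not in TYPES:
        raise ValueError("unknown type: " + doc_type)
    required = TYPE_REQUIRES_LANGUAGE.get(doc_type)
    if required is not None and language != required:
        raise ValueError(
            "type %s is intended for language %s" % (doc_type, required)
        )
    # An empty locale falls back to en_GB, as described above.
    return {"language": language, "type": doc_type, "locale": locale or "en_GB"}
```

For example, check_parameters("german", "news") returns a dict with the locale
set to en_GB, while check_parameters("german", "colloquial") raises ValueError.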

######################
# 3. Getting started #
@@ -135,15 +162,17 @@ set the environment variables.
http://www.ims.uni-stuttgart.de/projekte/corplex/TreeTagger/
- mkdir treetagger
- cd treetagger
- wget ftp://ftp.ims.uni-stuttgart.de/pub/corpora/tree-tagger-linux-3.2.tar.gz
- wget ftp://ftp.ims.uni-stuttgart.de/pub/corpora/tagger-scripts.tar.gz
- wget ftp://ftp.ims.uni-stuttgart.de/pub/corpora/install-tagger.sh
- wget ftp://ftp.ims.uni-stuttgart.de/pub/corpora/german-par-linux-3.2.bin.gz
- wget ftp://ftp.ims.uni-stuttgart.de/pub/corpora/german-par-linux-3.2-utf8.bin.gz
- wget ftp://ftp.ims.uni-stuttgart.de/pub/corpora/english-par-linux-3.1.bin.gz
- wget ftp://ftp.ims.uni-stuttgart.de/pub/corpora/dutch-par-linux-3.1.bin.gz
- wget http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/data/tree-tagger-linux-3.2.tar.gz
- wget http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/data/tagger-scripts.tar.gz
- wget http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/data/install-tagger.sh
- wget http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/data/german-par-linux-3.2.bin.gz
- wget http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/data/german-par-linux-3.2-utf8.bin.gz
- wget http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/data/english-par-linux-3.2.bin.gz
- wget http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/data/dutch-par-linux-3.1.bin.gz
- wget http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/data/italian-par-linux-3.2-utf8.bin.gz
- wget http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/data/spanish-par-linux-3.2-utf8.bin.gz
Attention: If you do not use Linux, please download all TreeTagger files directly from
http://www.ims.uni-stuttgart.de/projekte/corplex/TreeTagger/
http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/
* Install the TreeTagger
- sh install-tagger.sh
* Set environment variables (you can set variables globally, e.g., in your $HOME/.bashrc)
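
The TreeTagger download steps above can also be scripted. The following is
only a sketch in Python under stated assumptions: the function names
treetagger_urls and download_all are hypothetical, the file list is copied
from the updated wget lines in this readme, and the URLs are not guaranteed
to remain available. Running install-tagger.sh afterwards is still a manual
step.

```python
from pathlib import Path
from urllib.request import urlretrieve

# Base URL and file names as listed in the updated wget lines above.
BASE_URL = "http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/data/"
FILES = [
    "tree-tagger-linux-3.2.tar.gz",
    "tagger-scripts.tar.gz",
    "install-tagger.sh",
    "german-par-linux-3.2.bin.gz",
    "german-par-linux-3.2-utf8.bin.gz",
    "english-par-linux-3.2.bin.gz",
    "dutch-par-linux-3.1.bin.gz",
    "italian-par-linux-3.2-utf8.bin.gz",
    "spanish-par-linux-3.2-utf8.bin.gz",
]


def treetagger_urls(base_url=BASE_URL, files=FILES):
    """Build the full download URL for every listed file."""
    return [base_url + name for name in files]


def download_all(target_dir="treetagger"):
    """Download every file into target_dir (mirrors mkdir, cd, wget)."""
    target = Path(target_dir)
    target.mkdir(exist_ok=True)
    for url in treetagger_urls():
        file_name = url.rsplit("/", 1)[-1]
        urlretrieve(url, str(target / file_name))
```

Calling download_all() creates the treetagger directory if needed and fetches
the files into it; treetagger_urls() can be used on its own to inspect the
list without touching the network.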
@@ -209,15 +238,26 @@ TempEval Workshop at the ACL conference in Uppsala, Sweden on July 15, 2010 or J
In the meantime, it is May 2011 and HeidelTime is made publicly available and identifies
these temporal expressions: January 22, 2001 or twice a week.

##########################################
# 7. Additional HeidelTime documentation #
##########################################
HeidelTime's Google Code Project contains a lot of valuable information on how to use
HeidelTime or its components, as well as additional resources, an always up-to-date
code repository and issue tracker in case you spot a bug.
Visit the project at

http://code.google.com/p/heideltime/


#######################################################################
# 7. Reproducing HeidelTime's evaluation results on different corpora #
# 8. Reproducing HeidelTime's evaluation results on different corpora #
#######################################################################
To reproduce HeidelTime's evaluation results reported in our paper "Multilingual
Cross-domain Temporal Tagging", follow the instructions on:
http://code.google.com/p/heideltime/wiki/ReproduceEvaluationResults

##############
# 8. License #
# 9. License #
##############
Copyright (c) 2012, Database Research Group, Institute of Computer Science, University of Heidelberg.
All rights reserved. This program and the accompanying materials
@@ -227,5 +267,4 @@ author: Jannik Strötgen
email: [email protected]

HeidelTime is a multilingual, cross-domain temporal tagger.
For details, see http://dbs.ifi.uni-heidelberg.de/heideltime and
http://code.google.com/p/heideltime/
For details, see http://dbs.ifi.uni-heidelberg.de/heideltime
