GitHub - reboutli-crim/heideltime: A multilingual, cross-domain temporal tagger developed at the Database Systems Research Group at Heidelberg University.

HeidelTime can now also be used for English temponym tagging. For details, see our TempWeb'16 paper.

HeidelTime contains automatically created resources for 200+ languages in addition to manually created ones for 13 languages. For further details, take a look at our EMNLP 2015 paper.

About CRIM-Heideltime

CRIM-Heideltime extends HeidelTime with two additional part-of-speech-tagger wrappers and a JSON result formatter.

Part-of-speech-tagger wrappers

Python part-of-speech-tagger wrapper: the wrapper calls a Python script that returns the CAS annotated with sentences and part-of-speech (POS) tags. The path to the Python script must be set in the config.props file.
JSON part-of-speech-tagger wrapper: the wrapper reads two JSON files, one containing the sentence annotations and the other the POS annotations. The paths to the JSON files must be set in the environment variables SENTENCE_ANNOTATION_FILE_PATH and POS_ANNOTATION_FILE_PATH. The JSONtaggerWrapper also reads a configuration file (its path is set in config.props) that describes how to retrieve the following elements from the JSON files:
- sentence_begin
- sentence_end
- token_begin
- token_end
- token_pos

Each line of the configuration file has the following format:
element_to_retrieve\t[key|index] [key|index] ...

Example:

sentence_begin	offsets 0 begin
sentence_end	offsets 0 end
token_begin	offsets 0 begin  
token_end	offsets 0 end   
token_pos	category
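
To make the mapping format more concrete, here is a minimal Java sketch of how a line such as `token_begin	offsets 0 begin` could be parsed and applied to an already-decoded JSON annotation. It is not the JSONtaggerWrapper implementation: the class `JsonPathSketch`, its methods, and the sample token are hypothetical, and the rule "a step is an index when the current node is a list, otherwise a key" is an assumption about how `[key|index]` is interpreted.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Illustrative only: a hypothetical reader for the mapping format, not the JSONtaggerWrapper code. */
final class JsonPathSketch {

    /** Parses one mapping line: element name, a tab, then space-separated path steps. */
    static Map.Entry<String, List<String>> parseLine(String line) {
        String[] parts = line.split("\t", 2);
        if (parts.length != 2) {
            throw new IllegalArgumentException("Expected '<element>\\t<path>': " + line);
        }
        List<String> steps = Arrays.asList(parts[1].trim().split("\\s+"));
        return new AbstractMap.SimpleEntry<>(parts[0].trim(), steps);
    }

    /** Walks the path. Assumption: a step is a list index inside a List and a key inside a Map. */
    static Object resolve(Object node, List<String> steps) {
        Object current = node;
        for (String step : steps) {
            if (current instanceof List) {
                current = ((List<?>) current).get(Integer.parseInt(step));
            } else if (current instanceof Map) {
                current = ((Map<?, ?>) current).get(step);
            } else {
                throw new IllegalArgumentException("Cannot descend into '" + step + "'");
            }
        }
        return current;
    }

    public static void main(String[] args) {
        // Hypothetical token annotation, equivalent to:
        // {"offsets": [{"begin": 0, "end": 5}], "category": "NNP"}
        Map<String, Object> span = new LinkedHashMap<>();
        span.put("begin", 0);
        span.put("end", 5);
        List<Object> offsets = new ArrayList<>();
        offsets.add(span);
        Map<String, Object> token = new LinkedHashMap<>();
        token.put("offsets", offsets);
        token.put("category", "NNP");

        String[] lines = {
            "token_begin\toffsets 0 begin",
            "token_end\toffsets 0 end",
            "token_pos\tcategory"
        };
        for (String line : lines) {
            Map.Entry<String, List<String>> mapping = parseLine(line);
            System.out.println(mapping.getKey() + " = " + resolve(token, mapping.getValue()));
        }
    }
}
```

With the sample token above, `offsets 0 begin` selects the `offsets` array, then its first entry, then that entry's `begin` field. The nested maps and lists stand in for whatever a JSON parser would produce.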

About HeidelTime

HeidelTime is a multilingual, domain-sensitive temporal tagger developed at the Database Systems Research Group at Heidelberg University. It extracts temporal expressions from documents and normalizes them according to the TIMEX3 annotation standard. HeidelTime is available as a UIMA annotator and as a standalone version.
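
As a rough illustration of what TIMEX3 normalization looks like to a consumer, the Java sketch below pulls the `type` and `value` attributes out of TIMEX3-annotated text. The sample sentence is hand-written for this example and is not actual HeidelTime output; the class `Timex3Sketch` is hypothetical, and the regular expressions are a shortcut that assumes simple, non-nested tags (an XML parser would be more robust).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Illustrative only: reads TIMEX3 tags from a string; not part of HeidelTime. */
final class Timex3Sketch {

    // Matches <TIMEX3 ...attributes...>covered text</TIMEX3>
    private static final Pattern TAG =
            Pattern.compile("<TIMEX3\\s+([^>]*)>(.*?)</TIMEX3>", Pattern.DOTALL);
    // Matches one name="value" attribute pair
    private static final Pattern ATTR = Pattern.compile("(\\w+)=\"([^\"]*)\"");

    /** Returns one "text -> TYPE VALUE" summary per TIMEX3 tag, in document order. */
    static List<String> summarize(String annotated) {
        List<String> out = new ArrayList<>();
        Matcher tag = TAG.matcher(annotated);
        while (tag.find()) {
            String type = null;
            String value = null;
            Matcher attr = ATTR.matcher(tag.group(1));
            while (attr.find()) {
                if (attr.group(1).equals("type")) {
                    type = attr.group(2);
                } else if (attr.group(1).equals("value")) {
                    value = attr.group(2);
                }
            }
            out.add(tag.group(2) + " -> " + type + " " + value);
        }
        return out;
    }

    public static void main(String[] args) {
        // Hand-written sample annotation (an assumption, not tagger output).
        String sample = "We met on <TIMEX3 tid=\"t1\" type=\"DATE\" value=\"2016-04-12\">"
                + "April 12, 2016</TIMEX3> for <TIMEX3 tid=\"t2\" type=\"DURATION\" "
                + "value=\"PT2H\">two hours</TIMEX3>.";
        for (String line : summarize(sample)) {
            System.out.println(line);
        }
    }
}
```

The `value` attribute carries the normalized form: the surface string "April 12, 2016" maps to `2016-04-12`, and "two hours" maps to the duration `PT2H`.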

HeidelTime currently contains hand-crafted resources for 13 languages: English, German, Dutch, Vietnamese, Arabic, Spanish, Italian, French, Chinese, Russian, Croatian, Estonian and Portuguese. In addition, starting with version 2.0, HeidelTime contains automatically created resources for more than 200 languages. Although these resources are of lower quality than the manually created ones, temporal tagging of many of these languages has never been addressed before. Thus, HeidelTime can be used as a baseline for temporal tagging of all these languages or as a starting point for developing temporal tagging capabilities for them.

HeidelTime distinguishes between news-style documents and narrative-style documents (e.g., Wikipedia articles) in all languages. In addition, colloquial English text (e.g., tweets and SMS) and English scientific articles (e.g., clinical trials) are supported.

Want to see what it can do before you delve in? Take a look at our online demo.

HeidelTime - Latest downloads

Our latest as well as past releases are always available on the Releases page.
The bleeding-edge version is available via our Git repository.
Our temporally annotated corpora and supplementary evaluation scripts can be found here.
If you want to receive notifications on updates of HeidelTime, please fill out this form.
You can also follow us on Twitter @HeidelTime.

Maven

A minimal set of dependencies is satisfied by these entries for your pom.xml:

		<dependency>
			<groupId>org.apache.uima</groupId>
			<artifactId>uimaj-core</artifactId>
			<version>2.8.1</version>
		</dependency>
		<dependency>
			<groupId>com.github.heideltime</groupId>
			<artifactId>heideltime</artifactId>
			<version>2.2</version>
		</dependency>

For some additional features, you will need to provide additional dependencies. See our Maven wiki page.

Publications

If you use HeidelTime, please cite the appropriate paper (in general, this would be the journal paper [4]; if you use HeidelTime with automatically created resources, please cite paper [10]; if you use HeidelTime for temponym tagging, please cite paper [11]):

1. Strötgen, Gertz: HeidelTime: High Quality Rule-based Extraction and Normalization of Temporal Expressions. SemEval'10. pdf bibtex
2. Strötgen, Gertz: Temporal Tagging on Different Domains: Challenges, Strategies, and Gold Standards. LREC'12. pdf bibtex
3. Strötgen et al.: HeidelTime: Tuning English and Developing Spanish Resources for TempEval-3. SemEval'13. pdf bibtex
4. Strötgen, Gertz: Multilingual and Cross-domain Temporal Tagging. Language Resources and Evaluation, 2013. pdf bibtex
5. Strötgen et al.: Time for More Languages: Temporal Tagging of Arabic, Italian, Spanish, and Vietnamese. TALIP, 2014. pdf bibtex
6. Li et al.: Chinese Temporal Tagging with HeidelTime. EACL'14. pdf bibtex
7. Strötgen et al.: Extending HeidelTime for Temporal Expressions Referring to Historic Dates. LREC'14. pdf bibtex
8. Manfredi et al.: HeidelTime at EVENTI: Tuning Italian Resources and Addressing TimeML's Empty Tags. EVALITA'14. pdf bibtex
9. Strötgen: Domain-sensitive Temporal Tagging for Event-centric Information Retrieval. PhD Thesis. pdf bibtex
10. Strötgen, Gertz: A Baseline Temporal Tagger for All Languages. EMNLP'15. pdf bibtex
11. Kuzey, Strötgen, Setty, Weikum: Temponym Tagging: Temporal Scopes for Textual Phrases. TempWeb'16. pdf bibtex

Language Resources

We want to thank the following researchers for their efforts to develop HeidelTime resources:

Dutch resources: Matje van de Camp, Tilburg University
French resources: Véronique Moriceau, LIMSI - CNRS
Russian resources: Elena Klyachko
Croatian resources: Luka Skukan, University of Zagreb
Portuguese resources: Zunsik Lim

Please feel free to use our automatically created resources as a starting point if you plan to manually address a language.

Tell me more!

HeidelTime was developed in Java with extensibility in mind -- especially in terms of language-specific resources, as well as in terms of programmatic functionality.

Get your hands dirty!

You'd like to reproduce HeidelTime's evaluation results described in our papers on several corpora? Download the heideltime-kit or clone our repository and check out our tutorial on reproducing evaluation results. This will also explain how to integrate the HeidelTime annotator into a UIMA pipeline.
You'd like to participate in the development of HeidelTime, perhaps by creating an add-on or improving functionality? Clone our repository and see how to set up Eclipse to develop HeidelTime. Then have a look at HeidelTime's architectural concepts and have a go at it!
You'd like to share some changes you've made or resources for a new language, or you think that HeidelTime could be improved in a specific way? Open a pull request or an issue and let us know; we're eager to read your thoughts!
