Adding a JSONResultformatter and the possibility to use a python POS-tagger #60

Open
wants to merge 15 commits into
base: master
Empty file added .Rhistory
Empty file.
18 changes: 18 additions & 0 deletions .classpath
@@ -0,0 +1,18 @@
<?xml version="1.0" encoding="UTF-8"?>
<classpath>
<classpathentry kind="src" path="src/main/java"/>
<classpathentry kind="src" path="src/test/java"/>
<classpathentry kind="src" path="src/main/resources"/>
<classpathentry kind="src" path="src/test/resources"/>
<classpathentry kind="con" path="org.eclipse.jdt.launching.JRE_CONTAINER/org.eclipse.jdt.internal.debug.ui.launcher.StandardVMType/JavaSE-1.7">
<attributes>
<attribute name="maven.pomderived" value="true"/>
</attributes>
</classpathentry>
<classpathentry kind="con" path="org.eclipse.m2e.MAVEN2_CLASSPATH_CONTAINER">
<attributes>
<attribute name="maven.pomderived" value="true"/>
</attributes>
</classpathentry>
<classpathentry kind="output" path="class"/>
</classpath>
Empty file modified COPYING
100644 → 100755
Empty file.
28 changes: 27 additions & 1 deletion README.md
100644 → 100755
@@ -2,6 +2,32 @@

**HeidelTime contains automatically created resources for 200+ languages in addition to manually created ones for 13 languages. For further details, take a look at our [EMNLP 2015 paper](https://aclweb.org/anthology/D/D15/D15-1063.pdf).**

## About CRIM-Heideltime
CRIM-Heideltime extends HeidelTime with two additional part-of-speech-tagger wrappers and a JSON result formatter.

### Part-of-speech-tagger wrappers
* **Python part-of-speech-tagger wrapper**: the wrapper calls a Python script that returns the CAS annotated with sentences and POS tags. The path to the Python script must be set in the `config.props` file.
* **JSON part-of-speech-tagger wrapper**: the wrapper reads two JSON files, one containing the sentence annotations and the other the POS annotations. The paths to the JSON files must be set as environment variables (_SENTENCE_ANNOTATION_FILE_PATH_ and _POS_ANNOTATION_FILE_PATH_). The JSONtaggerWrapper also reads a configuration file (its path is set in `config.props`) that describes how to retrieve the following elements from the JSON files:
  * sentence_begin
  * sentence_end
  * token_begin
  * token_end
  * token_pos

Each line of the file has the following format:
`element_to_retrieve\t[key|index] [key|index] ...`

Example:

```text
sentence_begin offsets 0 begin
sentence_end offsets 0 end
token_begin offsets 0 begin
token_end offsets 0 end
token_pos category
```
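
To illustrate the configuration format described above, here is a minimal Java sketch. It is not the wrapper's actual implementation. It splits each line into an element name and a list of path steps, then walks a parsed JSON value, treating a step as an index when the current node is a list and as a key when it is a map. The class name `JsonPathConfigSketch`, the method names, and the sample annotation are hypothetical, and splitting on any whitespace (rather than strictly on the tab) is an assumption.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: not the JSONtaggerWrapper implementation.
public class JsonPathConfigSketch {

    // Turns lines such as "sentence_begin\toffsets 0 begin" into
    // element name -> ordered path steps. Blank lines are skipped.
    public static Map<String, List<String>> parseConfig(List<String> lines) {
        Map<String, List<String>> paths = new LinkedHashMap<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue;
            }
            // Assumption: name and steps are separated by any whitespace.
            String[] parts = trimmed.split("\\s+");
            paths.put(parts[0], Arrays.asList(parts).subList(1, parts.length));
        }
        return paths;
    }

    // Follows the path steps through nested maps (keys) and lists (indexes).
    public static Object resolve(Object node, List<String> path) {
        Object current = node;
        for (String step : path) {
            if (current instanceof List) {
                current = ((List<?>) current).get(Integer.parseInt(step));
            } else if (current instanceof Map) {
                current = ((Map<?, ?>) current).get(step);
            } else {
                throw new IllegalArgumentException(
                        "Cannot apply step '" + step + "' to " + current);
            }
        }
        return current;
    }

    public static void main(String[] args) {
        Map<String, List<String>> config = parseConfig(Arrays.asList(
                "sentence_begin\toffsets 0 begin",
                "token_pos\tcategory"));

        // Hypothetical annotation shaped to match the example paths.
        Map<String, Object> offset = new LinkedHashMap<>();
        offset.put("begin", 0L);
        offset.put("end", 12L);
        Map<String, Object> annotation = new LinkedHashMap<>();
        annotation.put("offsets", Arrays.asList(offset));
        annotation.put("category", "NN");

        System.out.println(resolve(annotation, config.get("sentence_begin")));
        System.out.println(resolve(annotation, config.get("token_pos")));
    }
}
```

Because json-simple's `JSONObject` and `JSONArray` implement `java.util.Map` and `java.util.List`, the same traversal should also apply to values parsed with that library.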


## About HeidelTime
**HeidelTime** is a multilingual, domain-sensitive temporal tagger developed at the [Database Systems Research Group](http://dbs.ifi.uni-heidelberg.de/) at [Heidelberg University](http://www.uni-heidelberg.de/index_e.html). It extracts temporal expressions from documents and normalizes them according to the TIMEX3 annotation standard. HeidelTime is available as [UIMA](http://uima.apache.org/) annotator and as standalone version.

@@ -13,7 +39,7 @@ Want to see what it can do before you delve in? Take a look at our **[online dem

![HeidelTime demo picture](https://drive.google.com/uc?export=download&id=0BwqFBQjz9NUicWEzaWlzT1J1SzQ)

## Latest downloads
## HeidelTime - Latest downloads

* Our latest as well as past releases are always available on the [Releases page](https://github.com/HeidelTime/heideltime/releases).
* Bleeding edge version is available via our Git repository.
10 changes: 9 additions & 1 deletion conf/config.props
100644 → 100755
@@ -21,7 +21,7 @@ considerTemponym = false
# Path to TreeTagger home directory
###################################
# Ensure there is no white space in path (try to escape white spaces)
treeTaggerHome = SET ME IN CONFIG.PROPS! (e.g., /home/jannik/treetagger)
treeTaggerHome = /misc/home/reco/reboutli/Dev/Java/Treetagger
# This one is only necessary if you want to process chinese documents.
chineseTokenizerPath = SET ME IN CONFIG.PROPS! (e.g., /home/jannik/treetagger/chinese-tokenizer)

@@ -45,7 +45,15 @@ config_path =
hunpos_path = SET ME IN CONFIG.PROPS! (e.g., /home/jannik/hunpos)
hunpos_model_name = SET ME IN CONFIG.PROPS! (e.g., model.hunpos.mte5.defnpout)

########################################
## path to the Python tagger script:
########################################
python_script_path = SET ME IN CONFIG.PROPS! (e.g., /misc/home/reco/reboutli/Dev/Python/PythonPOSTaggerWrapper/JythonPOSTaggerWrapper.py)

########################################
## paths to JSONTaggerConfig:
########################################
json_config_path = SET ME IN CONFIG.PROPS! (e.g., /misc/home/reco/reboutli/Dev/Python/temporal_annotation/config_JSONReader.txt)

# DO NOT CHANGE THE FOLLOWING
################################
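
The two new keys above keep the `SET ME IN CONFIG.PROPS!` placeholder until a real path is filled in. As a minimal sketch of how a caller might read them and reject the placeholder, the following uses `java.util.Properties`. The class `TaggerConfigSketch`, the helper `requirePath`, and the sample paths are hypothetical and are not part of HeidelTime's `Config` class.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

// Hypothetical sketch: not HeidelTime's own configuration loader.
public class TaggerConfigSketch {

    // Prefix used by the unset entries in config.props.
    static final String PLACEHOLDER_PREFIX = "SET ME IN CONFIG.PROPS!";

    // Returns the trimmed value for key, or fails if it is missing,
    // empty, or still the placeholder.
    public static String requirePath(Properties props, String key) {
        String value = props.getProperty(key);
        if (value == null || value.trim().isEmpty()
                || value.trim().startsWith(PLACEHOLDER_PREFIX)) {
            throw new IllegalStateException(
                    "Property '" + key + "' is not set in config.props");
        }
        return value.trim();
    }

    public static void main(String[] args) throws IOException {
        // In-memory stand-in for conf/config.props; the paths are made up.
        Properties props = new Properties();
        props.load(new StringReader(
                "python_script_path = /path/to/tagger.py\n"
                + "json_config_path = SET ME IN CONFIG.PROPS! (e.g., /path/to/config.txt)\n"));

        System.out.println(requirePath(props, "python_script_path"));
        try {
            requirePath(props, "json_config_path");
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```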
Empty file modified desc/annotator/AllLanguagesTokenizer.xml
100644 → 100755
Empty file.
Empty file modified desc/annotator/HeidelTime.xml
100644 → 100755
Empty file.
Empty file modified desc/annotator/HeidelTimeStyleMap.xml
100644 → 100755
Empty file.
Empty file modified desc/annotator/HunPosTaggerWrapper.xml
100644 → 100755
Empty file.
Empty file modified desc/annotator/IntervalTagger.xml
100644 → 100755
Empty file.
Empty file modified desc/annotator/JVnTextProWrapper.xml
100644 → 100755
Empty file.
Empty file modified desc/annotator/StanfordPOSTaggerWrapper.xml
100644 → 100755
Empty file.
Empty file modified desc/annotator/TreeTaggerWrapper.xml
100644 → 100755
Empty file.
Empty file modified desc/comsumer/ACETernWriter.xml
100644 → 100755
Empty file.
Empty file modified desc/comsumer/Eventi2014Writer.xml
100644 → 100755
Empty file.
Empty file modified desc/comsumer/Tempeval2Writer.xml
100644 → 100755
Empty file.
Empty file modified desc/comsumer/Tempeval3Writer.xml
100644 → 100755
Empty file.
Empty file modified desc/reader/ACETernReader.xml
100644 → 100755
Empty file.
Empty file modified desc/reader/Eventi2014Reader.xml
100644 → 100755
Empty file.
Empty file modified desc/reader/Tempeval2Reader.xml
100644 → 100755
Empty file.
Empty file modified desc/reader/Tempeval3Reader.xml
100644 → 100755
Empty file.
Empty file modified desc/type/HeidelTime_TypeSystem.xml
100644 → 100755
Empty file.
Empty file modified desc/type/HeidelTime_TypeSystemStyleMap.xml
100644 → 100755
Empty file.
Empty file modified doc/howToWriteRules.txt
100644 → 100755
Empty file.
Empty file modified doc/readme.txt
100644 → 100755
Empty file.
Empty file modified lib/uima-core.jar
100644 → 100755
Empty file.
Empty file modified metadata/adaptDKProDescriptors.sh
100644 → 100755
Empty file.
Empty file modified metadata/install.xml
100644 → 100755
Empty file.
Empty file modified metadata/jvntextpro-pom.xml
100644 → 100755
Empty file.
Empty file modified metadata/setenv
100644 → 100755
Empty file.
Empty file modified metadata/setenv.bat
100644 → 100755
Empty file.
Empty file modified metadata/standalone/pom.xml
100644 → 100755
Empty file.
Empty file modified metadata/webui/pom.xml
100644 → 100755
Empty file.
114 changes: 49 additions & 65 deletions pom.xml
100644 → 100755
@@ -2,13 +2,13 @@
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>

<groupId>com.github.heideltime</groupId>
<artifactId>heideltime</artifactId>
<version>2.2.1</version>
<groupId>ca.crim.nlp</groupId>
<artifactId>crim-heideltime</artifactId>
<version>3.0.8-SNAPSHOT</version>

<name>HeidelTime</name>
<description>HeidelTime is a multilingual cross-domain temporal tagger that extracts temporal expressions from documents and normalizes them according to the TIMEX3 annotation standard.</description>
<url>https://github.com/HeidelTime/heideltime/</url>
<description>This version of HeidelTime extends the well-known multilingual cross-domain temporal tagger (com.github.heideltime) that extracts temporal expressions from documents and normalizes them according to the TIMEX3 annotation standard.</description>
<url>https://github.com/reboutli-crim/heideltime</url>

<licenses>
<license>
@@ -18,7 +18,7 @@
</licenses>

<issueManagement>
<url>https://github.com/HeidelTime/heideltime/issues</url>
<url>https://github.com/reboutli-crim/heideltime/issues</url>
<system>GitHub</system>
</issueManagement>

@@ -39,16 +39,22 @@
<email>[email protected]</email>
<url>https://github.com/jzell</url>
</developer>
<developer>
<id>reboutli-crim</id>
<name>Lise Rebout</name>
<email>[email protected]</email>
<url>https://github.com/reboutli-criml</url>
</developer>
</developers>

<scm>
<url>https://github.com/HeidelTime/heideltime</url>
<connection>scm:git:[email protected]:HeidelTime/heideltime.git</connection>
<developerConnection>scm:git:[email protected]:HeidelTime/heideltime.git</developerConnection>
<url>https://github.com/reboutli-crim/heideltime</url>
<connection>scm:git:[email protected]:reboutli-crim/heideltime.git</connection>
<developerConnection>scm:git:[email protected]:reboutli-crim/heideltime.git</developerConnection>
</scm>

<build>
<sourceDirectory>src</sourceDirectory>
<sourceDirectory>src/main/java</sourceDirectory>
<outputDirectory>${basedir}/class</outputDirectory>
<resources>
<resource>
@@ -58,13 +64,12 @@
</includes>
</resource>
<resource>
<directory>resources/</directory>
<directory>src/main/resources/</directory>
<includes>
<include>**/*.txt</include>
</includes>
</resource>
</resources>
<finalName>de.unihd.dbs.heideltime.standalone</finalName>
<plugins>
<plugin>
<artifactId>maven-compiler-plugin</artifactId>
@@ -107,22 +112,6 @@
</execution>
</executions>
</plugin>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-source-plugin</artifactId>
<version>2.2.1</version>
<configuration>
<excludeResources>true</excludeResources>
</configuration>
<executions>
<execution>
<id>attach-sources</id>
<goals>
<goal>jar-no-fork</goal>
</goals>
</execution>
</executions>
</plugin>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-javadoc-plugin</artifactId>
@@ -139,52 +128,15 @@
</execution>
</executions>
</plugin>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-gpg-plugin</artifactId>
<version>1.5</version>
<executions>
<execution>
<id>sign-artifacts</id>
<phase>verify</phase>
<goals>
<goal>sign</goal>
</goals>
</execution>
</executions>
</plugin>
<plugin>
<groupId>org.sonatype.plugins</groupId>
<artifactId>nexus-staging-maven-plugin</artifactId>
<version>1.6.3</version>
<extensions>true</extensions>
<configuration>
<serverId>ossrh</serverId>
<nexusUrl>https://oss.sonatype.org/</nexusUrl>
<autoReleaseAfterClose>false</autoReleaseAfterClose>
</configuration>
</plugin>
</plugins>
</build>

<distributionManagement>
<snapshotRepository>
<id>ossrh</id>
<url>https://oss.sonatype.org/content/repositories/snapshots</url>
</snapshotRepository>
<repository>
<id>ossrh</id>
<url>https://oss.sonatype.org/service/local/staging/deploy/maven2</url>
</repository>
</distributionManagement>

<dependencies>
<!-- for practically every component -->
<dependency>
<groupId>org.apache.uima</groupId>
<artifactId>uimaj-core</artifactId>
<version>2.8.1</version>
<scope>provided</scope>
</dependency>
<!-- for the StanfordPOSTaggerWrapper -->
<dependency>
@@ -193,6 +145,25 @@
<version>3.3.1</version>
<scope>provided</scope>
</dependency>
<!-- for the JsonPOSTagger -->
<dependency>
<groupId>com.googlecode.json-simple</groupId>
<artifactId>json-simple</artifactId>
<version>1.1.1</version>
</dependency>
<!-- for the JsonResultWriter -->
<dependency>
<groupId>org.apache.uima</groupId>
<artifactId>uimaj-json</artifactId>
<version>2.8.1</version>
</dependency>
<!-- for the Python POS tagger -->
<dependency>
<groupId>org.python</groupId>
<artifactId>jython-standalone</artifactId>
<version>2.7.1b2</version>
<scope>provided</scope>
</dependency>
<!-- these are for JVnTextPro -->
<dependency>
<groupId>args4j</groupId>
@@ -206,5 +177,18 @@
<version>0.1</version>
<scope>provided</scope>
</dependency>
<!-- For testing -->
<dependency>
<groupId>junit</groupId>
<artifactId>junit</artifactId>
<version>4.11</version>
<scope>test</scope>
</dependency>
<dependency>
<groupId>com.github.stefanbirkner</groupId>
<artifactId>system-rules</artifactId>
<version>1.16.0</version>
<scope>test</scope>
</dependency>
</dependencies>
</project>
@@ -24,6 +24,8 @@ public enum CLISwitch {
LOCALE ("Locale", "-locale", null),
POSTAGGER ("Part of Speech tagger", "-pos", POSTagger.TREETAGGER),
INTERVALS ("Interval Tagger", "-it"),
POSFILE ("Path to the JSON-file describing the POS", "-pf", null),
SENTENCEFILE("Path to the JSON-file describing the sentences", "-sf", null),
HELP ("This screen", "-h"),
;
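
The hunk above adds two switches, `-pf` and `-sf`, each with a `null` default and expecting a file path. As a rough illustration of how such switch/value pairs could be collected from the command line, here is a standalone sketch. It is not the HeidelTime standalone's argument parser, and the class `SwitchParsingSketch`, the method `parse`, and the sample file names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: not the HeidelTime standalone argument parser.
public class SwitchParsingSketch {

    // Collects the values following "-pf" and "-sf"; other arguments are ignored.
    public static Map<String, String> parse(String[] args) {
        Map<String, String> values = new HashMap<>();
        for (int i = 0; i < args.length; i++) {
            if ("-pf".equals(args[i]) || "-sf".equals(args[i])) {
                if (i + 1 >= args.length) {
                    throw new IllegalArgumentException(
                            "Missing value for " + args[i]);
                }
                // Store the switch with its value and skip past the value.
                values.put(args[i], args[i + 1]);
                i++;
            }
        }
        return values;
    }

    public static void main(String[] args) {
        // Made-up file names, for illustration only.
        Map<String, String> values = parse(new String[] {
                "-pf", "pos.json", "-sf", "sentences.json", "document.txt"});
        System.out.println("POS file: " + values.get("-pf"));
        System.out.println("Sentence file: " + values.get("-sf"));
    }
}
```

A switch that is absent simply has no entry in the returned map, which mirrors the `null` default declared for both switches.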

@@ -55,6 +55,10 @@ public abstract class Config {
public static final String HUNPOS_PATH = "hunpos_path";
public static final String HUNPOS_MODEL_PATH = "hunpos_model_name";

public static final String PYTHON_SCRIPT_PATH = "python_script_path";

public static final String JSON_CONFIG_PATH = "json_config_path";

public static final String TYPESYSTEMHOME = "typeSystemHome";
public static final String TYPESYSTEMHOME_DKPRO = "typeSystemHome_DKPro";

File renamed without changes.