DistributedERTool

#Overview

This application interlinks two datasets in a distributed fashion by combining Apache Spark with the LIMES link discovery framework. To improve performance, it can apply block purging, which is switched on or off at launch.
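As a rough illustration of the idea behind block purging, the sketch below drops blocks that would require too many pairwise comparisons. It is not taken from this tool's code: the function name `purge_blocks`, the `max_comparisons` threshold, and the block layout (a blocking key mapped to a pair of source and target id sets) are all assumptions made for the example.

```python
from typing import Dict, Set, Tuple

# Hypothetical block layout: (source entity ids, target entity ids).
Block = Tuple[Set[str], Set[str]]


def purge_blocks(blocks: Dict[str, Block], max_comparisons: int) -> Dict[str, Block]:
    """Keep only blocks whose source x target comparison count is within the limit."""
    kept: Dict[str, Block] = {}
    for key, (source_ids, target_ids) in blocks.items():
        comparisons = len(source_ids) * len(target_ids)
        # Blocks without any cross-dataset pair cannot produce links, so drop them too.
        if 0 < comparisons <= max_comparisons:
            kept[key] = (source_ids, target_ids)
    return kept


blocks = {
    "berlin": ({"s1", "s2"}, {"t1"}),                 # 2 comparisons
    "the": ({"s1", "s2", "s3"}, {"t1", "t2", "t3"}),  # 9 comparisons
    "orphan": ({"s4"}, set()),                        # 0 comparisons
}
purged = purge_blocks(blocks, max_comparisons=4)
print(sorted(purged))  # ['berlin']
```

Oversized blocks usually come from very common blocking keys, so discarding them removes many comparisons while giving up few likely matches. The threshold a real system uses may be fixed or derived from the block size distribution.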

#Installation

1. Install limes-core.jar into the local Maven repository:

mvn install:install-file -Dfile={path/to/limes-core.jar} -DgroupId=org.aksw.limes.core -DartifactId=limes-core -Dversion=1.1.0-SNAPSHOT -Dpackaging=jar

2. Build the Spark application:

mvn clean package

The build produces SparkApplication-0.0.1-SNAPSHOT.jar.

#Execution

Run the application against a Spark cluster with the following command:

./bin/spark-submit \
  --class spark.Controller \
  --master <master-url> \
  --deploy-mode <deploy-mode> \
  --conf <key>=<value> \
  ... # other options
  SparkApplication-0.0.1-SNAPSHOT.jar {/path/to/limes_configuration_file.xml} {/path/to/limes.dtd} {purging_enabled=(true,false)}
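To make the argument order concrete, here is a minimal sketch that assembles the command as an argument list. The helper name `build_spark_submit_args`, the master URL, and the file paths are hypothetical. It also assumes the purging switch is passed as a bare `true` or `false`, which the command template above does not state explicitly.

```python
from typing import List


def build_spark_submit_args(
    master_url: str,
    deploy_mode: str,
    limes_config: str,
    limes_dtd: str,
    purging_enabled: bool,
    app_jar: str = "SparkApplication-0.0.1-SNAPSHOT.jar",
) -> List[str]:
    """Return the spark-submit argument vector in the order the README gives."""
    return [
        "./bin/spark-submit",
        "--class", "spark.Controller",
        "--master", master_url,
        "--deploy-mode", deploy_mode,
        app_jar,
        limes_config,
        limes_dtd,
        # Assumption: the switch is passed as a bare "true"/"false" literal.
        "true" if purging_enabled else "false",
    ]


args = build_spark_submit_args(
    master_url="spark://master.example.org:7077",       # hypothetical master URL
    deploy_mode="client",
    limes_config="/data/limes_configuration_file.xml",  # hypothetical path
    limes_dtd="/data/limes.dtd",                        # hypothetical path
    purging_enabled=True,
)
print(" ".join(args))
```

The resulting list can be handed to `subprocess.run(args, check=True)` from the Spark installation directory, which avoids shell quoting issues with paths.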