orac-sdk

SDK for ORAC API: a collection of ETL and algorithms for ORAC

The main library is written in Scala, though the submodules also include Python code.

The submodules can be downloaded with the following command:

git submodule update --init --recursive

To update the submodules:

git submodule update --recursive --remote

To create a self-contained jar with the Spark code, run:

sbt assembly

commands

DISCLAIMER: the readme is a work in progress and is not complete.

SupermarketETL

description

ETL program that converts a specific XML format into a format suitable for the item-to-item encoder NN. It calculates item similarity using the LSH algorithm. The program has a command-line help.
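This README does not say how SupermarketETL applies LSH internally, so the following is only a rough illustration of the general idea, not the project's Spark code. It is a minimal MinHash-based sketch in plain Python, assuming each item is represented by the non-empty set of baskets it appears in. The function names, the toy data, and the banding parameters are all hypothetical; the 0.5 threshold only echoes the `--simThreshold 0.5` value from the sample submission.

```python
import hashlib
from collections import defaultdict
from itertools import combinations


def minhash_signature(baskets, num_hashes=32):
    """Return a MinHash signature for a non-empty set of basket ids."""
    signature = []
    for seed in range(num_hashes):
        # One salted hash function per signature position; keep the minimum.
        signature.append(min(
            int.from_bytes(hashlib.md5(f"{seed}:{b}".encode()).digest()[:8], "big")
            for b in baskets
        ))
    return tuple(signature)


def jaccard(a, b):
    """Exact Jaccard similarity of two sets."""
    return len(a & b) / len(a | b)


def similar_items(item_baskets, threshold=0.5, num_hashes=32, bands=16):
    """Return {(item_a, item_b): similarity} for pairs at or above threshold."""
    rows = num_hashes // bands
    buckets = defaultdict(set)
    for item, baskets in item_baskets.items():
        sig = minhash_signature(baskets, num_hashes)
        # Split the signature into bands; items sharing a band share a bucket.
        for band in range(bands):
            key = (band, sig[band * rows:(band + 1) * rows])
            buckets[key].add(item)

    # Only items that collide in at least one bucket become candidates.
    candidates = set()
    for items in buckets.values():
        candidates.update(combinations(sorted(items), 2))

    # Verify candidates with the exact similarity before keeping them.
    result = {}
    for a, b in candidates:
        sim = jaccard(item_baskets[a], item_baskets[b])
        if sim >= threshold:
            result[(a, b)] = sim
    return result


# Hypothetical toy data: item -> ids of the baskets that contain it.
item_baskets = {
    "bread": {1, 2, 3, 4},
    "butter": {1, 2, 3, 4},
    "milk": {1, 2, 3, 5},
    "soap": {7, 8, 9},
}
pairs = similar_items(item_baskets)
```

Banding trades recall for speed: more bands with fewer rows each catch lower-similarity pairs at the cost of more candidates to verify. Items with identical basket sets, such as `bread` and `butter` here, always collide, while items with no baskets in common never pass the threshold.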

command line

generation of a fat jar

export JAVA_OPTS="-Xms256m -Xmx4g"
sbt assembly

Job submission

Sample submission:

ASSEMBLY_JAR=orac-sdk-assembly-dad9212b5f3ca9daefb9d7bc3c2384c6d963a304.jar
spark-submit --driver-memory 8g --class io.elegans.oracsdk.commands.SupermarketETL ${ASSEMBLY_JAR} --input data.xml --shuffle --output ETL_FOLDER --simThreshold 0.5 --sliding 2 --genMahoutActions --basketToBasket --rankIdToPopularItems 

submodules

To download the submodules:

git submodule update --init --recursive

To update the submodules:

git submodule update --recursive --remote