Update project to clean up code/write-ups and include more configurable examples #1

Open
wants to merge 5 commits into
base: main
Changes from 3 commits
27 changes: 17 additions & 10 deletions README.md
Original file line number Diff line number Diff line change
@@ -1,19 +1,26 @@
# Lucille Example Project (Maven)
# Lucille Example Project

This project is an example of how a developer can leverage [Lucille](https://github.com/kmwtechnology/lucille), the open-source search ETL solution, for their own use case.
You can create your own stages, connectors, etc. by adding them to the src code and using them in a configuration file.

You can find the current release of [Lucille on Maven Central](https://mvnrepository.com/artifact/com.kmwllc/lucille-core).

# Requirements

## Requirements
- Java 11

# Getting Started
## Maven
You can find the current release of [Lucille on Maven Central](https://mvnrepository.com/artifact/com.kmwllc/lucille-core).

- Include `lucille-core` and `lucille-bom` as a maven dependency.
- Set up the run configurations. You can find `example.conf`.
### Getting Started
- Compile the code and build the necessary jar files by running `mvn clean install` in the top directory.
- Run `./lucille.sh` which runs a java process
- Run `./lucille.sh` from the top directory. It starts a Java process that reads the configuration in `conf/example.conf` to extract, transform, and index the data.
- The example creates dummy docs, transforms the data a little, and prepares docs to be indexed into OpenSearch.
By default the docs are not actually sent. To see the indexed data, set up OpenSearch locally:
- [OpenSearch Installation Docs](https://opensearch.org/docs/latest/install-and-configure/install-opensearch/index/)
- We recommend using Docker to install OpenSearch if you are already familiar with Docker.
- Once installed, make sure the OpenSearch section of `conf/example.conf` matches your OpenSearch setup (localhost port, user/password) and set `indexer.sendEnabled` to `true`.

## Gradle
TODO


You can create your own stages, connectors, etc. by adding them to the src code and using them in a configuration file.

15 changes: 8 additions & 7 deletions example.conf → conf/example.conf
@@ -9,13 +9,13 @@
# and indexed into an OpenSearch index

connectors: [
{
class: "com.lucille.example.connector.RandomDocConnector"
name: "test_connector"
numDocs: 10
fieldNames: ["randTime"]
pipeline: "simple_pipeline"
}
{
class: "com.lucille.example.connector.RandomDocConnector"
name: "test_connector"
numDocs: 10
fieldNames: ["randTime"]
pipeline: "simple_pipeline"
}
]

pipelines: [
@@ -34,6 +34,7 @@ pipelines: [

indexer {
type: "OpenSearch"
sendEnabled: false # enable to actually index docs to OpenSearch
}

# OpenSearch
2 changes: 1 addition & 1 deletion lucille.sh
@@ -1,3 +1,3 @@
#!/bin/bash
# run this script from top level lucille-example-mvn directory via ./lucille.sh
java -Dconfig.file=example.conf -cp "target/classes/lib/*:target/*" com.kmwllc.lucille.core.Runner -local
java -Dlog4j.configurationFile=log4j2.xml -Dconfig.file=conf/example.conf -cp "target/lib/*:target/*" com.kmwllc.lucille.core.Runner -local
131 changes: 75 additions & 56 deletions pom.xml
@@ -2,64 +2,83 @@
<project xmlns="http://maven.apache.org/POM/4.0.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<modelVersion>4.0.0</modelVersion>

<groupId>com.lucille.example</groupId>
<artifactId>lucille-example</artifactId>
<version>1.0-SNAPSHOT</version>
<groupId>com.lucille.example</groupId>
<artifactId>lucille-example</artifactId>
<version>1.0-SNAPSHOT</version>

<properties>
<maven.compiler.source>11</maven.compiler.source>
<maven.compiler.target>11</maven.compiler.target>
</properties>
<properties>
<maven.compiler.source>11</maven.compiler.source>
<maven.compiler.target>11</maven.compiler.target>
</properties>

<dependencies>
<dependency>
<groupId>com.kmwllc</groupId>
<artifactId>lucille-core</artifactId>
<version>0.2.1</version>
<scope>provided</scope>
</dependency>
<dependency>
<groupId>com.kmwllc</groupId>
<artifactId>lucille-bom</artifactId>
<version>0.2.1</version>
<type>pom</type>
</dependency>
<dependency>
<groupId>com.typesafe</groupId>
<artifactId>config</artifactId>
<version>1.4.1</version>
</dependency>
<dependency>
<groupId>junit</groupId>
<artifactId>junit</artifactId>
<version>4.13.1</version>
<scope>test</scope>
</dependency>
</dependencies>
<dependencies>
<dependency>
<groupId>com.kmwllc</groupId>
<artifactId>lucille-core</artifactId>
<version>0.2.1</version>
<scope>provided</scope>
</dependency>
<dependency>
<groupId>com.kmwllc</groupId>
<artifactId>lucille-bom</artifactId>
<version>0.2.1</version>
<type>pom</type>
</dependency>
<dependency>
<groupId>com.typesafe</groupId>
<artifactId>config</artifactId>
<version>1.4.1</version>
</dependency>
<dependency>
<groupId>junit</groupId>
<artifactId>junit</artifactId>
<version>4.13.1</version>
<scope>test</scope>
</dependency>
</dependencies>

<build>
<plugins>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-dependency-plugin</artifactId>
<executions>
<execution>
<id>copy-dependencies</id>
<phase>prepare-package</phase>
<goals>
<goal>copy-dependencies</goal>
</goals>
<configuration>
<outputDirectory>${project.build.directory}/classes/lib</outputDirectory>
<overWriteReleases>false</overWriteReleases>
<overWriteSnapshots>false</overWriteSnapshots>
<overWriteIfNewer>true</overWriteIfNewer>
</configuration>
</execution>
</executions>
</plugin>
</plugins>
</build>
<build>
<plugins>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-dependency-plugin</artifactId>
<executions>
<execution>
<id>copy-dependencies</id>
<phase>prepare-package</phase>
<goals>
<goal>copy-dependencies</goal>
</goals>
<configuration>
<outputDirectory>${project.build.directory}/lib</outputDirectory>
<overWriteReleases>false</overWriteReleases>
<overWriteSnapshots>false</overWriteSnapshots>
<overWriteIfNewer>true</overWriteIfNewer>
</configuration>
</execution>
</executions>
</plugin>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-assembly-plugin</artifactId>
<executions>
<execution>
<id>package-assembly</id>
<phase>package</phase>
<goals>
<goal>single</goal>
</goals>
<configuration>
<finalName>lucille-example-${project.version}</finalName>
<descriptors>
<descriptor>src/assembly/assembly.xml</descriptor>
</descriptors>
</configuration>
</execution>
</executions>
</plugin>
</plugins>
</build>
</project>
32 changes: 32 additions & 0 deletions src/assembly/assembly.xml
@@ -0,0 +1,32 @@
<assembly
xmlns="http://maven.apache.org/plugins/maven-assembly-plugin/assembly/1.1.2"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/plugins/maven-assembly-plugin/assembly/1.1.2 http://maven.apache.org/xsd/assembly-1.1.2.xsd">

<id>bin</id>
<formats>
<format>tar.gz</format>
</formats>
<includeBaseDirectory>false</includeBaseDirectory>

<fileSets>
<fileSet>
<directory>${project.basedir}/src/main/script/bin</directory>
<outputDirectory>/bin</outputDirectory>
<useDefaultExcludes>true</useDefaultExcludes>
<fileMode>0755</fileMode>
</fileSet>
<fileSet>
<directory>${project.basedir}/src/main/conf</directory>
<outputDirectory>conf</outputDirectory>
</fileSet>
<fileSet>
<directory>${project.basedir}/target</directory>
<includes>
<include>*.jar</include>
</includes>
<outputDirectory>lib</outputDirectory>
</fileSet>
</fileSets>

</assembly>
30 changes: 11 additions & 19 deletions src/main/java/com/lucille/example/connector/RandomDocConnector.java
@@ -4,6 +4,8 @@
import com.kmwllc.lucille.core.*;
import com.typesafe.config.Config;
import java.util.*;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

/**
* Connector implementation that creates a configurable number of documents (numDocs) with randomly generated field values.
@@ -15,32 +17,32 @@
*/
public class RandomDocConnector extends AbstractConnector {

private static final Logger log = LoggerFactory.getLogger(RandomDocConnector.class);

private int numDocs;
private List<String> fieldNames;

private Random rand = new Random();
private int MAX = Integer.MAX_VALUE;


public RandomDocConnector(Config config) {
public RandomDocConnector(Config config) throws ConnectorException {
super(config);
if ( config.getInt("numDocs") > 1000000) {
throw new ConnectorException("The number of documents (numDocs) cannot be greater than 1000000.");
}
numDocs = config.getInt("numDocs");
fieldNames = config.getStringList("fieldNames");
}

@Override
public void preExecute(String runId) {
// calculate maximum bound for random number generator
this.MAX = this.numDocs * 1000;
}

@Override
public void execute(Publisher publisher) throws ConnectorException {
int randBound = this.numDocs * 1000;

log.info("Generating {} documents with random values.", this.numDocs);
for (int i = 0; i < this.numDocs; i++) {
Document doc = Document.create(Integer.toString(i));
for (String field : this.fieldNames) {
doc.setField(field, this.rand.nextInt(this.MAX));
doc.setField(field, this.rand.nextInt(randBound));
}
try {
publisher.publish(doc);
@@ -49,14 +51,4 @@ public void execute(Publisher publisher) throws ConnectorException {
}
}
}

@Override
public void postExecute(String runId) throws ConnectorException {
super.postExecute(runId);
}

@Override
public void close() throws ConnectorException {
super.close();
}
}
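The bound computation in `execute` and the `numDocs` check in the constructor can be looked at in isolation. The following is a minimal sketch with no Lucille dependencies; the class `RandomBoundSketch` and its method names are hypothetical and not part of this PR. It assumes the 1,000,000 cap also serves to keep `numDocs * 1000` within `int` range, which the source does not state explicitly.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical stand-alone sketch; not part of the PR or of Lucille.
final class RandomBoundSketch {

  // Same cap as the connector's constructor check.
  static final int MAX_DOCS = 1_000_000;

  // Computes the exclusive upper bound passed to Random.nextInt.
  static int randBound(int numDocs) {
    if (numDocs > MAX_DOCS) {
      throw new IllegalArgumentException("numDocs cannot be greater than " + MAX_DOCS);
    }
    // multiplyExact throws ArithmeticException on int overflow instead of wrapping.
    return Math.multiplyExact(numDocs, 1000);
  }

  // Generates one random value per document, each in [0, randBound).
  static List<Integer> randomValues(int numDocs, Random rand) {
    List<Integer> values = new ArrayList<>();
    if (numDocs <= 0) {
      return values; // nothing to generate; also avoids nextInt(0), which throws
    }
    int bound = randBound(numDocs);
    for (int i = 0; i < numDocs; i++) {
      values.add(rand.nextInt(bound));
    }
    return values;
  }

  public static void main(String[] args) {
    List<Integer> values = randomValues(10, new Random());
    System.out.println(values.size() + " values below " + randBound(10));
  }
}
```

With the cap in place the largest bound is 1,000,000,000, which is below `Integer.MAX_VALUE`, so the plain multiplication used in the connector would not wrap for any accepted `numDocs`.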
32 changes: 32 additions & 0 deletions src/main/resources/log4j2.xml
@@ -0,0 +1,32 @@
<?xml version="1.0" encoding="UTF-8"?>
<Configuration status="INFO" monitorInterval="30">
<Appenders>
<!-- Using Json formatting for console logging only -->
<Console name="Console" target="SYSTEM_OUT">
<JsonTemplateLayout compact="true" eventTemplateUri="classpath:JsonLayout.json"/>
</Console>
<RollingFile name="RollingFile" fileName="./log/com.kmwllc.lucille.log" filePattern="./log/com.kmwllc.lucille-%i.log">
<PatternLayout pattern="%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n" />
<Policies>
<SizeBasedTriggeringPolicy size="10 MB" />
</Policies>
<DefaultRolloverStrategy max="20" />
</RollingFile>
<RollingFile name="heartbeat" fileName="./log/heartbeat.log" filePattern="./log/heartbeat-%i.log" immediateFlush="true">
<PatternLayout pattern="%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n" />
<Policies>
<SizeBasedTriggeringPolicy size="5 KB" />
</Policies>
<DefaultRolloverStrategy max="4" />
</RollingFile>
</Appenders>
<Loggers>
<Root level="INFO">
<AppenderRef ref="Console" />
<AppenderRef ref="RollingFile" />
</Root>
<Logger name="com.kmwllc.lucille.core.Heartbeat" level="INFO" additivity="false">
<AppenderRef ref="heartbeat" />
</Logger>
</Loggers>
</Configuration>
17 changes: 17 additions & 0 deletions src/test/java/com/lucille/example/RunLucille.java
@@ -0,0 +1,17 @@
package com.lucille.example;

import com.kmwllc.lucille.core.Runner;

/**
* An alternative to using lucille.sh. Use this class to debug parts of your run configuration.
*/
public class RunLucille {
public static void main(String[] args) throws Exception {
// if no config.file system property is given to the run configuration, fall back to conf/example.conf
String configFile = "conf/example.conf";
if (System.getProperty("config.file") == null) {
System.setProperty("config.file", configFile);
}

Runner.main(args);
}

}
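The intent of `RunLucille`, falling back to a default config path when no `config.file` system property is supplied, can be sketched on its own. Note that `System.getProperty(key, default)` only returns a value and never stores the default, so a fallback has to be written with `System.setProperty`. The class `ConfigFileDefaultSketch` and the method `ensureConfigFile` below are hypothetical names used for illustration and are not part of this PR.

```java
// Hypothetical stand-alone sketch; not part of the PR or of Lucille.
final class ConfigFileDefaultSketch {

  static final String PROPERTY = "config.file";

  // Sets the property to defaultPath only when it is unset,
  // then returns the effective value.
  static String ensureConfigFile(String defaultPath) {
    if (System.getProperty(PROPERTY) == null) {
      System.setProperty(PROPERTY, defaultPath);
    }
    return System.getProperty(PROPERTY);
  }

  public static void main(String[] args) {
    System.out.println("Using config file: " + ensureConfigFile("conf/example.conf"));
  }
}
```

A value passed on the command line with `-Dconfig.file=...` is left untouched, so the default only applies to run configurations that omit it.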
Original file line number Diff line number Diff line change
@@ -1,6 +1,8 @@
package com.lucille.example.connector;

import static org.junit.Assert.assertEquals;
import static org.junit.Assert.assertTrue;

import com.kmwllc.lucille.core.Connector;
import com.kmwllc.lucille.core.Document;
import com.kmwllc.lucille.core.Publisher;
@@ -25,14 +27,8 @@ public void testExecute() throws Exception {
List<Document> docs = messenger.getDocsSentForProcessing();
// ensure doc count is correct
assertEquals(50, docs.size());
// ensure all expected fields are there ["randTime", "randNum"]
try {
docs.forEach(document -> document.validateFieldNames("randTime", "randNum"));
assert true;
} catch (IllegalArgumentException e) {
assert false;
throw new IllegalArgumentException(e);
}
// ensure all expected fields are in the documents: ["randTime", "randNum"]
docs.forEach(document -> assertTrue(document.has("randTime") && document.has("randNum")));
}

}
2 changes: 1 addition & 1 deletion src/test/java/com/lucille/example/stage/AddUnitsTest.java
@@ -9,7 +9,7 @@

public class AddUnitsTest {

private StageFactory factory = StageFactory.of(AddUnitsStage.class);
private final StageFactory factory = StageFactory.of(AddUnitsStage.class);

@Test
public void testAddUnitBefore() throws StageException {