A Java library for extracting structured data from unstructured data
This library was inspired by the logstash inteceptor or filter available here
http://logstash.net/docs/1.4.0/filters/grok
This grok library comes with pre-defined patterns
https://github.com/aicer/grok/tree/master/src/main/resources/grok_built_in_patterns
However, you can also create your own custom named patterns.
The syntax for the patterns are as follows
%{PATTERN_NAME:NAMED_GROUP_IN_RESULT}
For example, the following pattern
%{EMAIL:username} %{USERNAME:password} %{INT:yearOfBirth}
will extract an email address, password and year of birth from the following string
55BB778 - [email protected] secret123 4439 Valid Data Stream
The PATTERN_NAME has to be defined in the dictionary and the group names, username, password and yearOfBirth will be used to retrieve the values from the extraction results.
<dependency>
<groupId>org.aicer.grok</groupId>
<artifactId>grok</artifactId>
<version>0.9.0</version>
</dependency>
Patterns can be loaded in 4 ways by invoking the following methods on the dictionary object.
This loads all the built in dictionaries from the class path
final GrokDictionary dictionary = new GrokDictionary();
// Load the built-in dictionaries
dictionary.addBuiltInDictionaries();
// Add custom pattern
dictionary.addDictionary(new File(patternDirectoryOrFilePath));
// Resolve all expressions loaded
dictionary.bind();
Here custom patterns can be loaded into the dictionary by passing in a File object representing the directory where the patterns are stored
Here custom patterns can be loaded into the dictionary by passing in an inpustream containing the named expressions
Here a custom pattern can be added by passing a reader contain the named pattern
final GrokDictionary dictionary = new GrokDictionary();
// Load the built-in dictionaries
dictionary.addBuiltInDictionaries();
// Add custom pattern
dictionary.addDictionary(new StringReader("DOMAINTLD [a-zA-Z]+"));
dictionary.addDictionary(new StringReader("EMAIL %{NOTSPACE}@%{WORD}\.%{DOMAINTLD}"));
// Resolve all expressions loaded
dictionary.bind();
public final class GrokStage {
private static final void displayResults(final Map<String, String> results) {
if (results != null) {
for(Map.Entry<String, String> entry : results.entrySet()) {
System.out.println(entry.getKey() + "=" + entry.getValue());
}
}
}
public static void main(String[] args) {
final String rawDataLine1 = "1234567 - [email protected] cc55ZZ35 1789 Hello Grok";
final String rawDataLine2 = "98AA541 - [email protected] mmddgg22 8800 Hello Grok";
final String rawDataLine3 = "55BB778 - [email protected] secret123 4439 Valid Data Stream";
final String expression = "%{EMAIL:username} %{USERNAME:password} %{INT:yearOfBirth}";
final GrokDictionary dictionary = new GrokDictionary();
// Load the built-in dictionaries
dictionary.addBuiltInDictionaries();
// Resolve all expressions loaded
dictionary.bind();
// Take a look at how many expressions have been loaded
System.out.println("Dictionary Size: " + dictionary.getDictionarySize());
Grok compiledPattern = dictionary.compileExpression(expression);
displayResults(compiledPattern.extractNamedGroups(rawDataLine1));
displayResults(compiledPattern.extractNamedGroups(rawDataLine2));
displayResults(compiledPattern.extractNamedGroups(rawDataLine3));
}
}
Which gives the folllowing output
Dictionary Size: 91
[email protected]
password=cc55ZZ35
yearOfBirth=1789
[email protected]
password=mmddgg22
yearOfBirth=8800
[email protected]
password=secret123
yearOfBirth=4439