Grok Library

A Java library for extracting structured data from unstructured data

This library was inspired by the Logstash grok interceptor (filter) documented here:

http://logstash.net/docs/1.4.0/filters/grok

This grok library ships with a set of pre-defined patterns:

https://github.com/aicer/grok/tree/master/src/main/resources/grok_built_in_patterns

However, you can also create your own custom named patterns.

SYNTAX

The syntax for referencing a pattern is as follows:

%{PATTERN_NAME:NAMED_GROUP_IN_RESULT}

For example, the following pattern

%{EMAIL:username} %{USERNAME:password} %{INT:yearOfBirth}

will extract an email address, a password and a year of birth from the following string:

55BB778 - [email protected] secret123 4439 Valid Data Stream

Each PATTERN_NAME must be defined in the dictionary. The group names (username, password and yearOfBirth) are the keys used to retrieve the values from the extraction results.
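To make the syntax concrete, here is a minimal sketch that uses only java.util.regex and does not call this library. It rewrites each %{PATTERN_NAME:group} reference as a regular-expression named group and then reads the groups back by name. The class GrokSyntaxSketch, its three dictionary entries and the sample line are hypothetical stand-ins chosen for illustration; they are not the library's built-in definitions or its implementation.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class GrokSyntaxSketch {

  // Matches %{PATTERN_NAME} or %{PATTERN_NAME:groupName}.
  // Group names are limited to letters and digits, as Java named groups require.
  private static final Pattern REFERENCE =
      Pattern.compile("%\\{(\\w+)(?::([a-zA-Z][a-zA-Z0-9]*))?\\}");

  // Hypothetical dictionary entries, NOT the library's built-in patterns.
  private static final Map<String, String> DICTIONARY = new LinkedHashMap<>();
  static {
    DICTIONARY.put("EMAIL", "[^\\s@]+@[^\\s@]+\\.[a-zA-Z]+");
    DICTIONARY.put("USERNAME", "[a-zA-Z0-9._-]+");
    DICTIONARY.put("INT", "[+-]?[0-9]+");
  }

  /** Rewrites every pattern reference in the expression as a regex group. */
  static String toRegex(String expression) {
    Matcher m = REFERENCE.matcher(expression);
    StringBuffer regex = new StringBuffer();
    while (m.find()) {
      String definition = DICTIONARY.get(m.group(1));
      if (definition == null) {
        throw new IllegalArgumentException("Undefined pattern: " + m.group(1));
      }
      // A reference without a group name becomes a non-capturing group.
      String group = (m.group(2) == null)
          ? "(?:" + definition + ")"
          : "(?<" + m.group(2) + ">" + definition + ")";
      m.appendReplacement(regex, Matcher.quoteReplacement(group));
    }
    m.appendTail(regex);
    return regex.toString();
  }

  /** Returns group name to matched text, or an empty map if nothing matches. */
  static Map<String, String> extract(String expression, String line) {
    Map<String, String> results = new LinkedHashMap<>();
    Matcher match = Pattern.compile(toRegex(expression)).matcher(line);
    if (match.find()) {
      Matcher names = REFERENCE.matcher(expression);
      while (names.find()) {
        String groupName = names.group(2);
        if (groupName != null) {
          results.put(groupName, match.group(groupName));
        }
      }
    }
    return results;
  }

  public static void main(String[] args) {
    String expression = "%{EMAIL:username} %{USERNAME:password} %{INT:yearOfBirth}";
    // Hypothetical sample line with a made-up address.
    String line = "55BB778 - jane@example.com secret123 4439 Valid Data Stream";
    extract(expression, line).forEach((k, v) -> System.out.println(k + "=" + v));
  }
}
```

The leading "55BB778 - " and the trailing "Valid Data Stream" are ignored because the sketch searches for the expression anywhere in the line rather than requiring the whole line to match.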

How to Include It as a Maven Dependency

<dependency>
    <groupId>org.aicer.grok</groupId>
    <artifactId>grok</artifactId>
    <version>0.9.0</version>
</dependency>

How to Use the Library

Patterns can be loaded in 4 ways by invoking the following methods on the dictionary object.

GrokDictionary.addBuiltInDictionaries()

This loads all the built-in dictionaries from the classpath.

GrokDictionary.addDictionary(File)

final GrokDictionary dictionary = new GrokDictionary();

// Load the built-in dictionaries
dictionary.addBuiltInDictionaries();

// Add custom pattern
dictionary.addDictionary(new File(patternDirectoryOrFilePath));

// Resolve all expressions loaded
dictionary.bind();

Here custom patterns are loaded into the dictionary by passing in a File object representing the file or directory where the patterns are stored.

GrokDictionary.addDictionary(InputStream)

Here custom patterns are loaded into the dictionary by passing in an InputStream containing the named expressions.
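As an illustration of what such a stream might contain, the sketch below parses plain text in which each line holds a pattern name followed by its expression. It uses only the JDK and is not the library's loader. The one-definition-per-line layout, with the name separated from the expression by whitespace, is an assumption inferred from definitions such as "DOMAINTLD [a-zA-Z]+"; the class DictionaryStreamSketch and its parse method are hypothetical.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

final class DictionaryStreamSketch {

  /** Reads "NAME expression" lines into a map, skipping blank lines. */
  static Map<String, String> parse(InputStream in) {
    Map<String, String> definitions = new LinkedHashMap<>();
    try (BufferedReader reader =
        new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
      String line;
      while ((line = reader.readLine()) != null) {
        line = line.trim();
        if (line.isEmpty()) {
          continue;
        }
        // Assumed layout: the name ends at the first run of whitespace.
        String[] parts = line.split("\\s+", 2);
        if (parts.length == 2) {
          definitions.put(parts[0], parts[1]);
        }
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return definitions;
  }

  public static void main(String[] args) {
    String text = "DOMAINTLD [a-zA-Z]+\n"
        + "EMAIL %{NOTSPACE}@%{WORD}\\.%{DOMAINTLD}\n";
    InputStream in = new ByteArrayInputStream(text.getBytes(StandardCharsets.UTF_8));
    parse(in).forEach((name, expr) -> System.out.println(name + " -> " + expr));
  }
}
```

An in-memory ByteArrayInputStream is used here for brevity; a FileInputStream or a classpath resource stream would be read the same way.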

GrokDictionary.addDictionary(Reader)

Here a custom pattern is added by passing in a Reader containing the named pattern.

final GrokDictionary dictionary = new GrokDictionary();

// Load the built-in dictionaries
dictionary.addBuiltInDictionaries();

// Add custom pattern
dictionary.addDictionary(new StringReader("DOMAINTLD [a-zA-Z]+"));
dictionary.addDictionary(new StringReader("EMAIL %{NOTSPACE}@%{WORD}\\.%{DOMAINTLD}"));

// Resolve all expressions loaded
dictionary.bind();
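Resolving the loaded expressions means replacing every nested reference, such as %{DOMAINTLD} inside EMAIL, with the expression it names. The sketch below shows one way this kind of resolution can work, using only java.util.regex. It is not the library's bind() implementation; the class BindSketch, its resolve method, and the definitions given for NOTSPACE and WORD are assumptions made for illustration.

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class BindSketch {

  // Matches a nested reference of the form %{PATTERN_NAME}.
  private static final Pattern REFERENCE = Pattern.compile("%\\{(\\w+)\\}");

  static String resolve(String name, Map<String, String> raw) {
    return resolve(name, raw, new HashSet<>());
  }

  /** Recursively expands nested references, rejecting undefined or circular ones. */
  static String resolve(String name, Map<String, String> raw, Set<String> inProgress) {
    String definition = raw.get(name);
    if (definition == null) {
      throw new IllegalArgumentException("Undefined pattern: " + name);
    }
    if (!inProgress.add(name)) {
      throw new IllegalStateException("Circular reference: " + name);
    }
    Matcher m = REFERENCE.matcher(definition);
    StringBuffer resolved = new StringBuffer();
    while (m.find()) {
      String nested = resolve(m.group(1), raw, inProgress);
      // Wrap in a non-capturing group so the nested expression stays one unit.
      m.appendReplacement(resolved, Matcher.quoteReplacement("(?:" + nested + ")"));
    }
    m.appendTail(resolved);
    inProgress.remove(name);
    return resolved.toString();
  }

  public static void main(String[] args) {
    Map<String, String> raw = new LinkedHashMap<>();
    raw.put("NOTSPACE", "\\S+");   // assumed definition
    raw.put("WORD", "\\w+");       // assumed definition
    raw.put("DOMAINTLD", "[a-zA-Z]+");
    raw.put("EMAIL", "%{NOTSPACE}@%{WORD}\\.%{DOMAINTLD}");

    String email = resolve("EMAIL", raw);
    System.out.println(email);
    System.out.println(Pattern.matches(email, "jane@example.com"));
  }
}
```

Under these assumed definitions, EMAIL resolves to a single flat regular expression with no %{...} references left, which is what allows it to be compiled and matched directly.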

Example of How to Use The Library

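import java.util.Map;

// Package names below are assumed from the library's source layout; verify them against your version
import org.aicer.grok.dictionary.GrokDictionary;
import org.aicer.grok.util.Grok;

public final class GrokStage {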

  private static final void displayResults(final Map<String, String> results) {
    if (results != null) {
      for(Map.Entry<String, String> entry : results.entrySet()) {
        System.out.println(entry.getKey() + "=" + entry.getValue());
      }
    }
  }

  public static void main(String[] args) {

    final String rawDataLine1 = "1234567 - [email protected] cc55ZZ35 1789 Hello Grok";
    final String rawDataLine2 = "98AA541 - [email protected] mmddgg22 8800 Hello Grok";
    final String rawDataLine3 = "55BB778 - [email protected] secret123 4439 Valid Data Stream";

    final String expression = "%{EMAIL:username} %{USERNAME:password} %{INT:yearOfBirth}";

    final GrokDictionary dictionary = new GrokDictionary();

    // Load the built-in dictionaries
    dictionary.addBuiltInDictionaries();

    // Resolve all expressions loaded
    dictionary.bind();

    // Take a look at how many expressions have been loaded
    System.out.println("Dictionary Size: " + dictionary.getDictionarySize());

    Grok compiledPattern = dictionary.compileExpression(expression);

    displayResults(compiledPattern.extractNamedGroups(rawDataLine1));
    displayResults(compiledPattern.extractNamedGroups(rawDataLine2));
    displayResults(compiledPattern.extractNamedGroups(rawDataLine3));
  }
}

This produces the following output:

Dictionary Size: 91

username=[email protected]
password=cc55ZZ35
yearOfBirth=1789

username=[email protected]
password=mmddgg22
yearOfBirth=8800

username=[email protected]
password=secret123
yearOfBirth=4439
