Skip to content

Latest commit

 

History

History
220 lines (164 loc) · 7.27 KB

README.md

File metadata and controls

220 lines (164 loc) · 7.27 KB

Build Status Codecov coverage Gitter

Java VTL: Java implementation of VTL

The Java VTL project is an open source java implementation of the VTL 1.1 draft specification. It follows the JSR-223 Java Scripting API and exposes a simple connector interface one can implement in order to integrate with any data stores.

Visit the interactive reference manual for more information.

Modules

The project is divided in modules;

  • java-vtl-parent
    • java-vtl-parser, contains the lexer and parser for VTL.
    • java-vtl-model, VTL data model.
    • java-vtl-script, JSR-223 (ScriptEngine) implementation.
    • java-vtl-connector, connector API.
    • java-vtl-tools, various tools.

Usage

Add a dependency to the maven project

<dependency>
    <groupId>no.ssb.vtl</groupId>
    <artifactId>java-vtl-script</artifactId>
    <version>0.1.13-SNAPSHOT</version>
</dependency>

Evaluate VTL expressions

ScriptEngine engine = new VTLScriptEngine(connector);

Bindings bindings = engine.getBindings(ScriptContext.ENGINE_SCOPE);
engine.eval("ds1 := get(\"foo\")" +
            "ds2 := get(\"bar\")" +
            "ds3 := [ds1, ds2] {" +
            "   filter ds1.id = \"string\"," +
            "   total := ds1.measure + ds2.measure" +
            "}");

System.out.println(bindings.get("ds3"))

Connect to external systems

VTL Java uses the no.ssb.vtl.connector.Connector interface to access and export data from and to external systems.

The Connector interface defines three methods:

public interface Connector {

    boolean canHandle(String identifier);

    Dataset getDataset(String identifier) throws ConnectorException;

    Dataset putDataset(String identifier, Dataset dataset) throws ConnectorException;

}

The method canHandle(String identifier) is used by the engine to find which connector is able to provide a Dataset for a given identifier.

The method getDataset(String identifier) is then called to get the dataset. Example implementations can be found in the java-vtl-ssb-api-connector module but a very crude implementation could be as such:

class StaticDataset implements Dataset {

    private final DataStructure structure = DataStructure.builder()
            .put("id", Role.IDENTIFIER, String.class)
            .put("period", Role.IDENTIFIER, Instant.class)
            .put("measure", Role.MEASURE, Long.class)
            .put("attribute", Role.ATTRIBUTE, String.class)
            .build();

    @Override
    public Stream<DataPoint> getData() {

        List<Map<String, Object>> data = new ArrayList<>();
        HashMap<String, Object> row = new HashMap<>();
        Instant period = Instant.now();
        for (int i = 0; i < 100; i++) {
            row.put("id", "id #" + i);
            row.put("period", period);
            row.put("measure", Long.valueOf(i));
            row.put("attribute", "attribute #" + i);
            data.add(row);
        }

        return data.stream().map(structure::wrap);
    }

    @Override
    public Optional<Map<String, Integer>> getDistinctValuesCount() {
        return Optional.empty();
    }

    @Override
    public Optional<Long> getSize() {
        return Optional.of(100L);
    }

    @Override
    public DataStructure getDataStructure() {
        return structure;
    }
}

Implementation roadmap

This is an overview of the implementation progress.

Group Operators Progress Comment
General purpose round parenthesis done
General purpose := (assignment) done
General purpose membership done
General purpose get usable The keep, filter and aggregate options are not implemented.
General purpose put usable Defined in the grammar but not implemented
Join expression []{} done
Join clause filter done
Join clause keep done
Join clause drop done
Join clause fold done
Join clause unfold done
Join clause rename done
Join clause := (assignment) done
Join clause . (membership) done
Clauses rename done
Clauses filter done
Clauses keep done
Clauses calc todo
Clauses attrcalc todo
Clauses aggregate todo
Conditional if-then-else todo
Conditional nvl done
Validation Comparisons (>,<,>=,<=,=,<>) done
Validation in,not in, between todo
Validation isnull done Implemented syntax are isnull(value), value is null and value is not null
Validation exist_in, not_exist_in todo
Validation exist_in_all, not_exist_in_all todo
Validation check usable The boolean dataset must be built manually (no lifting).
Validation match_characters todo
Validation match_values todo
Statistical min, max todo
Statistical hierarchy usable The inline definition is not supported. A dataset that has a correct structure can be used instead.
Statistical aggregate todo
Relational union done
Relational intersect todo
Relational symdiff todo
Relational setdiff done
Relational merge todo
Boolean and usable Only inside join expression (no lifting).
Boolean or usable Only inside join expression (no lifting).
Boolean xor usable Only inside join expression (no lifting).
Boolean not usable Only inside join expression (no lifting).
Mathematical unary plus and minus done
Mathematical addition, substraction done
Mathematical multiplication, division done
Mathematical round, ceil, floor done
Mathematical abs done
Mathematical trunc done
Mathematical power, exp, nroot done
Mathematical ln, log done
Mathematical mod done
String length todo
String concatenation done
String trim todo
String upper/lower case todo
String substr usable No lifting.
String indexof todo
String date_from_string usable Dataset as input not implemented. Only YYYY date format accepted.
Outside specification integer_from_string done
Outside specification float_from_string done
Outside specification string_from_number done

Analytics