Update docs #362

Merged: 7 commits, Oct 26, 2024
10 changes: 8 additions & 2 deletions README.md
@@ -9,7 +9,7 @@ Transformation engine and validator for statistics.
[![License: MIT](https://img.shields.io/badge/License-MIT-blue.svg)](https://opensource.org/licenses/MIT)
[![Mentioned in Awesome Official Statistics ](https://awesome.re/mentioned-badge.svg)](http://www.awesomeofficialstatistics.org)

Trevas is a Java engine for the Validation and Transformation Language (VTL), an [SDMX standard](https://sdmx.org/?page_id=5096) that allows the formal definition of algorithms to validate statistical data and calculate derived data. VTL is user oriented and provides a technology-neutral and standard view of statistical processes at the business level. Trevas supports the latest VTL version (v2.0, July 2020).
Trevas is a Java engine for the Validation and Transformation Language (VTL), an [SDMX standard](https://sdmx.org/?page_id=5096) that allows the formal definition of algorithms to validate statistical data and calculate derived data. VTL is user-oriented and provides a technology-neutral and standard view of statistical processes at the business level. Trevas supports the latest VTL version (v2.1, July 2024).

For actual execution, VTL expressions need to be translated to the target runtime environment. Trevas provides this step for the Java platform, using the VTL formal grammar and the [Antlr](https://www.antlr.org/) tool. For a given execution, Trevas receives the VTL expression and the data bindings that associate variable names in the expression with actual datasets. The execution results can then be retrieved from the bindings for further processing.

@@ -32,6 +32,12 @@ Open JDK 8+ is required.

## References

<p align="center">
<img width="100px" src="./docs/static/img/sdmx-logo.svg" />
</p>

Trevas is listed among the [SDMX](https://sdmx.org/?page_id=4500) tools.

<p align="center">
<img width="100px" src="./docs/static/img/sdmx-io-logo.svg" />
</p>
@@ -42,4 +48,4 @@ Trevas is part of the [sdmx.io](https://www.sdmx.io/) ecosystem.
<img src="https://awesome.re/mentioned-badge.svg" />
</p>

Trevas is referencing by [_Awesome official statistics software_](https://github.com/SNStatComp/awesome-official-statistics-software)
Trevas is referenced by [_Awesome official statistics software_](https://github.com/SNStatComp/awesome-official-statistics-software)
239 changes: 239 additions & 0 deletions docs/blog/2024-10-07-trevas-provenance.mdx
@@ -0,0 +1,239 @@
---
slug: /trevas-provenance
title: Trevas - Provenance
authors: [nicolas]
tags: [Trevas, provenance, SDTH]
---

import useBaseUrl from '@docusaurus/useBaseUrl';
import Link from '@theme/Link';

### News

Trevas 1.6.0 introduces the VTL Prov module.

This module produces lineage metadata from Trevas, based on two RDF ontologies: `PROV-O` and `SDTH`.

#### SDTH model overview

```mermaid
classDiagram

class Program["sdth:Program"] {
rdfs:label
}
class ProgramStep["sdth:ProgramStep"] {
rdfs:label
sdth:hasSourceCode
sdth:hasSDTL
}
class VariableInstance["sdth:VariableInstance"] {
rdfs:label
sdth:hasName
}
class DataframeInstance["sdth:DataframeInstance"] {
rdfs:label
sdth:hasName
}

class FileInstance["sdth:FileInstance"] {
rdfs:label
sdth:hasName
}


ProgramStep <-- Program : sdth_hasProgramStep
ProgramStep <-- ProgramStep : sdth_hasProgramStep

ProgramStep --> VariableInstance : sdth_usesVariable
ProgramStep --> VariableInstance : sdth_assignsVariable
ProgramStep --> DataframeInstance : sdth_consumesDataframe
ProgramStep --> DataframeInstance : sdth_producesDataframe

ProgramStep --> FileInstance : sdth_loadsFile
ProgramStep --> FileInstance : sdth_savesFile


DataframeInstance --> VariableInstance : sdth_hasVariableInstance
FileInstance --> VariableInstance : sdth_hasVariableInstance


DataframeInstance --> DataframeInstance : sdth_derivedFrom
DataframeInstance --> DataframeInstance : sdth_elaborationOf

FileInstance --> FileInstance : sdth_derivedFrom
FileInstance --> FileInstance : sdth_elaborationOf
VariableInstance --> VariableInstance : sdth_derivedFrom
VariableInstance --> VariableInstance : sdth_elaborationOf
```

#### Adopted model

The `vtl-prov` module, version 1.6.0, uses the following partial model:

```mermaid
classDiagram
class Agent {
}
class Program {
rdfs:label
}
class ProgramStep {
rdfs:label
}
class VariableInstance {
rdfs:label
sdth:hasName
}
class DataframeInstance {
rdfs:label
sdth:hasName
}

Agent <|-- Program
ProgramStep <-- Program : sdth_hasProgramStep
ProgramStep --> VariableInstance : sdth_usesVariable
ProgramStep --> VariableInstance : sdth_assignsVariable
ProgramStep --> DataframeInstance : sdth_consumesDataframe
ProgramStep --> DataframeInstance : sdth_producesDataframe
DataframeInstance --> VariableInstance : sdth_hasVariableInstance
DataframeInstance --> DataframeInstance : sdth_wasDerivedFrom
VariableInstance --> VariableInstance : sdth_wasDerivedFrom
```

Further improvements will come in the next few weeks.

#### Tools available

Trevas provenance tools are documented <Link label={"here"} href={useBaseUrl('/developer-guide/spark-mode/data-sources/sdmx')} />.

#### Example

##### Business use case

Two source datasets are transformed to produce transient datasets and a final, permanent one.

```mermaid
flowchart TD
OP1{add +}
OP2{multiply *}
OP3{filter}
OP4{create variable}
SC3([3])

ds1 --> OP1
ds2 --> OP1
OP1 --> ds_sum
SC3 --> OP2
ds_sum --> OP2
OP2 --> ds_mul
ds_mul --> OP3
OP3 --> OP4
OP4 --> ds_res
```

##### Inputs

`ds1` & `ds2` metadata:

|      |     id     |  var1   |  var2   |
| :--- | :--------: | :-----: | :-----: |
| Type |   STRING   | INTEGER | NUMBER  |
| Role | IDENTIFIER | MEASURE | MEASURE |

##### VTL script

```vtl
ds_sum := ds1 + ds2;
ds_mul := ds_sum * 3;
ds_res <- ds_mul[filter mod(var1, 2) = 0][calc var_sum := var1 + var2];
```
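To make the semantics of the script concrete, here is a minimal plain-Java sketch of what the three statements compute, assuming VTL's usual dataset-arithmetic rule: both operands are inner-joined on their identifiers and the operator is then applied measure by measure. It does not use Trevas; the `Row`/`ResRow` records, the `add`/`multiply`/`filterAndCalc` helpers and the sample values are all hypothetical and only mirror the `id`/`var1`/`var2` structure of the inputs. Records require Java 16+.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical in-memory model of the example: one identifier (id) and two measures (var1, var2).
// This is an illustration of VTL semantics, not Trevas code.
public class VtlExampleSketch {

    record Row(String id, long var1, double var2) {}

    record ResRow(String id, long var1, double var2, double varSum) {}

    // ds_sum := ds1 + ds2;
    // Inner join on the identifier, then add the measures pairwise.
    static Map<String, Row> add(Map<String, Row> ds1, Map<String, Row> ds2) {
        Map<String, Row> out = new LinkedHashMap<>();
        for (Row a : ds1.values()) {
            Row b = ds2.get(a.id());
            if (b != null) { // identifiers present in only one operand are dropped
                out.put(a.id(), new Row(a.id(), a.var1() + b.var1(), a.var2() + b.var2()));
            }
        }
        return out;
    }

    // ds_mul := ds_sum * 3;  (the scalar is applied to every measure)
    static Map<String, Row> multiply(Map<String, Row> ds, long k) {
        Map<String, Row> out = new LinkedHashMap<>();
        for (Row r : ds.values()) {
            out.put(r.id(), new Row(r.id(), r.var1() * k, r.var2() * k));
        }
        return out;
    }

    // ds_res <- ds_mul[filter mod(var1, 2) = 0][calc var_sum := var1 + var2];
    static Map<String, ResRow> filterAndCalc(Map<String, Row> ds) {
        Map<String, ResRow> out = new LinkedHashMap<>();
        for (Row r : ds.values()) {
            if (Math.floorMod(r.var1(), 2L) == 0) { // keep data points with an even var1
                out.put(r.id(), new ResRow(r.id(), r.var1(), r.var2(), r.var1() + r.var2()));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical sample data, not taken from the Trevas documentation.
        Map<String, Row> ds1 = new LinkedHashMap<>();
        ds1.put("A", new Row("A", 1, 0.5));
        ds1.put("B", new Row("B", 2, 1.0));
        Map<String, Row> ds2 = new LinkedHashMap<>();
        ds2.put("A", new Row("A", 3, 1.5));
        ds2.put("B", new Row("B", 1, 2.0));
        ds2.put("C", new Row("C", 7, 0.0));

        Map<String, ResRow> dsRes = filterAndCalc(multiply(add(ds1, ds2), 3));
        dsRes.values().forEach(System.out::println);
    }
}
```

With this hypothetical data, `C` is dropped by the join, `B` (var1 = 9 after multiplication) is removed by the filter, and `main` prints `ResRow[id=A, var1=12, var2=6.0, varSum=18.0]`. Each intermediate map corresponds to one program step (`ds_sum`, `ds_mul`, `ds_res`), which is the granularity at which the provenance graph records consumed and produced dataframes.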

##### RDF model target

```ttl
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX prov: <http://www.w3.org/ns/prov#>
PREFIX sdth: <http://rdf-vocabulary.ddialliance.org/sdth#>

# --- Program and steps
<http://example.com/program1> a sdth:Program ;
a prov:Agent ; # Agent? Or an activity
rdfs:label "My program 1"@en, "Mon programme 1"@fr ;
sdth:hasProgramStep <http://example.com/program1/program-step1>,
<http://example.com/program1/program-step2>,
<http://example.com/program1/program-step3> .

<http://example.com/program1/program-step1> a sdth:ProgramStep ;
rdfs:label "Program step 1"@en, "Étape 1"@fr ;
sdth:hasSourceCode "ds_sum := ds1 + ds2;" ;
sdth:consumesDataframe <http://example.com/dataset/ds1>,
<http://example.com/dataset/ds2> ;
sdth:producesDataframe <http://example.com/dataset/ds_sum> .

<http://example.com/program1/program-step2> a sdth:ProgramStep ;
rdfs:label "Program step 2"@en, "Étape 2"@fr ;
sdth:hasSourceCode "ds_mul := ds_sum * 3;" ;
sdth:consumesDataframe <http://example.com/dataset/ds_sum> ;
sdth:producesDataframe <http://example.com/dataset/ds_mul> .

<http://example.com/program1/program-step3> a sdth:ProgramStep ;
rdfs:label "Program step 3"@en, "Étape 3"@fr ;
sdth:hasSourceCode "ds_res <- ds_mul[filter mod(var1, 2) = 0][calc var_sum := var1 + var2];" ;
sdth:consumesDataframe <http://example.com/dataset/ds_mul> ;
sdth:producesDataframe <http://example.com/dataset/ds_res> ;
sdth:usesVariable <http://example.com/variable/var1>,
<http://example.com/variable/var2> ;
sdth:assignsVariable <http://example.com/variable/var_sum> .

# --- Variables
# I think we refer to variable names here, not instances...
<http://example.com/variable/id1> a sdth:VariableInstance ;
rdfs:label "id1" .
<http://example.com/variable/var1> a sdth:VariableInstance ;
rdfs:label "var1" .
<http://example.com/variable/var2> a sdth:VariableInstance ;
rdfs:label "var2" .
<http://example.com/variable/var_sum> a sdth:VariableInstance ;
rdfs:label "var_sum" .

# --- Data frames
<http://example.com/dataset/ds1> a sdth:DataframeInstance ;
rdfs:label "ds1" ;
sdth:hasName "ds1" ;
sdth:hasVariableInstance <http://example.com/variable/id1>,
<http://example.com/variable/var1>,
<http://example.com/variable/var2> .

<http://example.com/dataset/ds2> a sdth:DataframeInstance ;
rdfs:label "ds2" ;
sdth:hasName "ds2" ;
sdth:hasVariableInstance <http://example.com/variable/id1>,
<http://example.com/variable/var1>,
<http://example.com/variable/var2> .

<http://example.com/dataset/ds_sum> a sdth:DataframeInstance ;
rdfs:label "ds_sum" ;
sdth:hasName "ds_sum" ;
sdth:wasDerivedFrom <http://example.com/dataset/ds1>,
<http://example.com/dataset/ds2> ;
sdth:hasVariableInstance <http://example.com/variable/id1>,
<http://example.com/variable/var1>,
<http://example.com/variable/var2> .

<http://example.com/dataset/ds_mul> a sdth:DataframeInstance ;
rdfs:label "ds_mul" ;
sdth:hasName "ds_mul" ;
sdth:wasDerivedFrom <http://example.com/dataset/ds_sum> ;
sdth:hasVariableInstance <http://example.com/variable/id1>,
<http://example.com/variable/var1>,
<http://example.com/variable/var2> .

<http://example.com/dataset/ds_res> a sdth:DataframeInstance ;
rdfs:label "ds_res" ;
sdth:hasName "ds_res" ;
sdth:wasDerivedFrom <http://example.com/dataset/ds_mul> ;
sdth:hasVariableInstance <http://example.com/variable/id1>,
<http://example.com/variable/var1>,
<http://example.com/variable/var2>,
<http://example.com/variable/var_sum> .
```
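The step-level part of this target graph follows mechanically from the script. The sketch below is hypothetical and is not the `vtl-prov` API: for each step it emits one Turtle triple per line for its source code, the dataframes it consumes and produces, and a `sdth:wasDerivedFrom` link from every produced dataframe to every consumed one. IRIs follow the `http://example.com/` pattern used above; labels and variable-level triples (`sdth:usesVariable`, `sdth:assignsVariable`, `sdth:hasVariableInstance`) are left out to keep it short. Requires Java 16+.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: derives SDTH-style lineage triples from the steps of a VTL program.
// Not the vtl-prov API; the IRIs mirror the example graph above.
public class LineageSketch {

    static final String BASE = "http://example.com/";

    // One VTL statement: its position, source code, and the dataframes it reads and writes.
    record Step(int index, String sourceCode, List<String> consumes, List<String> produces) {}

    static String iri(String path) {
        return "<" + BASE + path + ">";
    }

    // Turtle string literal with backslashes and double quotes escaped.
    static String literal(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    static List<String> triples(String program, List<Step> steps) {
        List<String> out = new ArrayList<>();
        String programIri = iri(program);
        out.add(programIri + " a sdth:Program .");
        for (Step step : steps) {
            String stepIri = iri(program + "/program-step" + step.index());
            out.add(programIri + " sdth:hasProgramStep " + stepIri + " .");
            out.add(stepIri + " a sdth:ProgramStep .");
            out.add(stepIri + " sdth:hasSourceCode " + literal(step.sourceCode()) + " .");
            for (String in : step.consumes()) {
                out.add(stepIri + " sdth:consumesDataframe " + iri("dataset/" + in) + " .");
            }
            for (String produced : step.produces()) {
                out.add(stepIri + " sdth:producesDataframe " + iri("dataset/" + produced) + " .");
                // Each produced dataframe is derived from every dataframe the step consumes.
                for (String in : step.consumes()) {
                    out.add(iri("dataset/" + produced) + " sdth:wasDerivedFrom " + iri("dataset/" + in) + " .");
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Step> steps = List.of(
                new Step(1, "ds_sum := ds1 + ds2;", List.of("ds1", "ds2"), List.of("ds_sum")),
                new Step(2, "ds_mul := ds_sum * 3;", List.of("ds_sum"), List.of("ds_mul")),
                new Step(3, "ds_res <- ds_mul[filter mod(var1, 2) = 0][calc var_sum := var1 + var2];",
                        List.of("ds_mul"), List.of("ds_res")));
        System.out.println("PREFIX sdth: <http://rdf-vocabulary.ddialliance.org/sdth#>");
        triples("program1", steps).forEach(System.out::println);
    }
}
```

Running `main` prints a `PREFIX` line followed by 21 triples; step 1, for instance, yields `<http://example.com/dataset/ds_sum> sdth:wasDerivedFrom <http://example.com/dataset/ds1> .`. Derivation links are only drawn between the direct inputs and outputs of a step, so `ds_res` is linked to `ds_mul` but not transitively to `ds1`, as in the target model.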
28 changes: 28 additions & 0 deletions docs/blog/2024-10-09-trevas-vtl-21.mdx
@@ -0,0 +1,28 @@
---
slug: /trevas-vtl-21
title: Trevas - VTL 2.1
authors: [nicolas]
tags: [Trevas, 'VTL 2.1']
---

import useBaseUrl from '@docusaurus/useBaseUrl';
import Link from '@theme/Link';

Trevas 1.7.0 upgrades to version 2.1 of VTL.

This version introduces two new operators:

- `random`
- `case`

`random` produces a decimal number between 0 and 1.

`case` allows for clearer multi-conditional branching, for example:

`ds2 := ds1[ calc c := case when r < 0.2 then "Low" when r > 0.8 then "High" else "Medium" ];`
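As an illustration of the evaluation order, the hypothetical Java sketch below mirrors this `case` expression for a single value of `r`: the `when` conditions are tested from left to right, the first one that holds determines the result, and `else` is the fallback. The `classify` method name is an assumption; in VTL the expression is applied to each data point of `ds1` inside `calc`, which the sketch reduces to one call per value.

```java
// Hypothetical sketch of how the VTL `case` expression above evaluates for one value of r.
// Not Trevas code: conditions are checked in order and the first match wins.
public class CaseSketch {

    static String classify(double r) {
        if (r < 0.2) {          // when r < 0.2 then "Low"
            return "Low";
        } else if (r > 0.8) {   // when r > 0.8 then "High"
            return "High";
        } else {                // else "Medium"
            return "Medium";
        }
    }

    public static void main(String[] args) {
        for (double r : new double[] {0.1, 0.2, 0.5, 0.9}) {
            System.out.println(r + " -> " + classify(r));
        }
    }
}
```

For example, `classify(0.2)` returns `"Medium"`, because neither `r < 0.2` nor `r > 0.8` holds at that boundary.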

Both operators are already available in Trevas!

The new grammar also provides time operators and includes corrections, without any breaking changes compared to the 2.0 version.

See the <Link label={"coverage"} href={useBaseUrl('/user-guide/coverage')} /> section for more details.
2 changes: 1 addition & 1 deletion docs/docs/developer-guide/basic-mode/data-sources/jdbc.mdx
@@ -12,7 +12,7 @@ custom_edit_url: null
<dependency>
<groupId>fr.insee.trevas</groupId>
<artifactId>vtl-jdbc</artifactId>
<version>1.5.0</version>
<version>1.7.0</version>
</dependency>
```

2 changes: 1 addition & 1 deletion docs/docs/developer-guide/basic-mode/data-sources/json.mdx
@@ -12,7 +12,7 @@ custom_edit_url: null
<dependency>
<groupId>fr.insee.trevas</groupId>
<artifactId>vtl-jackson</artifactId>
<version>1.5.0</version>
<version>1.7.0</version>
</dependency>
```

10 changes: 9 additions & 1 deletion docs/docs/developer-guide/index-developer-guide.mdx
@@ -15,7 +15,7 @@ import Card from '@theme/Card';
<dependency>
<groupId>fr.insee.trevas</groupId>
<artifactId>vtl-engine</artifactId>
<version>1.5.0</version>
<version>1.7.0</version>
</dependency>
```

@@ -64,3 +64,11 @@ PersistentDataset result = (PersistentDataset) engine.getBindings(ScriptContext.
<Card title="Spark mode" page={useBaseUrl('/developer-guide/spark-mode')} />
</div>
</div>

### Provenance

<div className="row">
<div className="col">
<Card title="Provenance" page={useBaseUrl('/developer-guide/provenance')} />
</div>
</div>