doc: Update Readme (#21)
Co-authored-by: Gabriel Nützi <[email protected]>
supermaxiste and gabyx authored Jun 25, 2024
1 parent d440e34 commit 2460466
Showing 1 changed file with 181 additions and 3 deletions.
184 changes: 181 additions & 3 deletions README.md
@@ -2,7 +2,14 @@

A simple Rust CLI tool to protect sensitive values in
[RDF triples](https://en.wikipedia.org/wiki/Semantic_triple) through
[pseudonymization](https://en.wikipedia.org/wiki/Pseudonymization). The goal is to offer a fast, secure and memory-efficient pseudonymization solution for any RDF graph.

Note: the code is still in development and currently supports only the [N-Triples format](https://en.wikipedia.org/wiki/N-Triples) as input.

The tool works in two steps:

1. Indexing to create a reference to all [rdf:type](https://www.w3.org/TR/rdf12-schema/#ch_type) instances in the graph
2. Pseudonymization to encrypt or hash sensitive parts of any RDF triple in the graph via a human-readable configuration file and the previously generated index
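
To make the first step more concrete, here is a minimal Rust sketch of how a node-to-type index could be built from N-Triples lines. It is an illustration only, not the tool's actual implementation: the names `split_triple` and `build_type_index` are hypothetical, the parser naively assumes single spaces between terms, and it assumes types are declared with the standard `rdf:type` IRI (`http://www.w3.org/1999/02/22-rdf-syntax-ns#type`).

```rust
use std::collections::HashMap;

/// Standard IRI of `rdf:type`, written as an N-Triples term.
const RDF_TYPE: &str = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>";

/// Split one N-Triples line into (subject, predicate, object).
/// Naive on purpose: assumes single spaces between terms and a trailing ".".
fn split_triple(line: &str) -> Option<(&str, &str, &str)> {
    let body = line.trim().strip_suffix('.')?.trim_end();
    let mut parts = body.splitn(3, ' ');
    Some((parts.next()?, parts.next()?, parts.next()?))
}

/// Hypothetical first pass: map every typed node to the type(s) declared for it.
fn build_type_index<'a>(
    lines: impl IntoIterator<Item = &'a str>,
) -> HashMap<String, Vec<String>> {
    let mut index: HashMap<String, Vec<String>> = HashMap::new();
    for line in lines {
        if let Some((subject, predicate, object)) = split_triple(line) {
            // Only type declarations end up in the index.
            if predicate == RDF_TYPE {
                index
                    .entry(subject.to_string())
                    .or_default()
                    .push(object.to_string());
            }
        }
    }
    index
}

fn main() {
    let input = [
        "<http://example.org/Alice> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Person> .",
        "<http://example.org/Alice> <http://schema.org/name> \"Alice\" .",
    ];
    let index = build_type_index(input);
    println!("{:?}", index.get("<http://example.org/Alice>"));
}
```

The second step can then look up the type of any node in constant time while it reads the triples one by one.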

<details>
<summary>Table of Contents</summary>
@@ -11,6 +18,8 @@ A simple Rust CLI tool to protect sensitive values in
- [RDF Protect](#rdf-protect)
- [Installation & Usage](#installation-usage)
- [Usage](#usage)
- [Use Case](#use-case)
- [Example](#example)
- [Development](#development)
- [Requirements](#requirements)
- [Nix](#nix)
@@ -23,11 +32,180 @@ A simple Rust CLI tool to protect sensitive values in

## Installation & Usage

The package must be compiled from source using [cargo](https://doc.rust-lang.org/cargo/getting-started/installation.html):

```shell
git clone https://github.com/sdsc-ordes/rdf-protect
cd rdf-protect
cargo build --release
# executable binary located in ./target/release/rdf-protect
```

### Usage

The general command-line interface outlines the two main steps of the tool, indexing and pseudonymization:

```shell
$ rdf-protect --help
A tool to pseudonymize URIs and values in RDF graphs.

Usage: rdf-protect <COMMAND>

Commands:
index 1. Pass: Create a node-to-type index from input triples
pseudo 2. Pass: Pseudonymize input triples
help Print this message or the help of the given subcommand(s)

Options:
-h, --help Print help
-V, --version Print version
```

Indexing only requires an RDF file as input:

```shell
$ rdf-protect index --help
1. Pass: Create a node-to-type index from input triples

Usage: rdf-protect index [OPTIONS] [INPUT]

Arguments:
[INPUT] File descriptor to read triples from. Defaults to `stdin` [default: -]

Options:
-o, --output <OUTPUT> Output file descriptor to for the node-to-type index [default: -]
-h, --help Print help
```

Pseudonymization requires an RDF file, an index and a config file as input:

```shell
$ rdf-protect pseudo --help
2. Pass: Pseudonymize input triples

Usage: rdf-protect pseudo [OPTIONS] --index <INDEX> --config <CONFIG> [INPUT]

Arguments:
[INPUT] File descriptor to read input triples from. Defaults to `stdin` [default: -]

Options:
-i, --index <INDEX> Index file produced by prepare-index. Required for pseudonymization
-c, --config <CONFIG> The config file descriptor to use for defining RDF elements to pseudonymize. Format: yaml
-o, --output <OUTPUT> Output file descriptor for pseudonymized triples. Defaults to `stdout` [default: -]
-h, --help Print help
```

In both subcommands, the input defaults to stdin and the output to stdout, so `rdf-protect` can be piped with both upstream and downstream tools (see next section).

### Use Case

The main idea behind `rdf-protect` is to integrate smoothly with other CLI tools up- and downstream via piping.
Let us assume that we are running a SPARQL query on a large graph and would like to pseudonymize some of the triples. This is what the flow should look like:

```shell
curl <sparql-query> | rdf-protect pseudo -i index -c config.yaml > pseudo.nt
```

To stream data instead of loading the whole graph into memory, we had to add a separate indexing step, which keeps the streaming process consistent and easier to control. This is not as clean as a single command that does everything, but it simplifies code development.
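
One way to see why a prebuilt index helps with streaming: a node can be used as an object before the triple declaring its type has been read. The Rust sketch below is a simplified illustration under stated assumptions, not the tool's actual code. The function `stream_rewrite`, the `MASKED` placeholder and the naive space-based parsing are all hypothetical. It rewrites one line at a time, so memory use depends on the index rather than on the input size.

```rust
use std::collections::HashMap;
use std::io::{self, BufRead, Cursor, Write};

/// Hypothetical second pass: rewrite triples one line at a time.
/// `rewrite_node` decides what to do with each subject and object term.
fn stream_rewrite<R: BufRead, W: Write>(
    input: R,
    mut output: W,
    rewrite_node: impl Fn(&str) -> String,
) -> io::Result<()> {
    for line in input.lines() {
        let line = line?;
        let trimmed = line.trim();
        let body = trimmed.strip_suffix('.').unwrap_or(trimmed).trim_end();
        let mut parts = body.splitn(3, ' ');
        match (parts.next(), parts.next(), parts.next()) {
            (Some(s), Some(p), Some(o)) => {
                writeln!(output, "{} {} {} .", rewrite_node(s), p, rewrite_node(o))?
            }
            // Pass through anything this naive parser cannot split.
            _ => writeln!(output, "{}", line)?,
        }
    }
    Ok(())
}

fn main() -> io::Result<()> {
    // Index from the first pass: node -> type.
    let mut index = HashMap::new();
    index.insert(
        "<http://example.org/Alice-Bank-Account>",
        "<http://xmlns.com/foaf/0.1/OnlineAccount>",
    );

    // The account is referenced on the first line but only typed on the second.
    let input = Cursor::new(
        "<http://example.org/Alice> <http://xmlns.com/foaf/0.1/holdsAccount> <http://example.org/Alice-Bank-Account> .\n\
         <http://example.org/Alice-Bank-Account> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/OnlineAccount> .\n",
    );

    // Placeholder rewrite: a real tool would hash or encrypt the node instead.
    let rewrite = |term: &str| match index.get(term) {
        Some(_) => "<http://example.org/MASKED>".to_string(),
        None => term.to_string(),
    };
    stream_rewrite(input, io::stdout(), rewrite)
}
```

Because the index already knows the account's type, the first line is rewritten correctly even though the type triple only arrives afterwards. Without the index, a single streaming pass would have to either buffer triples or miss such forward references.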

### Example

There are three possible ways to pseudonymize RDF triples:

1. Pseudonymize the URI of nodes with `rdf:type`.
2. Pseudonymize values for specific subject-predicate combinations.
3. Pseudonymize any value for a given predicate.

By using all three approaches together, we can take an RDF file containing sensitive information:

```ntriples
<http://example.org/Alice> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Person> .
<http://example.org/Alice> <http://xmlns.com/foaf/0.1/holdsAccount> <http://example.org/Alice-Bank-Account> .
<http://example.org/Alice-Bank-Account> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/OnlineAccount> .
<http://example.org/Alice-Bank-Account> <http://schema.org/name> "my_account32" .
<http://example.org/Alice-Bank-Account> <http://schema.org/accessCode> "secret-123" .
<http://example.org/Alice> <http://schema.org/name> "Alice" .
<http://example.org/Bank> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Organization> .
<http://example.org/Bank> <http://schema.org/name> "Bank" .
```

and pseudonymize its sensitive parts, such as people's names and personal or secret information, while keeping the rest as is:

```
<http://example.org/af321bbc> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Person> .
<http://example.org/af321bbc> <http://xmlns.com/foaf/0.1/holdsAccount> <http://example.org/bs2313bc> .
<http://example.org/bs2313bc> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/OnlineAccount> .
<http://example.org/bs2313bc> <http://schema.org/name> "pp54r32" .
<http://example.org/bs2313bc> <http://schema.org/accessCode> "asfnd223" .
<http://example.org/af321bbc> <http://schema.org/name> "af321bbc" .
<http://example.org/Bank> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Organization> .
<http://example.org/Bank> <http://schema.org/name> "Bank" .
```

The next subsections break down each of the three pseudonymization approaches to better understand how they operate.

#### 1. Pseudonymize the URI of nodes with `rdf:type`

Given the following config:
```yaml
replace_uri_of_nodes_with_type:
- "http://xmlns.com/foaf/0.1/Person"
```
The goal is to pseudonymize the URI of every node whose `rdf:type` is Person. The following input file:
```
<http://example.org/Alice> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Person> .
```
would become:
```
<http://example.org/af321bbc> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Person> .
```
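
As a rough illustration of the transformation above, the following Rust sketch keeps the namespace of a URI and replaces only its local name with a short hex digest. This is an assumption about how such a rewrite could be done, not a description of the tool's implementation. The function name `pseudonymize_uri` is hypothetical, and `DefaultHasher` is only a stand-in: it is not a cryptographic hash and must not be used to protect real data.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Replace the local name of `<namespace/local>` with a hex digest, keeping the namespace.
/// NOTE: `DefaultHasher` is a placeholder; it is not cryptographically secure and
/// its output is not guaranteed to be stable across Rust releases.
fn pseudonymize_uri(term: &str) -> String {
    // Strip the angle brackets; leave anything that is not an IRI term untouched.
    let iri = match term.strip_prefix('<').and_then(|t| t.strip_suffix('>')) {
        Some(iri) => iri,
        None => return term.to_string(),
    };
    // Split right after the last '/' or '#', whichever comes last.
    let cut = iri
        .rfind(|c: char| c == '/' || c == '#')
        .map_or(0, |i| i + 1);
    let (namespace, local) = iri.split_at(cut);

    let mut hasher = DefaultHasher::new();
    local.hash(&mut hasher);
    // Truncate to 32 bits so the digest prints as 8 hex characters.
    format!("<{}{:08x}>", namespace, hasher.finish() as u32)
}

fn main() {
    for term in ["<http://example.org/Alice>", "\"Alice\""] {
        println!("{} -> {}", term, pseudonymize_uri(term));
    }
}
```

Since the same input always yields the same digest within one build, every occurrence of a node is rewritten to the same pseudonym, which keeps the graph connected after pseudonymization.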
#### 2. Pseudonymize values for specific subject-predicate combinations
Given the following config:
```yaml
replace_values_of_subject_predicate:
  "http://xmlns.com/foaf/0.1/Person":
    - "http://schema.org/name"
```
The goal is to pseudonymize names only when they belong to a subject of type Person. The following input file:
```
<http://example.org/Alice> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Person> .
<http://example.org/Alice> <http://schema.org/name> "Alice" .
<http://example.org/Bank> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Organization> .
<http://example.org/Bank> <http://schema.org/name> "Bank" .
```
would become:
```
<http://example.org/Alice> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Person> .
<http://example.org/Alice> <http://schema.org/name> "af321bbc" .
<http://example.org/Bank> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Organization> .
<http://example.org/Bank> <http://schema.org/name> "Bank" .
```

#### 3. Pseudonymize any value for a given predicate

Given the following config:
```yaml
replace_value_of_predicate:
- "http://schema.org/name"
```
The goal is to pseudonymize every value of the name predicate, regardless of the subject's type.
The following input file:
```
<http://example.org/Alice> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Person> .
<http://example.org/Alice> <http://schema.org/name> "Alice" .
<http://example.org/Bank> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Organization> .
<http://example.org/Bank> <http://schema.org/name> "Bank" .
```
would become:
```
<http://example.org/Alice> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Person> .
<http://example.org/Alice> <http://schema.org/name> "af321bbc" .
<http://example.org/Bank> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Organization> .
<http://example.org/Bank> <http://schema.org/name> "38a3dd71" .
```
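
The difference between approaches 2 and 3 comes down to whether the subject's type is consulted before a value is replaced. The Rust sketch below illustrates that decision under stated assumptions: the `ValueRules` struct and its `matches` method are hypothetical, and only the two field names are taken from the YAML keys shown above. It is not the tool's actual data model.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical in-memory form of the two value rules (fields mirror the YAML keys).
struct ValueRules {
    /// Type IRI -> predicates whose values are replaced for subjects of that type.
    replace_values_of_subject_predicate: HashMap<String, HashSet<String>>,
    /// Predicates whose values are replaced for every subject.
    replace_value_of_predicate: HashSet<String>,
}

impl ValueRules {
    /// Decide whether the object of a triple should be pseudonymized.
    /// `subject_type` is what the index returns for the subject, if anything.
    fn matches(&self, subject_type: Option<&str>, predicate: &str) -> bool {
        // Approach 3: the predicate alone is enough.
        if self.replace_value_of_predicate.contains(predicate) {
            return true;
        }
        // Approach 2: the predicate must be listed under the subject's type.
        subject_type
            .and_then(|ty| self.replace_values_of_subject_predicate.get(ty))
            .map_or(false, |predicates| predicates.contains(predicate))
    }
}

fn main() {
    let name = "http://schema.org/name".to_string();
    let person = "http://xmlns.com/foaf/0.1/Person".to_string();
    let organization = "http://xmlns.com/foaf/0.1/Organization";

    let mut rules = ValueRules {
        replace_values_of_subject_predicate: HashMap::from([(
            person.clone(),
            HashSet::from([name.clone()]),
        )]),
        replace_value_of_predicate: HashSet::new(),
    };

    // Approach 2 only: Alice's name matches, the Bank's name does not.
    assert!(rules.matches(Some(person.as_str()), &name));
    assert!(!rules.matches(Some(organization), &name));

    // Adding approach 3 for the same predicate makes every name match.
    rules.replace_value_of_predicate.insert(name.clone());
    assert!(rules.matches(Some(organization), &name));
    assert!(rules.matches(None, &name));
}
```

Checking the predicate-only rule first keeps the lookup cheap for untyped subjects, which never appear in the index.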


## Development
