From c10f177d4516dafa3393dd93a3a38f5bca0f76bc Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Gabriel=20N=C3=BCtzi?=
Date: Tue, 25 Jun 2024 17:31:50 +0200
Subject: [PATCH] doc: Format markdown

---
 README.md | 78 +++++++++++++++++++++++++++++++++++++++----------------
 1 file changed, 56 insertions(+), 22 deletions(-)

diff --git a/README.md b/README.md
index f84e3b6..af04dab 100644
--- a/README.md
+++ b/README.md
@@ -2,14 +2,21 @@

 A simple Rust CLI tool to protect sensitive values in
 [RDF triples](https://en.wikipedia.org/wiki/Semantic_triple) through
-[pseudonymization](https://en.wikipedia.org/wiki/Pseudonymization). The goal is to offer a fast, secure and memory-efficient pseudonymization solution to any RDF graph.
+[pseudonymization](https://en.wikipedia.org/wiki/Pseudonymization). The goal is
+to offer a fast, secure and memory-efficient pseudonymization solution to any
+RDF graph.

-Note: code is still in development and we support only [NTriples format](https://en.wikipedia.org/wiki/N-Triples) as input.
+Note: code is still in development and we support only
+[NTriples format](https://en.wikipedia.org/wiki/N-Triples) as input.

 The tool works in two steps:

- 1. Indexing to create a reference to all [rdf:type](https://www.w3.org/TR/rdf12-schema/#ch_type) instances in the graph
- 2. Pseudonymization to encrypt or hash sensitive parts of any RDF triple in the graph via a human-readable configuration file and the previously generated index
+1. Indexing to create a reference to all
+   [rdf:type](https://www.w3.org/TR/rdf12-schema/#ch_type) instances in the
+   graph
+2. Pseudonymization to encrypt or hash sensitive parts of any RDF triple in the
+   graph via a human-readable configuration file and the previously generated
+   index
 Table of Content
@@ -32,7 +39,8 @@ The tool works in two steps:

 ## Installation & Usage

-The package must be compiled from source using [cargo](https://doc.rust-lang.org/cargo/getting-started/installation.html):
+The package must be compiled from source using
+[cargo](https://doc.rust-lang.org/cargo/getting-started/installation.html):

 ```shell
 git clone https://github.com/sdsc-ordes/rdf-protect
@@ -43,10 +51,11 @@ cargo build --release

 ### Usage

-The general command-line interface outlines the two main steps of the tool, indexing and pseudonymization:
+The general command-line interface outlines the two main steps of the tool,
+indexing and pseudonymization:

 ```shell
-$ rdf-protect --help
+$ rdf-protect --help
 A tool to pseudonymize URIs and values in RDF graphs.

 Usage: rdf-protect
@@ -64,7 +73,7 @@ Options:
 Indexing only requires an RDF file as input:

 ```shell
-$ rdf-protect index --help
+$ rdf-protect index --help
 1. Pass: Create a node-to-type index from input triples

 Usage: rdf-protect index [OPTIONS] [INPUT]
@@ -95,18 +104,24 @@ Options:
 -h, --help Print help
 ```

-In both subcommands, the input defaults to stdin and the output to stdout, allowing to pipe both up- and downstream `rdf-protect` (see next section).
+In both subcommands, the input defaults to stdin and the output to stdout,
+allowing to pipe both up- and downstream `rdf-protect` (see next section).

 ### Use Case

-The main idea behind `rdf-protect` is to integrate smoothly into other CLI tools up- and downstream via piping.
-Let us assume that we're running a SPARQL query on a large graph and we would like to pseudonymize some of the triples. This is how the flow should look like:
+The main idea behind `rdf-protect` is to integrate smoothly into other CLI tools
+up- and downstream via piping. Let us assume that we're running a SPARQL query
+on a large graph and we would like to pseudonymize some of the triples. This is
+how the flow should look like:

 ```shell
 curl | rdf-protect -i index -c config.yaml | pseudo.nt
 ```

-For this flow to stream data instead of loading everything into memory, we had to include an indexing step to make the streaming process consistent and easier to control. It is not as clean as having one command doing everything, but it simplifies code development.
+For this flow to stream data instead of loading everything into memory, we had
+to include an indexing step to make the streaming process consistent and easier
+to control. It is not as clean as having one command doing everything, but it
+simplifies code development.

 ### Example

@@ -116,7 +131,8 @@ There are three possible ways to pseudonymize RDF triples:
 2. Pseudonymize values for specific subject-predicate combinations.
 3. Pseudonymize any value for a given predicate.

-By using all three ways together, we're able to get an RDF file with sensitive information:
+By using all three ways together, we're able to get an RDF file with sensitive
+information:

 ```ntriples
 .
@@ -128,7 +144,9 @@ By using all three ways together, we're able to get an RDF file with sensitive i
 .
 "Bank" .
 ```
+
-And pseudonymize the sensitive information such as people's names, personal and secret information while keeping the rest as is:
+And pseudonymize the sensitive information such as people's names, personal and
+secret information while keeping the rest as is:

 ```
 .
@@ -141,23 +159,31 @@ And pseudonymize the sensitive information such as people's names, personal and
 "Bank" .
 ```

-The next subsections break down each of the three pseudonymization approaches to better understand how they operate.
+The next subsections break down each of the three pseudonymization approaches to
+better understand how they operate.

 #### 1. Pseudonymize the URI of nodes with `rdf:type`

 Given the following config:
+
 ```yaml
 replace_uri_of_nodes_with_type:
- - "http://xmlns.com/foaf/0.1/Person"
+ - "http://xmlns.com/foaf/0.1/Person"
 ```
+
-The goal is to pseudonymize all instaces of `rdf:type` Person. The following input file:
+The goal is to pseudonymize all instaces of `rdf:type` Person. The following
+input file:
+
 ```
 .
 ```
+
 Would become:
+
 ```
 .
 ```
+
 #### 2. Pseudonymize values for specific subject-predicate combinations

 Given the following config:
@@ -165,16 +191,21 @@ Given the following config:
 ```yaml
 replace_values_of_subject_predicate:
 "http://xmlns.com/foaf/0.1/Person":
- - "http://schema.org/name"
+ - "http://schema.org/name"
 ```
+
-The goal is to pseudonymize only the instances of names when they're associated to Person. The following input file:
+The goal is to pseudonymize only the instances of names when they're associated
+to Person. The following input file:
+
 ```
 .
 "Alice" .
 .
 "Bank" .
 ```
+
 Would become:
+
 ```
 .
 "af321bbc" .
@@ -185,20 +216,24 @@ Would become:
 #### 3. Pseudonymize any value for a given predicate

 Given the following config:
+
 ```yaml
 replace_value_of_predicate:
 - "http://schema.org/name"
 ```

-The goal is to pseudonymize any values associated to name.
-The following input file:
+The goal is to pseudonymize any values associated to name. The following input
+file:
+
 ```
 .
 "Alice" .
 .
 "Bank" .
 ```
+
 Would become:
+
 ```
 .
 "af321bbc" .
@@ -206,7 +241,6 @@ Would become:
 "38a3dd71" .
 ```

-
 ## Development

 ### Setup