From f1bfa08b4bae9b69f55df977916cd9a65579c1f6 Mon Sep 17 00:00:00 2001 From: Cyril Matthey-Doret Date: Thu, 11 Jul 2024 19:25:06 +0200 Subject: [PATCH] docs(readme): split guidelines (#30) --- README.md | 281 ++------------------------------------ docs/development_guide.md | 112 +++++++++++++++ docs/tutorial.md | 159 +++++++++++++++++++++ 3 files changed, 279 insertions(+), 273 deletions(-) create mode 100644 docs/development_guide.md create mode 100644 docs/tutorial.md diff --git a/README.md b/README.md index 606f623..17df9bb 100644 --- a/README.md +++ b/README.md @@ -10,8 +10,8 @@

Current Release label - - Test Status label + + Test Status label License label

@@ -54,7 +54,7 @@ The tool works in two steps: -## Installation & Usage +## Installation The package must be compiled from source using [cargo](https://doc.rust-lang.org/cargo/getting-started/installation.html): @@ -66,7 +66,7 @@ cargo build --release # executable binary located in ./target/release/tripsu ``` -### Usage +## Usage The general command-line interface outlines the two main steps of the tool, indexing and pseudonymization: @@ -104,281 +104,16 @@ Pseudonomyzation requires an RDF file, index and config as input: tripsu pseudo --index index.nt --config rules.yaml input.nt > output.nt ``` -> [!TIP] For each subcommand, you can use `--help` to see all options. +> [!TIP] +> For each subcommand, you can use `--help` to see all options. In both subcommands, the input defaults to stdin and the output to stdout, allowing to pipe both up- and downstream `tripsu` (see next section). -### Use Case - -The main idea behind `tripsu` is to integrate smoothly into other CLI tools up- -and downstream via piping. Let us assume that we're running a SPARQL query on a -large graph and we would like to pseudonymize some of the triples. This is how -the flow should look like: - -```shell -curl | tripsu pseudo -i index.nt -c config.yaml > pseudo.nt -``` - -For this flow to stream data instead of loading everything into memory, we had -to include an indexing step to make the streaming process consistent and easier -to control. It is not as clean as having one command doing everything, but it -simplifies code development. - -### Example - -There are three possible ways to pseudonymize RDF triples: - -1. Pseudonymize the URI of nodes with `rdf:type`. -2. Pseudonymize values for specific subject-predicate combinations. -3. Pseudonymize any value for a given predicate. - -By using all three ways together, we're able to get an RDF file with sensitive -information: - -
- Click to show input - -```ntriples - . - . - . - "my_account32" . - "secret-123" . - "Alice" . - . - "Bank" . -``` - -
- -And pseudonymize the sensitive information such as people's names, personal and -secret information while keeping the rest as is: - -
- Click to show output - -``` - . - . - . - "pp54r32" . - "asfnd223" . - "af321bbc" . - . - "Bank" . -``` - -
- -The next subsections break down each of the three pseudonymization approaches to -better understand how they operate. - -#### 1. Pseudonymize the URI of nodes with `rdf:type` - -
- Click to show - -Given the following config: - -```yaml -replace_uri_of_nodes_with_type: - - "http://xmlns.com/foaf/0.1/Person" -``` - -The goal is to pseudonymize all instaces of `rdf:type` Person. The following -input file: - -``` - . -``` - -Would become: - -``` - . -``` - -
- -#### 2. Pseudonymize values for specific subject-predicate combinations - -
- Click to show - -Given the following config: - -```yaml -replace_values_of_subject_predicate: - "http://xmlns.com/foaf/0.1/Person": - - "http://schema.org/name" -``` - -The goal is to pseudonymize only the instances of names when they're associated -to Person. The following input file: - -``` - . - "Alice" . - . - "Bank" . -``` - -Would become: - -``` - . - "af321bbc" . - . - "Bank" . -``` - -
- -#### 3. Pseudonymize any value for a given predicate - -
- Click to show - -Given the following config: - -```yaml -replace_value_of_predicate: - - "http://schema.org/name" -``` - -The goal is to pseudonymize any values associated to name. The following input -file: - -``` - . - "Alice" . - . - "Bank" . -``` - -Would become: - -``` - . - "af321bbc" . - . - "38a3dd71" . -``` - -
+For more information about use-cases and configuration, see the [tutorial](docs/tutorial.md). ## Development Read first the [Contribution Guidelines](/CONTRIBUTING.md). -### Setup - -- Rust Toolchain: You need the `rust` toolchain corresponding to - [`rust-toochain.md`](./rust-toochain.md) installed. Install Rust with - [`rust-up`](https://rustup.rs) and any `cargo` invocations will then - automatically respect the [toolchain](./rust-toolchain.md). - -- Command runner [`just`](https://github.com/casey/just). - -- The Cargo plugin [`cargo-watch`](https://crates.io/crates/cargo-watch) for - continuous building. - -- Container manager such as [`podman`](https://podman.io), - [`docker`](https://docker.com) for some tooling (formatting etc.). - -### Development Shell with `nix` - -If you have the package manager -[`nix`](https://github.com/DeterminateSystems/nix-installer) installed you can -enter a development setup easily with - -```shell -nix ./tools/nix#default -``` - -or `just nix-develop` or automatically when [`direnv`](https://direnv.net) is -installed and [setup for your shell](https://direnv.net/docs/hook.html) and -`direnv allow` was executed inside the repository. 
- -**Note:** Make sure to enable `flakes` and `nix-command` in -[your `nix` config](https://nixos.wiki/wiki/Flakes#Other_Distros,_without_Home-Manager) - -### Formatting - -To format the whole project run - -```shell -just format -``` - -**Note:** If you use `docker`, use `just container_mgr=docker format` - -### Building - -To build the tool with `cargo` run - -```shell -just build -``` - -and for continuous building (needs): - -```shell -just watch -``` - -### Testing - -To run the tests do - -```shell -just test -``` - -### Build the Package & Image - -To build the package with Nix run: - -```shell -just nix-package -``` - -To build the image with Nix run: - -```shell -just nix-image -``` - -### Upload CI Images - -CI is run with some container images which can be updated with: - -```shell -just upload-ci-images [] [] -``` - -where the `` should be a semantic version. **Note: By default it will -upload and overwrite the current version.** - -### Prepare a Release - -To prepare a release you can execute: - -```shell -just release -``` - -It will: - -- Check that the version is semantic version and the version does not exists - (local and remote) and it is newer then all remote version. - -- Update the `Cargo.toml` and make a commit on `main`. - -- Push a prepare tag `prepare-v` which triggers the - [`release.yaml`](.github/workflows/release.yaml) pipeline. - -**Note: If the release pipeline fails, you can just run this same command again. 
-Also rerun it when you made a mistake, it will cancel the current release (works -also when `--amend`ing on the current commit)** +For technical documentation on setup and development, see the [Development Guide](docs/development_guide.md). diff --git a/docs/development_guide.md b/docs/development_guide.md new file mode 100644 index 0000000..4cb33f9 --- /dev/null +++ b/docs/development_guide.md @@ -0,0 +1,112 @@ +# Development guide + +## Setup + +- Rust Toolchain: You need the `rust` toolchain corresponding to + [`rust-toolchain.md`](./rust-toolchain.md) installed. Install Rust with + [`rustup`](https://rustup.rs) and any `cargo` invocations will then + automatically respect the [toolchain](./rust-toolchain.md). + +- Command runner [`just`](https://github.com/casey/just). + +- The Cargo plugin [`cargo-watch`](https://crates.io/crates/cargo-watch) for + continuous building. + +- Container manager such as [`podman`](https://podman.io) or + [`docker`](https://docker.com) for some tooling (formatting etc.). + +## Development Shell with `nix` + +If you have the package manager +[`nix`](https://github.com/DeterminateSystems/nix-installer) installed you can +enter a development setup easily with + +```shell +nix ./tools/nix#default +``` + +or `just nix-develop` or automatically when [`direnv`](https://direnv.net) is +installed and [set up for your shell](https://direnv.net/docs/hook.html) and +`direnv allow` was executed inside the repository. 
+ +**Note:** Make sure to enable `flakes` and `nix-command` in +[your `nix` config](https://nixos.wiki/wiki/Flakes#Other_Distros,_without_Home-Manager) + +## Formatting + +To format the whole project run + +```shell +just format +``` + +**Note:** If you use `docker`, use `just container_mgr=docker format` + +## Building + +To build the tool with `cargo` run + +```shell +just build +``` + +and for continuous building (needs `cargo-watch`): + +```shell +just watch +``` + +## Testing + +To run the tests do + +```shell +just test +``` + +## Build the Package & Image + +To build the package with Nix run: + +```shell +just nix-package +``` + +To build the image with Nix run: + +```shell +just nix-image +``` + +## Upload CI Images + +CI is run with some container images which can be updated with: + +```shell +just upload-ci-images [] [] +``` + +where the `` should be a semantic version. **Note: By default it will +upload and overwrite the current version.** + +## Prepare a Release + +To prepare a release you can execute: + +```shell +just release +``` + +It will: + +- Check that the version is a semantic version, that it does not exist yet + (locally or remotely), and that it is newer than all remote versions. + +- Update the `Cargo.toml` and make a commit on `main`. + +- Push a prepare tag `prepare-v` which triggers the + [`release.yaml`](.github/workflows/release.yaml) pipeline. + +**Note: If the release pipeline fails, you can just run this same command again. +Also rerun it if you made a mistake; it will cancel the current release (this +also works when `--amend`ing the current commit).** diff --git a/docs/tutorial.md b/docs/tutorial.md new file mode 100644 index 0000000..ccbabd3 --- /dev/null +++ b/docs/tutorial.md @@ -0,0 +1,159 @@ +# Tutorial + +## Motivation + +The main use-case for `tripsu` is to be used in combination with other CLI tools up- +and downstream via piping. 
Let us assume that we're running a SPARQL query on a +large graph and we would like to pseudonymize some of the triples. The flow +would look like this: + +```shell +curl | tripsu pseudo -i index.nt -c config.yaml > pseudo.nt +``` + +Note that a separate indexing step is required beforehand: it allows the +pseudonymization to run on a stream without loading the whole graph +into memory. + +## Example + +There are three possible ways to pseudonymize RDF triples: + +1. Pseudonymize the URI of nodes with `rdf:type`. +2. Pseudonymize values for specific subject-predicate combinations. +3. Pseudonymize any value for a given predicate. + +By using all three ways together, we're able to take an RDF file with sensitive +information: + +
+ Click to show input + +```ntriples + . + . + . + "my_account32" . + "secret-123" . + "Alice" . + . + "Bank" . +``` + +
+ +And pseudonymize the sensitive information such as people's names, personal and +secret information while keeping the rest as is: + +
+ Click to show output + +``` + . + . + . + "pp54r32" . + "asfnd223" . + "af321bbc" . + . + "Bank" . +``` + +
+ +The next subsections break down each of the three pseudonymization approaches to +better understand how they operate. + +### 1. Pseudonymize the URI of nodes with `rdf:type` + +
+ Click to show + +Given the following config: + +```yaml +replace_uri_of_nodes_with_type: + - "http://xmlns.com/foaf/0.1/Person" +``` + +The goal is to pseudonymize all instances of `rdf:type` Person. The following +input file: + +``` + . +``` + +Would become: + +``` + . +``` + +
+ +### 2. Pseudonymize values for specific subject-predicate combinations + +
+ Click to show + +Given the following config: + +```yaml +replace_values_of_subject_predicate: + "http://xmlns.com/foaf/0.1/Person": + - "http://schema.org/name" +``` + +The goal is to pseudonymize names only when they are associated +with a Person. The following input file: + +``` + . + "Alice" . + . + "Bank" . +``` + +Would become: + +``` + . + "af321bbc" . + . + "Bank" . +``` + +
+ +### 3. Pseudonymize any value for a given predicate + +
+ Click to show + +Given the following config: + +```yaml +replace_value_of_predicate: + - "http://schema.org/name" +``` + +The goal is to pseudonymize any value associated with the name predicate. The following input +file: + +``` + . + "Alice" . + . + "Bank" . +``` + +Would become: + +``` + . + "af321bbc" . + . + "38a3dd71" . +``` + +
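The three rule types can presumably live side by side in one config file. The fragment below is only a sketch: it merges the three snippets shown in this tutorial and is not a config shipped with the project. Whether rules interact in any way beyond what the individual examples show is an assumption.

```yaml
# Sketch only: the three fragments from this tutorial merged into one file.

# 1. Pseudonymize the URI of every node typed as foaf:Person.
replace_uri_of_nodes_with_type:
  - "http://xmlns.com/foaf/0.1/Person"

# 2. Pseudonymize schema:name values, but only on Person subjects.
replace_values_of_subject_predicate:
  "http://xmlns.com/foaf/0.1/Person":
    - "http://schema.org/name"

# 3. Pseudonymize schema:name values on any subject.
replace_value_of_predicate:
  - "http://schema.org/name"
```

Going by the third example, the predicate-wide rule would also replace the names of non-Person nodes such as `"Bank"`, which makes rule 2 redundant for that predicate; drop rule 3 if those names should stay readable. Saved as `rules.yaml`, a file like this would be passed to the pseudonymization step as in the README: `tripsu pseudo --index index.nt --config rules.yaml input.nt > output.nt`.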