Merge pull request #109 from GreenmaskIO/docs/v0_2_review_1
docs: First doc review for v0.2 beta
wwoytenko authored May 17, 2024
2 parents a81609a + 87d5455 commit 3627fea
Showing 41 changed files with 1,401 additions and 1,228 deletions.
13 changes: 6 additions & 7 deletions README.md
@@ -10,6 +10,12 @@ backward-compatible with existing PostgreSQL utilities.

# Features

* **Deterministic transformers** - a deterministic approach to data transformation based on hash
functions. This ensures that the same input data always produces the same output data. Almost every transformer
supports either the `random` or the `hash` engine, which makes it suitable for a wide range of use cases.
* **Dynamic parameters** - almost every transformer supports dynamic parameters, which let you parametrize the
transformer from a table column value. This helps resolve functional dependencies
between columns and satisfy constraints.
* **Cross-platform** - Can be easily built and executed on any platform, thanks to its Go-based architecture,
which eliminates platform dependencies.
* **Database type safe** - Ensures data integrity by validating data and utilizing the database driver for
@@ -52,9 +58,6 @@ solution for managing obfuscation procedures. We recognize the challenges of mai
throughout the software lifecycle. Greenmask is dedicated to providing valuable tools and features that ensure the
obfuscation process remains fresh, predictable, and transparent.
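
The hash-based determinism described under Features can be illustrated with a minimal Go sketch. This is not Greenmask's implementation: the function name `hashToRange` and the choice of SHA-256 are assumptions made purely for illustration. It only shows why the same input always yields the same output within a `min`/`max` range.

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"fmt"
)

// hashToRange is a hypothetical helper (not part of Greenmask).
// It maps input to an integer in [min, max] using only a hash of the input,
// so equal inputs always produce equal outputs.
func hashToRange(input []byte, min, max int64) int64 {
	sum := sha256.Sum256(input)
	// Interpret the first 8 bytes of the digest as an unsigned integer.
	n := binary.BigEndian.Uint64(sum[:8])
	span := uint64(max-min) + 1
	return min + int64(n%span)
}

func main() {
	a := hashToRange([]byte("42"), 1, 2147483647)
	b := hashToRange([]byte("42"), 1, 2147483647)
	fmt.Println(a == b) // true: the mapping depends only on the input
}
```

Because the result depends only on the input bytes, applying the same mapping to a key column and to the columns that reference it keeps the transformed values aligned.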

## [Getting started](./getting_started.md)


### General Information

It is evident that the most appropriate approach for executing logical backup dumping and restoration is by leveraging
@@ -98,10 +101,6 @@ Greenmask introduces the concept of **Storages**.
various cloud-based storage solutions.
* **directory** - This is the standard choice, representing the ordinary filesystem directory for local storage.

!!! note
If you have suggestions for additional storage options that would be valuable to implement, please feel free to
share your ideas. Greenmask aims to accommodate a wide range of storage preferences to suit diverse backup needs.

## Restoration Process

In the restoration process, Greenmask combines the capabilities of different tools:
172 changes: 63 additions & 109 deletions config.yml.example
@@ -25,137 +25,91 @@ dump:
load-via-partition-root: true

transformation:
- schema: "bookings"
name: "flights"
query: "select * from bookings.flights limit 100"
columns_type_override:
post_code: "int4"
- schema: "public"
name: "account"
transformers:
- name: "RandomDate"
params:
min: "2023-01-01 00:00:00.0+03"
max: "2023-01-02 00:00:00.0+03"
column: "scheduled_departure"

- name: "NoiseDate"
- name: "RandomInt"
params:
ratio: "1 day"
column: "scheduled_arrival"
column: "id"
engine: hash
min: 1
max: 2147483647

- name: "RegexpReplace"
- name: "RandomChoice"
params:
column: "departure_airport"
regexp: "DME"
replace: "SVO"
column: "gender"
values:
- "M"
- "F"

- name: "RegexpReplace"
- name: "RandomPerson"
params:
column: "status"
regexp: "On Time"
replace: "Delayed"
columns:
- name: "first_name"
template: "{{ .FirstName }}"
- name: "last_name"
template: "{{ .LastName }}"
dynamic_params:
gender:
column: gender

- name: "Email"
params:
column: "email"
engine: "hash"
keep_original_domain: true
keep_null: false
local_part_template: "{{ first_name | lower }}.{{ last_name | lower }}"

- name: "RandomDate"
params:
column: "actual_departure"
min: "2023-01-03 01:00:00.0+03"
max: "2023-01-04 00:00:00.0+03"
column: "birth_date"
min: '{{ now | tsModify "-30 years" | .EncodeValue }}' # 1994
max: '{{ now | tsModify "-18 years" | .EncodeValue }}' # 2006

- name: "RandomDate"
params:
column: "actual_arrival"
min: "2023-01-04 01:00:00.0+03"
max: "2023-01-05 00:00:00.0+03"
column: "created_at"
max: "{{ now | .EncodeValue }}"
truncate: "day"
dynamic_params:
min:
column: "birth_date"
template: '{{ .GetValue | tsModify "18 years" | .EncodeValue }}'

- name: "RandomInt"
params:
column: "post_code"
min: "11"
max: "99"

- name: "Replace"
params:
column: "post_code"
value: "54321"
- schema: "public"
name: "orders"
transformers:

- name: "TwoDatesGen"
- name: "RandomInt"
params:
column_a: "scheduled_arrival"
column_b: "actual_arrival"
column: "account_id"
engine: hash
min: 1
max: 2147483647

- name: "TestTransformer"
- name: "NoiseNumeric"
params:
column: "actual_arrival"
column: "total_price"
decimal: 2
min_ratio: 0.1
max_ratio: 0.9

- name: "Cmd"
params:
executable: "cmd_test.sh"
driver:
name: "json"
params:
format: "bytes"
timeout: "60s"
validate_output: true
expected_exit_code: -1
skip_on_behaviour: "any"
columns:
- name: "actual_arrival"
skip_original_data: true
skip_on_null_input: true
- name: "scheduled_arrival"
skip_original_data: true
#
- name: "TestTransformer"
- name: "NoiseDate"
params:
column: "scheduled_arrival"
column: "created_at"
max_ratio: "6 day"
min_ratio: "1 day"
truncate: "day"

- schema: "bookings"
name: "measurement"
apply_for_inherited: True
transformers:
- name: "RandomDate"
params:
column: "logdate"
min: "2023-01-03"
max: "2023-01-30"

- name: "TemplateRecord"
params:
validate: false
columns:
- "scheduled_departure"
template: >
{{- $val := .GetValue "scheduled_departure" -}}
{{- if isNull $val -}}
{{ now | dateModify "24h" | .SetValue "scheduled_departure" }}
{{ else }}
{{ now | dateModify "48h" | .SetValue "scheduled_departure" }}
{{ end }}


- schema: "bookings"
name: "aircrafts_data"
transformers:
- name: "Json"
params:
column: "model"
operations:
- operation: "set"
path: "en"
value: "Boeing 777-300-2023"
- operation: "set"
path: "crewSize"
value: 10

- name: "NoiseInt"
params:
ratio: 0.9
column: "range"

- name: "NoiseFloat"
params:
ratio: 0.1
column: "test_float"
precision: 2
column: "paid_at"
max: '{{ now | .EncodeValue }}'
truncate: "day"
dynamic_params:
min:
column: "created_at"

restore:
pg_restore_options:
4 changes: 0 additions & 4 deletions docs/architecture.md
@@ -30,10 +30,6 @@ Greenmask introduces the concept of storages.
* `s3` — this option supports any S3-like storage system, including AWS S3, which makes it versatile and adaptable to various cloud-based storage solutions.
* `directory` — this is the standard choice, representing the ordinary filesystem directory for local storage.

!!! note
If you have suggestions for additional storage options that would be valuable to implement, feel free to
share your ideas with us. Greenmask aims to accommodate a wide range of storage preferences to suit diverse backup needs.

## Restoration process

In the restoration process, Greenmask combines the capabilities of different tools:
Binary file added docs/assets/built_in_transformers/img.png
Binary file removed docs/assets/getting_started/list-dumps.png
Binary file removed docs/assets/getting_started/validate-result.png
Original file line number Diff line number Diff line change
@@ -44,7 +44,8 @@ Below you can find custom core functions which are divided into categories based
### masking

Replaces characters with asterisk `*` symbols depending on the provided masking rule. If the
value is `NULL`, it is kept unchanged. This function is based on [ggwhite/go-masker](https://github.com/ggwhite/go-masker).
value is `NULL`, it is kept unchanged. This function is based
on [ggwhite/go-masker](https://github.com/ggwhite/go-masker).

=== "Masking rules"

@@ -113,16 +114,18 @@ Adds or subtracts a random duration in the provided `interval` to or from the or

### noiseFloat

Adds or subtracts a random fraction to or from the original float value. Multiplies the original float value by a provided random value that is not higher than the `ratio` parameter and adds it to the original value with the option to specify the precision via the `precision` parameter.
Adds or subtracts a random fraction to or from the original float value. Multiplies the original float value by a
random value that is not higher than the `ratio` parameter and adds the result to the original value. The number of
decimal places in the result can be set via the `decimal` parameter.

=== "Signature"

`noiseFloat(ratio float, precision int, value float) (res float64, err error)`
`noiseFloat(ratio float, decimal int, value float) (res float64, err error)`

=== "Parameters"

* `ratio` — the maximum multiplier value in the interval (0:1). The value will be randomly generated up to `ratio`, multiplied by the original value, and the result will be added to the original value.
* `precision` — the precision of the resulted value
* `decimal` — the number of decimal places in the resulting value
* `value` — the original value

=== "Return values"
@@ -132,7 +135,8 @@ Adds or subtracts a random fraction to or from the original float value. Multipl

### noiseInt

Adds or subtracts a random fraction to or from the original integer value. Multiplies the original integer value by a provided random value that is not higher than the `ratio` parameter and adds it to the original value.
Adds or subtracts a random fraction to or from the original integer value. Multiplies the original integer value by a
provided random value that is not higher than the `ratio` parameter and adds it to the original value.

=== "Signature"

@@ -176,13 +180,13 @@ Generates a random float value within the provided interval.

=== "Signature"

`randomFloat(min any, max any, precision int) (res float, err error)`
`randomFloat(min any, max any, decimal int) (res float, err error)`

=== "Parameters"

* `min` — the minimum random value threshold
* `max` — the maximum random value threshold
* `precision` — the precision of the resulted value
* `decimal` — the number of decimal places in the resulting value

=== "Return values"

@@ -229,18 +233,37 @@ Generates a random string using the provided characters within the specified len

### roundFloat

Rounds a float value up to provided precision.
Rounds a float value to the provided number of decimal places.

=== "Signature"

`roundFloat(precision int, original float) (res float, err error)`
`roundFloat(decimal int, original float) (res float, err error)`

=== "Parameters"

* `precision` — the precision of the value
* `decimal` — the number of decimal places to round to
* `original` — the original float value

=== "Return values"

* `res` — a rounded float value
* `err` — an error if there is an issue

### tsModify

Modifies the original time value by adding or subtracting the provided interval. The interval is a string in
the [PostgreSQL interval](https://www.postgresql.org/docs/current/datatype-datetime.html#DATATYPE-INTERVAL-INPUT) format.

=== "Signature"

`tsModify(interval string, val time.Time) (time.Time, error)`

=== "Parameters"

* `interval` — the interval that is added to or subtracted from the original value. The format is the same as in the [PostgreSQL interval format](https://www.postgresql.org/docs/current/datatype-datetime.html#DATATYPE-INTERVAL-INPUT).
* `val` — the original time value

=== "Return values"

* `res` — a modified date
* `err` — an error if there is an issue
Original file line number Diff line number Diff line change
@@ -10,7 +10,7 @@ Modify records using a Go template and apply changes by using the PostgreSQL dri

## Description

`TemplateRecord` uses [Go templates](https://pkg.go.dev/text/template) to change data. However, while the [Template transformer](/template.md) operates with a single column and automatically applies results, the `TemplateRecord` transformer can make changes to a set of columns in the string, and using driver functions `.SetValue` or `.SetRawValue` is mandatory to do that.
`TemplateRecord` uses [Go templates](https://pkg.go.dev/text/template) to change data. However, while the [Template transformer](./template.md) operates on a single column and applies its result automatically, the `TemplateRecord` transformer can change a set of columns in the row, and doing so requires calling the driver functions `.SetValue` or `.SetRawValue`.

With the `TemplateRecord` transformer, you can implement complicated transformation logic using basic or custom template functions. Below you can get familiar with the basic template functions for the `TemplateRecord` transformer. For more information about available custom template functions, see [Custom functions](custom_functions/index.md).
