feat: apply for references
* Added documentation with examples
* Implemented initial config builder tests
* Revised transformers implementation structure - moved string literal to const
* Configured allowed transformers to `apply_for_references`
wwoytenko committed Nov 3, 2024
1 parent 29f466f commit b2f7d3e
Showing 38 changed files with 532 additions and 102 deletions.
171 changes: 171 additions & 0 deletions docs/built_in_transformers/apply_for_references.md
@@ -0,0 +1,171 @@
# Apply for References

## Description

Using `apply_for_references`, you can apply a transformation to a column that is part of a primary key and to the
foreign key columns in other tables that reference it. You define the transformation only once, on the primary key
column, and it is then applied automatically to all tables referencing that column.

The transformer must support the `hash` engine, and the `hash` engine must be set in the configuration file.
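
The reason a deterministic engine matters can be illustrated with a small sketch. This is not Greenmask's actual hash
engine; `hashToRange` and the salt value are hypothetical names used only to show that the same input always yields the
same output, so a primary key value and the foreign key values pointing at it stay in sync after transformation.

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"fmt"
)

// hashToRange deterministically maps value to an integer in [lo, hi].
// Hypothetical helper for illustration; it is not part of Greenmask.
// Note: mapping into a small range can produce collisions.
func hashToRange(salt string, value, lo, hi int64) int64 {
	h := sha256.New()
	h.Write([]byte(salt))
	var buf [8]byte
	binary.BigEndian.PutUint64(buf[:], uint64(value))
	h.Write(buf[:])
	sum := h.Sum(nil)
	n := binary.BigEndian.Uint64(sum[:8])
	return lo + int64(n%uint64(hi-lo+1))
}

func main() {
	// The same key value stored as a primary key in one table
	// and as a foreign key in another table.
	pk := hashToRange("salt", 42, 0, 100)
	fk := hashToRange("salt", 42, 0, 100)
	fmt.Println(pk == fk) // prints "true"
}
```

A purely random engine would give the two columns different values, which is why the hash engine is required here.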

## End-to-End Identifiers

End-to-end identifiers in databases are unique identifiers that are consistently used across multiple tables in a
relational database schema, allowing for a seamless chain of references from one table to another. These identifiers
typically serve as primary keys in one table and are propagated as foreign keys in other tables, creating a direct,
traceable link from one end of a data relationship to the other.

Greenmask can detect end-to-end identifiers and apply transformations across the entire sequence of tables. These
identifiers are detected when the following condition is met: the foreign key serves as both a primary key and a foreign
key in the referenced table.
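
One possible reading of this condition is sketched below: a table carries an end-to-end identifier when all columns of
its foreign key are also part of its primary key. The `Table` type and `isEndToEnd` function are hypothetical and
simplified; they are not taken from the Greenmask code base.

```go
package main

import "fmt"

// Table is a hypothetical, simplified description of a table's keys.
type Table struct {
	Name       string
	PrimaryKey []string
	ForeignKey []string // columns of a single foreign key; empty if none
}

// isEndToEnd reports whether the table has a foreign key whose columns
// are all part of the table's primary key.
func isEndToEnd(t Table) bool {
	if len(t.ForeignKey) == 0 {
		return false
	}
	pk := make(map[string]struct{}, len(t.PrimaryKey))
	for _, c := range t.PrimaryKey {
		pk[c] = struct{}{}
	}
	for _, c := range t.ForeignKey {
		if _, ok := pk[c]; !ok {
			return false
		}
	}
	return true
}

func main() {
	tableb := Table{Name: "tableb", PrimaryKey: []string{"id1", "id2"}, ForeignKey: []string{"id1", "id2"}}
	orders := Table{Name: "orders", PrimaryKey: []string{"order_id"}, ForeignKey: []string{"user_id"}}
	fmt.Println(isEndToEnd(tableb), isEndToEnd(orders)) // prints "true false"
}
```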

## Limitations

- The transformation must be deterministic.
- The transformation condition will not be applied to the referenced column.
- Not all transformers support `apply_for_references`.

!!! warning

    We do not recommend using `apply_for_references` with transformation conditions, as these conditions are not
    inherited by transformers on the referenced columns. This may lead to inconsistencies in the data.

List of transformers that support `apply_for_references`:

* Hash
* NoiseDate
* NoiseFloat
* NoiseInt
* NoiseNumeric
* RandomBool
* RandomDate
* RandomEmail
* RandomFloat
* RandomInt
* RandomIp
* RandomMac
* RandomNumeric
* RandomString
* RandomUuid
* RandomUnixTimestamp

## Example 1. Simple table references

This is an ordinary table reference where the primary key of the `users` table is referenced by the `orders` table.

```sql
-- Enable the extension for UUID generation (if not enabled)
CREATE EXTENSION IF NOT EXISTS "uuid-ossp";

CREATE TABLE users
(
    user_id  UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
    username VARCHAR(50) NOT NULL
);

CREATE TABLE orders
(
    order_id   UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
    user_id    UUID REFERENCES users (user_id),
    order_date DATE NOT NULL
);

INSERT INTO users (username)
VALUES ('john_doe');
INSERT INTO users (username)
VALUES ('jane_smith');

INSERT INTO orders (user_id, order_date)
VALUES ((SELECT user_id FROM users WHERE username = 'john_doe'), '2024-10-31'),
       ((SELECT user_id FROM users WHERE username = 'jane_smith'), '2024-10-30');
```

To transform the `user_id` column in the `users` table, you can use the following configuration:

```yaml
- schema: public
  name: users
  apply_for_inherited: true
  transformers:
    - name: RandomUuid
      apply_for_references: true
      params:
        column: "user_id"
        engine: "hash"
```
This will apply the `RandomUuid` transformation to the `user_id` column in the `orders` table automatically.

## Example 2. Tables with end-to-end identifiers

In this example, we have three tables: `tablea`, `tableb`, and `tablec`. All tables have a composite primary key.
In the tables `tableb` and `tablec`, the primary key is also a foreign key that references the primary key of `tablea`.
This means that all PKs are end-to-end identifiers.

```sql
CREATE TABLE tablea
(
    id1  INT,
    id2  INT,
    data VARCHAR(50),
    PRIMARY KEY (id1, id2)
);
CREATE TABLE tableb
(
    id1    INT,
    id2    INT,
    detail VARCHAR(50),
    PRIMARY KEY (id1, id2),
    FOREIGN KEY (id1, id2) REFERENCES tablea (id1, id2) ON DELETE CASCADE
);
CREATE TABLE tablec
(
    id1         INT,
    id2         INT,
    description VARCHAR(50),
    PRIMARY KEY (id1, id2),
    FOREIGN KEY (id1, id2) REFERENCES tableb (id1, id2) ON DELETE CASCADE
);
INSERT INTO tablea (id1, id2, data)
VALUES (1, 1, 'Data A1'),
       (2, 1, 'Data A2'),
       (3, 1, 'Data A3');
INSERT INTO tableb (id1, id2, detail)
VALUES (1, 1, 'Detail B1'),
       (2, 1, 'Detail B2'),
       (3, 1, 'Detail B3');
INSERT INTO tablec (id1, id2, description)
VALUES (1, 1, 'Description C1'),
       (2, 1, 'Description C2'),
       (3, 1, 'Description C3');
```

To transform the `id1` and `id2` columns in `tablea`, you can use the following configuration:

```yaml
- schema: public
  name: "tablea"
  apply_for_inherited: true
  transformers:
    - name: RandomInt
      apply_for_references: true
      params:
        min: 0
        max: 100
        column: "id1"
        engine: "hash"
    - name: RandomInt
      apply_for_references: true
      params:
        min: 0
        max: 100
        column: "id2"
        engine: "hash"
```

This will apply the `RandomInt` transformation to the `id1` and `id2` columns in `tableb` and `tablec` automatically.
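
How a transformation defined on `tablea` could reach `tablec` through `tableb` can be pictured as a walk over the
reference graph. The sketch below is an assumption about the general idea, not Greenmask's implementation;
`collectReferencing` and the `referencedBy` map are hypothetical names.

```go
package main

import "fmt"

// collectReferencing returns every table that references root, directly
// or through a chain of foreign keys, in breadth-first order.
// referencedBy maps a table name to the tables holding a foreign key to it.
func collectReferencing(root string, referencedBy map[string][]string) []string {
	seen := map[string]bool{root: true}
	queue := []string{root}
	var out []string
	for len(queue) > 0 {
		cur := queue[0]
		queue = queue[1:]
		for _, next := range referencedBy[cur] {
			if seen[next] {
				continue // guard against cycles and duplicates
			}
			seen[next] = true
			out = append(out, next)
			queue = append(queue, next)
		}
	}
	return out
}

func main() {
	// Reference chain from this example: tablec -> tableb -> tablea.
	referencedBy := map[string][]string{
		"tablea": {"tableb"},
		"tableb": {"tablec"},
	}
	fmt.Println(collectReferencing("tablea", referencedBy)) // prints "[tableb tablec]"
}
```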
9 changes: 7 additions & 2 deletions docs/built_in_transformers/index.md
@@ -3,10 +3,15 @@
Transformers in Greenmask are methods which are applied to anonymize sensitive data. All Greenmask transformers are
split into the following groups:

- [Transformation engines](transformation_engines.md) — the type of generator used in transformers. Hash (deterministic)
and random (randomization)
- [Dynamic parameters](dynamic_parameters.md) — transformers that require an input of parameters and generate
random data based on them.
- [Transformation engines](transformation_engines.md) — the type of generator used in transformers. Hash (deterministic)
and random (randomization)
- [Parameters templating](parameters_templating.md) — generate static parameters values from templates.
- [Transformation conditions](transformation_condition.md) — conditions that can be applied to transformers. If the
condition is not met, the transformer will not be applied.
- [Apply for references](apply_for_references.md) — apply transformation on a column that is involved in a primary key
and tables with a foreign key that references that column.
- [Standard transformers](standard_transformers/index.md) — transformers that require only an input of parameters.
- [Advanced transformers](advanced_transformers/index.md) — transformers that can be modified according to user's needs
with the help of [custom functions](advanced_transformers/custom_functions/index.md).
2 changes: 1 addition & 1 deletion docs/built_in_transformers/transformation_condition.md
@@ -9,7 +9,7 @@ The condition must be defined as a boolean expression that evaluates to `true` o
`expr` library.

You can use the same functions that are described in
the [built-in transformers](/docs/built_in_transformers/advanced_transformers/custom_functions/index.md)
the [built-in transformers](./advanced_transformers/custom_functions/index.md)

The transformers are executed one by one - this helps you create complex transformation pipelines. For instance
depending on value chosen in the previous transformer, you can decide to execute the next transformer or not.
95 changes: 79 additions & 16 deletions internal/db/postgres/context/config_builder.go
@@ -11,6 +11,7 @@ import (

"github.com/greenmaskio/greenmask/internal/db/postgres/entries"
"github.com/greenmaskio/greenmask/internal/db/postgres/subset"
"github.com/greenmaskio/greenmask/internal/db/postgres/transformers"
transformersUtils "github.com/greenmaskio/greenmask/internal/db/postgres/transformers/utils"
"github.com/greenmaskio/greenmask/internal/domains"
"github.com/greenmaskio/greenmask/pkg/toolkit"
@@ -51,7 +52,7 @@ func (tcm *tableConfigMapping) hasTransformerWithApplyForReferences() bool {
// may contain the schema affection warnings that would be useful for considering consistency
func validateAndBuildEntriesConfig(
ctx context.Context, tx pgx.Tx, entries []*entries.Table, typeMap *pgtype.Map,
cfg *domains.Dump, registry *transformersUtils.TransformerRegistry,
cfg *domains.Dump, r *transformersUtils.TransformerRegistry,
version int, types []*toolkit.Type, graph *subset.Graph,
) (toolkit.ValidationWarnings, error) {
var warnings toolkit.ValidationWarnings
@@ -66,14 +67,11 @@ func validateAndBuildEntriesConfig(
}

// Assign settings to the Tables using config received
//entriesWithTransformers := findTablesWithTransformers(cfg.Transformation, Tables)
entriesWithTransformers, err := setConfigToEntries(ctx, tx, cfg.Transformation, entries, graph)
entriesWithTransformers, setConfigWarns, err := setConfigToEntries(ctx, tx, cfg.Transformation, entries, graph, r)
if err != nil {
return nil, fmt.Errorf("cannot get Tables entries config: %w", err)
}
// TODO:
// Check if any has relkind = p
// If yes, then find all children and remove them from entriesWithTransformers
warnings = append(warnings, setConfigWarns...)
for _, cfgMapping := range entriesWithTransformers {
// set subset conditions
setSubsetConds(cfgMapping.entry, cfgMapping.config)
@@ -122,7 +120,7 @@ func validateAndBuildEntriesConfig(
setColumnTypeOverrides(cfgMapping.entry, cfgMapping.config, typeMap)

// Set transformers for the table
transformersInitWarns, err := initAndSetupTransformers(ctx, cfgMapping.entry, cfgMapping.config, registry)
transformersInitWarns, err := initAndSetupTransformers(ctx, cfgMapping.entry, cfgMapping.config, r)
enrichWarningsWithTableName(transformersInitWarns, cfgMapping.entry)
warnings = append(warnings, transformersInitWarns...)
if err != nil {
@@ -176,9 +174,8 @@ func validateTableExists(
return warnings, nil
}

// findTablesWithTransformers - assigns settings from the config to the table entries. This function
// iterates through the Tables and do the following:
// 1. Compile when condition and set to the table entry
// findTablesWithTransformers - finds Tables with transformers in the config and returns them as a slice of
// tableConfigMapping
func findTablesWithTransformers(
cfg []*domains.Table, tables []*entries.Table,
) []*tableConfigMapping {
@@ -200,12 +197,19 @@ func findTablesWithTransformers(

func setConfigToEntries(
ctx context.Context, tx pgx.Tx, cfg []*domains.Table, tables []*entries.Table, g *subset.Graph,
) ([]*tableConfigMapping, error) {
r *transformersUtils.TransformerRegistry,
) ([]*tableConfigMapping, toolkit.ValidationWarnings, error) {
var res []*tableConfigMapping
var warnings toolkit.ValidationWarnings
for _, tcm := range findTablesWithTransformers(cfg, tables) {
if tcm.hasTransformerWithApplyForReferences() {
// If table has transformer with apply_for_references, then we need to find all reference tables
// and add them to the list
ok, checkWarns := checkApplyForReferenceMetRequirements(tcm, r)
if !ok {
warnings = append(warnings, checkWarns...)
continue
}
refTables := getRefTables(tcm.entry, tcm.config, g)
res = append(res, refTables...)
}
@@ -216,15 +220,18 @@ func setConfigToEntries(
}
// If the table is partitioned, then we need to find all children and remove parent from the list
if !tcm.config.ApplyForInherited {
return nil, fmt.Errorf(
"the table \"%s\".\"%s\" is partitioned use apply_for_inherited",
tcm.entry.Schema, tcm.entry.Name,
warnings = append(warnings, toolkit.NewValidationWarning().
SetMsg("the table is partitioned use apply_for_inherited").
AddMeta("SchemaName", tcm.entry.Schema).
AddMeta("TableName", tcm.entry.Name).
SetSeverity(toolkit.ErrorValidationSeverity),
)
continue
}

parts, err := findPartitionsOfPartitionedTable(ctx, tx, tcm.entry)
if err != nil {
return nil, fmt.Errorf(
return nil, nil, fmt.Errorf(
"cannot find partitions of the table %s.%s: %w",
tcm.entry.Schema, tcm.entry.Name, err,
)
@@ -248,7 +255,7 @@ func setConfigToEntries(
})
}
}
return res, nil
return res, warnings, nil
}

func getRefTables(rootTable *entries.Table, rootTableCfg *domains.Table, graph *subset.Graph) []*tableConfigMapping {
@@ -532,3 +539,59 @@ func initAndSetupTransformers(ctx context.Context, t *entries.Table, cfg *domain
}
return warnings, nil
}

func checkApplyForReferenceMetRequirements(
tcm *tableConfigMapping, r *transformersUtils.TransformerRegistry,
) (bool, toolkit.ValidationWarnings) {
warnings := toolkit.ValidationWarnings{}
for _, tr := range tcm.config.Transformers {
allowed, w := isTransformerAllowedToApplyForReferences(tr, r)
if !allowed {
warnings = append(warnings, w...)
}
}
return !warnings.IsFatal(), warnings
}

// isTransformerAllowedToApplyForReferences - checks if the transformer is allowed to apply for references
// and if the engine parameter is hash and required
func isTransformerAllowedToApplyForReferences(
cfg *domains.TransformerConfig, r *transformersUtils.TransformerRegistry,
) (bool, toolkit.ValidationWarnings) {
td, ok := r.Get(cfg.Name)
if !ok {
return false, toolkit.ValidationWarnings{
toolkit.NewValidationWarning().
SetMsg("transformer not found").
AddMeta("TransformerName", cfg.Name).
SetSeverity(toolkit.ErrorValidationSeverity),
}
}
allowApplyForReferenced, ok := td.Properties.GetMeta(transformers.AllowApplyForReferenced)
if !ok || !allowApplyForReferenced.(bool) {
return false, toolkit.ValidationWarnings{
toolkit.NewValidationWarning().
SetMsg(
"cannot apply transformer for references: transformer does not support apply for references",
).
AddMeta("TransformerName", cfg.Name).
SetSeverity(toolkit.ErrorValidationSeverity),
}
}
requireHashEngineParameter, ok := td.Properties.GetMeta(transformers.RequireHashEngineParameter)
if !ok {
return false, nil
}
if !requireHashEngineParameter.(bool) {
return true, nil
}
if string(cfg.Params[engineParameterName]) != transformers.HashEngineParameterName {
return false, toolkit.ValidationWarnings{
toolkit.NewValidationWarning().
SetMsg("cannot apply transformer for references: engine parameter is not hash").
AddMeta("TransformerName", cfg.Name).
SetSeverity(toolkit.ErrorValidationSeverity),
}
}
return true, nil
}