docs: revised README.md (#225)
wwoytenko authored Nov 3, 2024
1 parent 5e17392 commit bfee93d
Showing 1 changed file with 36 additions and 56 deletions.
92 changes: 36 additions & 56 deletions README.md
@@ -40,43 +40,34 @@ sample databases included to help you try Greenmask without any additional actio

## Features

* **[Deterministic transformers](https://docs.greenmask.io/latest/built_in_transformers/transformation_engines/#hash-engine)**
— deterministic approach to data transformation based on the hash
functions. This ensures that the same input data will always produce the same output data. Almost each transformer
supports either `random` or `hash` engine making it universal for any use case.
* **[Dynamic parameters](https://docs.greenmask.io/latest/built_in_transformers/dynamic_parameters/)** — almost each
transformer supports dynamic parameters, allowing to parametrize the
transformer dynamically from the table column value. This is helpful for resolving the functional dependencies
between columns and satisfying the constraints.
* **[Transformation validation and easy maintainable](https://docs.greenmask.io/latest/commands/validate/)** - During
configuration process, Greenmask provides validation
warnings, data transformation diff and schema diff features, allowing you to monitor and maintain transformations
effectively
throughout the software lifecycle. Schema diff helps to avoid data leakage when schema changed.
* **[Partitioned tables transformation inheritance](https://docs.greenmask.io/latest/configuration/?h=partition#dump-section)**
— Define transformation configurations once and apply them to all
partitions within partitioned tables (using `apply_for_inherited` parameter), simplifying the anonymization process.
* **Stateless** - Greenmask operates as a logical dump and does not impact your existing database schema.
* **Cross-platform** - Can be easily built and executed on any platform, thanks to its Go-based architecture,
* **[Deterministic transformers](https://docs.greenmask.io/latest/built_in_transformers/transformation_engines/#hash-engine)** — Uses hash functions to ensure consistent output for the same input. Most transformers support both `random` and
`hash` engines, offering flexibility for various use cases.
* **[Dynamic parameters](https://docs.greenmask.io/latest/built_in_transformers/dynamic_parameters/)** — most
transformers support dynamic parameters, allowing them to adapt based on table column values. This feature helps
manage dependencies between columns and meet constraints effectively.
* **[Transformation Condition](https://docs.greenmask.io/latest/built_in_transformers/transformation_condition/)** -
applies the transformation only when a specified condition is met, making it useful for targeting specific rows.
* **[Transformation validation and easy maintenance](https://docs.greenmask.io/latest/commands/validate/)** — Greenmask
provides validation warnings, data transformation diffs, and schema diffs during configuration, enabling effective
monitoring and maintenance of transformations. The schema diff feature helps prevent data leakage when the schema
changes.
* **[Transformation inheritance](https://docs.greenmask.io/latest/built_in_transformers/transformation_inheritance/)**
— transformation inheritance for partitioned tables and tables with foreign keys. Define once and apply to all.
* **Stateless** — Greenmask operates as a logical dump and does not impact your existing database schema.
* **Cross-platform** — Can be easily built and executed on any platform, thanks to its Go-based architecture,
which eliminates platform dependencies.
* **Database type safe** - Ensures data integrity by validating data and utilizing the database driver for
encoding and decoding operations. This approach guarantees the preservation of data formats.
* **Backward compatible** - It fully supports the same features and protocols as existing vanilla PostgreSQL utilities.
Dumps created by Greenmask can be successfully restored using the pg_restore utility.
* **Extensible** - Users have the flexibility
* **Database type safe** - Ensures data integrity by validating data and using the database driver for encoding and
decoding operations, preserving accurate data formats.
* **Backward compatible** — Fully supports the same features and protocols as standard PostgreSQL utilities. Dumps
created by Greenmask can be seamlessly restored using the `pg_restore` utility.
* **Extensible** - Users have the flexibility
to [implement domain-based transformations](https://docs.greenmask.io/latest/built_in_transformers/standard_transformers/cmd/)
in any programming language or
use [predefined templates](https://docs.greenmask.io/latest/built_in_transformers/advanced_transformers/).
* **Integrable** - Integrate seamlessly into your CI/CD system for automated database anonymization and
restoration.
* **Parallel execution** - Take advantage of parallel dumping and restoration, significantly reducing the time required
to deliver results.
* **Provide variety of storages** - offers a variety of storage options for local and remote data storage,
including directories and S3-like storage solutions.
* **[Pgzip support for faster compression](https://docs.greenmask.io/latest/commands/dump/?h=pgzip#pgzip-compression)** — by
setting `--pgzip`, it can speeds up the dump and restoration
processes through parallel compression.

* **Parallel execution** — Enables parallel dumping and restoration to significantly speed up results.
* **Variety of storages** — Supports both local and remote storage, including directories and S3-compatible solutions.
* **[Pgzip support for faster compression](https://docs.greenmask.io/latest/commands/dump/?h=pgzip#pgzip-compression)** — Speeds up dump and restoration processes with parallel compression
by setting `--pgzip`.
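
The idea behind the deterministic (hash-based) engine in the feature list above can be illustrated with a minimal Python sketch. This is not Greenmask's implementation: the function name `pseudonymize_email`, the salt value, and the output format are hypothetical, chosen only to show that deriving the output from a keyed hash of the input yields the same result for the same input.

```python
import hashlib
import hmac


def pseudonymize_email(value: str, salt: bytes) -> str:
    """Map an email to a stable fake address (illustration only)."""
    # A keyed hash (HMAC-SHA256) makes the mapping repeatable for a given salt
    # while keeping it hard to reverse without knowing the salt.
    digest = hmac.new(salt, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return f"user_{digest[:12]}@example.com"


if __name__ == "__main__":
    salt = b"demo-salt"  # hypothetical value for the demonstration
    first = pseudonymize_email("alice@example.org", salt)
    second = pseudonymize_email("alice@example.org", salt)
    other = pseudonymize_email("bob@example.org", salt)
    print(first == second)  # same input gives the same output
    print(first != other)   # different inputs give different outputs
```

A random engine, by contrast, would draw a fresh value on every run, so repeated dumps of the same row would not match.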

## Use Cases

@@ -92,24 +83,20 @@ Greenmask is ideal for various scenarios, including:

### General Information

It is evident that the most appropriate approach for executing logical backup dumping and restoration is by leveraging
the core PostgreSQL utilities, specifically pg_dump and pg_restore. **Greenmask** has been purposefully designed to
align with PostgreSQL's native utilities, ensuring compatibility. Greenmask primarily handles data dumping
operations independently and delegates the responsibilities of schema dumping and restoration to pg_dump and pg_restore,
maintaining seamless integration with PostgreSQL's standard tools.

#### Backup and Process
The best approach for logical backup dumping and restoration is to use the core PostgreSQL utilities, specifically
`pg_dump` and `pg_restore`. Greenmask is designed to align with these native tools for full compatibility. It manages
data dumping itself and delegates schema dumping and restoration to `pg_dump` and `pg_restore`, which keeps it
integrated with PostgreSQL's standard workflow.

Greenmask uses the **directory format** of _pg_dump_ and _pg_restore_. This format is particularly suitable for
parallel execution and partial restoration, and it includes clear metadata files that aid in determining the backup and
restoration steps. Greenmask has been optimized to work seamlessly with remote storage systems and anonymization
procedures.
Greenmask utilizes the directory format of `pg_dump` and `pg_restore`, ideal for parallel execution and partial restoration.
This format includes metadata files to guide backup and restoration steps.
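
For readers unfamiliar with the directory format, the following Python sketch assembles the plain `pg_dump` and `pg_restore` command lines that use it with parallel jobs. It shows the standard PostgreSQL utilities only, not Greenmask's own CLI or internals; the database names, the dump path, and the helper function names are placeholders.

```python
import shlex


def pg_dump_command(dbname: str, out_dir: str, jobs: int = 4) -> list[str]:
    """Build a pg_dump invocation that writes a directory-format dump."""
    # Parallel dumping (--jobs) is only available with the directory format.
    return ["pg_dump", "--format=directory", f"--jobs={jobs}", f"--file={out_dir}", dbname]


def pg_restore_command(dbname: str, dump_dir: str, jobs: int = 4) -> list[str]:
    """Build a pg_restore invocation that restores a directory-format dump."""
    return ["pg_restore", "--format=directory", f"--jobs={jobs}", f"--dbname={dbname}", dump_dir]


if __name__ == "__main__":
    # Print the commands instead of running them, so no database is required.
    print(shlex.join(pg_dump_command("source_db", "/tmp/dump_dir")))
    print(shlex.join(pg_restore_command("target_db", "/tmp/dump_dir")))
```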

#### Storage Options

* **s3** - This option supports any S3-like storage system, including AWS S3, making it versatile and adaptable to
various cloud-based storage solutions.
* **directory** - This is the standard choice, representing the ordinary filesystem directory for local storage.
* **[s3](https://docs.greenmask.io/latest/configuration/#__tabbed_1_2)** - Supports any S3-compatible storage system,
including AWS S3, offering flexibility across different cloud storage solutions.
* **[directory](https://docs.greenmask.io/latest/configuration/#__tabbed_1_1)** - This is the default option,
representing a standard filesystem directory for local storage.

#### Data Anonymization and Validation

@@ -125,17 +112,11 @@ If your table schema relies on functional dependencies between columns, you can
parameters, you can resolve cases such as `created_at` and `updated_at`, where
`updated_at` must be greater than or equal to `created_at`.
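
The constraint described above (`updated_at` must not precede `created_at`) can be sketched in Python as follows. This is a conceptual illustration, not Greenmask's configuration syntax or implementation: the function name `random_updated_at` and the `max_offset` parameter are hypothetical. The point is that the lower bound of the generated value is taken from another column of the same row.

```python
import random
from datetime import datetime, timedelta


def random_updated_at(created_at: datetime, max_offset: timedelta, rng: random.Random) -> datetime:
    """Pick a random timestamp in [created_at, created_at + max_offset]."""
    # The lower bound comes from the row itself, so the constraint always holds.
    offset_seconds = rng.uniform(0, max_offset.total_seconds())
    return created_at + timedelta(seconds=offset_seconds)


if __name__ == "__main__":
    rng = random.Random(42)  # seeded only to make the demo repeatable
    rows = [
        {"created_at": datetime(2024, 1, 1, 12, 0)},
        {"created_at": datetime(2024, 6, 15, 8, 30)},
    ]
    for row in rows:
        row["updated_at"] = random_updated_at(row["created_at"], timedelta(days=30), rng)
        print(row["created_at"], row["updated_at"], row["updated_at"] >= row["created_at"])
```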

If you need to implement custom logic imperatively use
If you need to implement custom logic imperatively
use [Cmd](https://docs.greenmask.io/latest/built_in_transformers/standard_transformers/cmd/) or
[TemplateRecord](https://docs.greenmask.io/latest/built_in_transformers/advanced_transformers/template_record/) or
[Template](https://docs.greenmask.io/latest/built_in_transformers/advanced_transformers/template/) transformers.

Greenmask provides a framework for creating your custom transformers, which can be reused efficiently. These
transformers can be seamlessly integrated without requiring recompilation, thanks to the PIPE (stdin/stdout)
interaction.

Furthermore, Greenmask's architecture is designed to be highly extensible, making it possible to introduce other
interaction protocols, such as HTTP or Socket, for conducting anonymization procedures.
#### PostgreSQL Version Compatibility

**Greenmask** is compatible with PostgreSQL versions **11 and higher**.
@@ -149,7 +130,6 @@ interaction protocols, such as HTTP or Socket, for conducting anonymization proc
* [Discord](https://discord.com/invite/rKBKvDECfd)
* [DockerHub](https://hub.docker.com/r/greenmask/greenmask)

## References

* Utilized the [Demo database](https://postgrespro.com/community/demodb), provided by PostgresPro, for integration