doc: Added changelog v0.2.0b1 and v0.2.0b2. Revised some parts of doc.
wwoytenko committed Aug 30, 2024
1 parent 93731bf commit 1c8b959
Showing 5 changed files with 257 additions and 8 deletions.
17 changes: 10 additions & 7 deletions docs/commands/restore.md
@@ -110,10 +110,11 @@ greenmask --config=config.yml restore DUMP_ID --inserts --overriding-system-valu
### Restoration in topological order

By default, Greenmask restores tables in the order they are listed in the dump file. To restore tables in topological
order, use the `--restore-in-order` flag. This is particularly useful when your schema includes foreign key references
and
you need to insert data in the correct order. Without this flag, you may encounter errors when inserting data into
tables with foreign key constraints.
order, use the `--restore-in-order` flag. This flag ensures that dependent tables are not restored until the tables they
depend on have been restored.

This is useful when the schema has already been created with foreign keys and other constraints, and you want to insert
data into the tables in the correct order or catch up the target database with new data.

!!! warning

@@ -143,9 +144,11 @@ greenmask --config=config.yml restore latest --pgzip
The COPY command returns the error only on transaction commit. This means that if you have a large dump and an error
occurs, you will have to wait until the end of the transaction to see the error message. To avoid this, you can use the
`--batch-size` flag to specify the number of rows to insert in a single batch during the COPY command. If an error
occurs
during the batch insertion, the error message will be displayed immediately. The data will be committed **only if all
batches are inserted successfully**.
occurs during the batch insertion, the error message will be displayed immediately. The data will be committed **only
if all batches are inserted successfully**.

This is useful when you want to be notified of errors as soon as possible without waiting for the entire
table to be restored.
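
The batching behavior described above can be illustrated with a minimal Python sketch. It is not Greenmask's implementation: it uses an in-memory SQLite database as a stand-in for PostgreSQL `COPY`, and the `items` table and the `restore_in_batches` function are hypothetical names introduced only for this example. It shows rows being inserted batch by batch inside one transaction, an error surfacing as soon as the failing batch is processed, and a commit happening only if every batch succeeds.

```python
import sqlite3
from itertools import islice


def restore_in_batches(conn, rows, batch_size):
    """Insert rows batch by batch; commit only if every batch succeeds."""
    iterator = iter(rows)
    batch_no = 0
    try:
        while batch := list(islice(iterator, batch_size)):
            batch_no += 1
            # A constraint violation is raised here, right after the failing
            # batch, instead of at the very end of the restoration.
            conn.executemany("INSERT INTO items (id, name) VALUES (?, ?)", batch)
        conn.commit()
    except sqlite3.Error as exc:
        # Discard all batches inserted so far in this transaction.
        conn.rollback()
        raise RuntimeError(f"batch {batch_no} failed: {exc}") from exc
    return batch_no


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
    print(restore_in_batches(conn, [(1, "a"), (2, "b"), (3, "c")], batch_size=2))
```

A smaller batch size reports errors earlier at the cost of more round trips; the sketch makes no claim about how Greenmask balances the two.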

!!! warning

2 changes: 1 addition & 1 deletion docs/overrides/main.html
@@ -1,7 +1,7 @@
{% extends "base.html" %}

{% block announce %}
A new major beta version 0.2.0b1 is <a href="https://github.com/GreenmaskIO/greenmask/releases/tag/v0.2.0b1">released</a>
A new major beta version 0.2.0b2 (2024.08.30) is <a href="https://github.com/GreenmaskIO/greenmask/releases/tag/v0.2.0b2">released</a>
{% endblock %}

{% block outdated %}
119 changes: 119 additions & 0 deletions docs/release_notes/greenmask_0_2_0_b1.md
@@ -0,0 +1,119 @@
# Greenmask 0.2.0b1 (pre-release)

This **major beta** release introduces new features and refactored transformers, significantly enhancing Greenmask's
flexibility to better meet business needs.

## Changes overview

* [Introduced dynamic parameters in the transformers](../built_in_transformers/dynamic_parameters.md)
    * Most transformers now support dynamic parameters where applicable.
    * Dynamic parameters are strictly enforced. If you need to cast values to another type, Greenmask provides templates
      and predefined cast functions accessible via `cast_to`. These functions cover frequent operations such as
      `UnixTimestampToDate` and `IntToBool`.
* The transformation logic has been significantly refactored, making transformers more customizable and flexible than
before.
* [Introduced transformation engines](../built_in_transformers/transformation_engines.md)
    * `random` - generates transformer values based on pseudo-random algorithms.
    * `hash` - generates transformer values using hash functions. Currently, it utilizes `sha3` hash functions, which
      are secure but perform slowly. In the stable release, there will be an option to choose between `sha3` and
      `SipHash`.

* [Introduced static parameters value template](../built_in_transformers/parameters_templating.md)
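
The difference between the `random` and `hash` engines listed above can be sketched in a few lines of Python. This is an illustration only, not Greenmask's code: the function names, the salt handling, and the way the digest is folded into a range are all assumptions made for the example. The point it shows is that a hash-based engine maps the same input and salt to the same output every time, while a random engine does not.

```python
import hashlib
import random


def hash_engine_int(original: bytes, salt: bytes, low: int, high: int) -> int:
    """Deterministic: the same (salt, original) pair always yields the same value."""
    digest = hashlib.sha3_256(salt + original).digest()
    # Fold the first 8 bytes of the digest into the inclusive range [low, high].
    # The modulo introduces a slight bias, which is acceptable for an illustration.
    return low + int.from_bytes(digest[:8], "big") % (high - low + 1)


def random_engine_int(low: int, high: int) -> int:
    """Non-deterministic: every call draws a fresh pseudo-random value."""
    return random.randint(low, high)


if __name__ == "__main__":
    salt = b"example-salt"  # hypothetical salt, not a Greenmask default
    first = hash_engine_int(b"alice@example.com", salt, 1, 1000)
    second = hash_engine_int(b"alice@example.com", salt, 1, 1000)
    print(first == second)  # the hash-based value is stable across calls
    print(random_engine_int(1, 1000))
```

Deterministic output is what lets the same original value be replaced consistently across tables and runs, at the cost of the slower hash computation mentioned above.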

## Notable changes

### Core

* Introduced the `Parametrizer` interface, now implemented for both dynamic and static parameters.
* Renamed most of the toolkit types for enhanced clarity and comprehensive documentation coverage.
* Refactored the `Driver` initialization logic.
* Added validation warnings for overridden types in the `Driver`.
* Migrated existing built-in transformers to utilize the new `Parametrizer` interface.
* Implemented a new abstraction, `TransformationContext`, as the first step towards enabling the new transformation
conditions feature (#34).
* Optimized most transformers for performance in both dynamic and static modes. While dynamic mode offers flexibility,
static mode ensures performance remains high. Using only the necessary transformation features helps keep
transformation time predictable.

### Documentation

Documentation has been significantly refactored. New information about features and updates to transformer descriptions
have been added.

### Transformers

* [RandomEmail](../built_in_transformers/standard_transformers/random_email.md) - Introduces a new transformer that
supports both random and deterministic engines. It allows for flexible email value generation; you can use column
values in the template and choose to keep the original domain or select any from the `domains` parameter.

* [NoiseDate](../built_in_transformers/standard_transformers/noise_date.md), [NoiseFloat](../built_in_transformers/standard_transformers/noise_float.md), [NoiseInt](../built_in_transformers/standard_transformers/noise_int.md) -
These transformers support both random and deterministic engines, offering dynamic mode parameters that control the
noise thresholds within the `min` and `max` range. Unlike previous implementations which used a single `ratio`
parameter, the new release features `min_ratio` and `max_ratio` parameters to define noise values more precisely.
Utilizing the `hash` engine in these transformers enhances security by complicating statistical analysis for
attackers, especially when the same salt is used consistently over long periods.

* [NoiseNumeric](../built_in_transformers/standard_transformers/noise_numeric.md) - A newly implemented transformer,
sharing features with `NoiseInt` and `NoiseFloat`, but specifically designed for numeric values (large integers or
floats). It provides a `decimal` parameter to handle values with fractions.

* [RandomChoice](../built_in_transformers/standard_transformers/random_choice.md) - Now supports the `hash` engine

* [RandomDate](../built_in_transformers/standard_transformers/random_date.md), [RandomFloat](../built_in_transformers/standard_transformers/random_float.md), [RandomInt](../built_in_transformers/standard_transformers/random_int.md) -
Now enhanced with hash engine support. Threshold parameters `min` and `max` have been updated to support dynamic mode,
allowing for more flexible configurations.

* [RandomNumeric](../built_in_transformers/standard_transformers/random_numeric.md) - A new transformer specifically
designed for numeric types (large integers or floats), sharing similar features with `RandomInt` and `RandomFloat`,
but tailored for handling huge numeric values.

* [RandomString](../built_in_transformers/standard_transformers/random_string.md) - Now supports hash engine mode

* [RandomUnixTimestamp](../built_in_transformers/standard_transformers/random_unix_timestamp.md) - This new transformer
generates Unix timestamps with selectable units (`second`, `millisecond`, `microsecond`, `nanosecond`). Similar in
function to `RandomDate`, it supports the hash engine and dynamic parameters for `min` and `max` thresholds, with the
ability to override these units using `min_unit` and `max_unit` parameters.

* [RandomUuid](../built_in_transformers/standard_transformers/random_uuid.md) - Added hash engine support

* [RandomPerson](../built_in_transformers/standard_transformers/random_person.md) - Implemented a new transformer that
replaces `RandomName`, `RandomLastName`, `RandomFirstName`, `RandomFirstNameMale`, `RandomFirstNameFemale`,
`RandomTitleMale`, and `RandomTitleFemale`. This new transformer offers enhanced customizability while providing
functionality similar to the previous transformers. It generates personal data such as `FirstName`, `LastName`, and
`Title`, based on the provided `gender` parameter, which now supports dynamic mode. Future minor versions will allow
for overriding the default names database.

* Added [tsModify](../built_in_transformers/advanced_transformers/custom_functions/core_functions.md#tsmodify) - a new
template function for modifying `time.Time` objects.

* Introduced a new [RandomIp](../built_in_transformers/standard_transformers/random_ip.md) transformer capable of
generating a random IP address based on the specified netmask.

* Added a new [RandomMac](../built_in_transformers/standard_transformers/random_mac.md) transformer for generating
random MAC addresses.

* Deleted transformers include `RandomMacAddress`, `RandomIPv4`, `RandomIPv6`, `RandomUnixTime`, `RandomTitleMale`,
`RandomTitleFemale`, `RandomFirstName`, `RandomFirstNameMale`, `RandomFirstNameFemale`, `RandomLastName`, and
`RandomName` due to the introduction of more flexible and unified options.
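
To make the `min_ratio` / `max_ratio` idea from the `Noise*` transformers above more concrete, here is a rough Python sketch. The exact formula Greenmask uses is not stated in these notes, so the sketch assumes one plausible reading: the noise magnitude is a random fraction of the original value drawn between `min_ratio` and `max_ratio`, applied in a random direction, and the result is then clamped to the optional `min` and `max` thresholds (named `low` and `high` here). The function name `noise_int` is hypothetical.

```python
import random
from typing import Optional


def noise_int(value: int, min_ratio: float, max_ratio: float,
              low: Optional[int] = None, high: Optional[int] = None,
              rng: Optional[random.Random] = None) -> int:
    """Shift value by a random fraction in [min_ratio, max_ratio], then clamp."""
    rng = rng or random.Random()
    ratio = rng.uniform(min_ratio, max_ratio)  # assumed: uniform noise ratio
    sign = rng.choice((-1, 1))                 # assumed: noise may go either way
    noised = round(value + sign * value * ratio)
    # Assumed: the min/max thresholds act as hard bounds on the result.
    if low is not None:
        noised = max(low, noised)
    if high is not None:
        noised = min(high, noised)
    return noised


if __name__ == "__main__":
    print(noise_int(1000, min_ratio=0.05, max_ratio=0.2))
```

Compared with a single `ratio`, the pair of bounds guarantees a minimum amount of distortion as well as a maximum one, so a value is never left almost unchanged.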

#### Full Changelog: [v0.1.14...v0.2.0b1](https://github.com/GreenmaskIO/greenmask/compare/v0.1.14...v0.2.0b1)

## Playground usage for beta version

If you want to run a Greenmask [playground](../playground.md) for the beta version v0.2.0b1, execute:

```bash
git checkout tags/v0.2.0b1 -b v0.2.0b1
docker-compose run greenmask-from-source
```

## Links

Feel free to reach out to us if you have any questions or need assistance:

* [Greenmask Roadmap](https://github.com/orgs/GreenmaskIO/projects/6)
* [Email](mailto:[email protected])
* [Twitter](https://twitter.com/GreenmaskIO)
* [Telegram](https://t.me/greenmask_community)
* [Discord](https://discord.gg/tAJegUKSTB)
* [DockerHub](https://hub.docker.com/r/greenmask/greenmask)
125 changes: 125 additions & 0 deletions docs/release_notes/greenmask_0_2_0_b2.md
@@ -0,0 +1,125 @@
# Greenmask 0.2.0b2 (pre-release)

This **major beta** release introduces new features such as the database subset, pgzip support, restoration in
topological order, and many more. It also includes fixes and improvements.

## Preface

This release is a major milestone that significantly expands Greenmask's functionality, transforming it into a simple,
extensible, and reliable solution for database security, data anonymization, and everyday operations. Our goal is to
create a core system that can serve as a foundation for comprehensive dynamic staging environments and robust data
security.

## Notable changes

* [**Database Subset**](../database_subset.md) - a new feature that lets you define a subset of the database and
scale down the dump size ([#110](https://github.com/GreenmaskIO/greenmask/issues/110)). It is versatile and
especially useful for testing and development environments. It supports:

    * References with [NULL values](../database_subset.md/#references-with-null-values) - generates a LEFT JOIN query
      for FK references with NULL values to include them in the subset.
    * [Virtual references](../database_subset.md/#virtual-references) (virtual foreign keys) - create a logical
      FK in Greenmask that is used in the subset dependency graph. A virtual reference can be defined for a column
      or an expression, allowing you to get the value from JSON and similar.
    * [Circular references](../database_subset.md/#circular-reference) - Greenmask automatically resolves
      circular dependencies in the subset by generating a recursive query. The query is generated with integrity checks
      of the subset, ensuring that the data gathered from circular dependencies is consistent.
    * Full documentation coverage, including [troubleshooting](../database_subset.md/#troubleshooting)
      and [examples](../database_subset.md/#example-dump-a-subset-of-the-database).
    * FKs and PKs that have more than one column (or expression).
    * **Multi-cycle resolution in one strongly connected component (SCC)** - Greenmask generates a
      recursive query for the SCC whether it contains a single cycle or multiple cycles, making the subset system
      universal for any database schema.

* **pgzip** support for faster [compression](../commands/dump.md/#pgzip-compression)
and [decompression](../commands/restore.md/#pgzip-decompression) — setting `--pgzip` can speed up the dump and
restoration processes through parallel compression. In some tests, it shows up to 5x faster dump and restore
operations.
* [**Restoration in topological order**](../commands/restore.md/#restoration-in-topological-order) - The
`--restore-in-order` flag ensures that dependent tables are not restored until the tables they depend on have been
restored. This is useful when the schema has already been created with foreign keys and other constraints and you want
to insert data into the tables in the correct order.
* **[Insert format](../commands/restore.md/#inserts-and-error-handling)** restoration - For a flexible restoration
process, Greenmask now supports data restoration in the `INSERT` format. It generates the insert statements based on
`COPY` records from the dump. You do not need to re-dump your data to use this feature; it can be enabled in the
`restore` command. New features related to the `INSERT` format:

    * Generates `INSERT` statements with the `ON CONFLICT DO NOTHING` clause if the `--on-conflict-do-nothing` flag
      is set.
    * **[Error exclusion list](../configuration.md/#restoration-error-exclusion)** in the config to skip
      certain errors and continue inserting subsequent rows from the dump.
    * Use case - **incremental dump and restoration** for logical data. For example, if you have a database and you
      want to insert data periodically from another source, this can be used together with the database subset and
      transformations to catch up the target database.

* [Restore data batching](../commands/restore.md/#restore-data-batching) ([#173](https://github.com/GreenmaskIO/greenmask/pull/174)) -
By default, the COPY protocol returns the error only on transaction commit. To override this behavior, use the
`--batch-size` flag to specify the number of rows to insert in a single batch during the COPY command. This is useful
when you want to control the transaction size and commit.
* [Introduced](https://github.com/GreenmaskIO/greenmask/pull/162) `keep_null` parameter for `RandomPerson` transformer.
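
The idea behind restoration in topological order, mentioned in the list above, can be sketched with Python's standard `graphlib` module. This is only an illustration of the ordering principle, not how Greenmask computes it; the `restore_order` function and the table names are hypothetical. Each table maps to the set of tables it references through foreign keys, and the result lists referenced tables before the tables that depend on them.

```python
from graphlib import CycleError, TopologicalSorter
from typing import Dict, List, Set


def restore_order(depends_on: Dict[str, Set[str]]) -> List[str]:
    """Return tables ordered so that every table comes after the tables it references."""
    # TopologicalSorter expects {node: predecessors}; static_order() yields
    # predecessors first and raises CycleError if the references form a cycle.
    return list(TopologicalSorter(depends_on).static_order())


if __name__ == "__main__":
    # Hypothetical schema: orders reference customers, order_items reference
    # both orders and products.
    schema = {
        "customers": set(),
        "products": set(),
        "orders": {"customers"},
        "order_items": {"orders", "products"},
    }
    try:
        print(restore_order(schema))
    except CycleError as exc:
        print(f"circular foreign keys: {exc}")
```

A plain topological sort cannot order tables whose foreign keys form a cycle, which is why the sketch surfaces `CycleError` rather than guessing an order.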

### Fixes and improvements

* [Fixed](https://github.com/GreenmaskIO/greenmask/pull/140) `validate` command with the `--table` flag, which had the
wrong order of the table name representation `{{ table_name }}.{{ schema }}` instead of
`{{ schema }}.{{ table_name }}`.
* [Fixed](https://github.com/GreenmaskIO/greenmask/pull/137/commits/d421d6df2b55019235c81bdd22e341aa2509400b#diff-7a8b28dfeb9522d6af581535cbf61f3d2a744a68d4558515644d746fc9d43a2bL114)
`Row.SetColumn` out of range validation.
* [Fixed](https://github.com/GreenmaskIO/greenmask/pull/137/commits/d421d6df2b55019235c81bdd22e341aa2509400b#diff-ef03875763278adee04b936cae57bb51d57c4ec8e55816f73e98c0af479a2441L543)
`restoreWorker` panic caused when the worker received an error from pgx.
* [Fixed](https://github.com/GreenmaskIO/greenmask/pull/157/commits/03d7d7af3c569d629f44b29114caa74c14a47826) error
handling in the `restore` command.
* [Fixed](https://github.com/GreenmaskIO/greenmask/pull/157/commits/03d7d7af3c569d629f44b29114caa74c14a47826) restore
jobs so that they start a transaction for each table restoration and commit it after the table restoration is done.
* [Fixed](https://github.com/GreenmaskIO/greenmask/pull/157/commits/03d7d7af3c569d629f44b29114caa74c14a47826)
incorrect behavior of `--exit-on-error` in the `restore` command. The flag now works correctly with the
`data` section.
* [Fixed](https://github.com/GreenmaskIO/greenmask/pull/159) transaction rollback in the `validate` command.
* [Fixed](https://github.com/GreenmaskIO/greenmask/pull/143) typo in documentation.
* [Fixed](https://github.com/GreenmaskIO/greenmask/pull/136) a CI/CD bug related to retrieving current tags.
* [Fixed](https://github.com/GreenmaskIO/greenmask/pull/141) the Docker image tag for `latest` to exclude specific
keywords.
* [Fixed](https://github.com/GreenmaskIO/greenmask/pull/161) a case where the hashing value was not set for each column
in the `RandomPerson` transformer.
* [Fixed](https://github.com/GreenmaskIO/greenmask/pull/165) original email value parsing conditions.
* [Subset docs revision](https://github.com/GreenmaskIO/greenmask/pull/169/files).
* [Fixed](https://github.com/GreenmaskIO/greenmask/pull/171) a case where data entries were excluded by exclusion
parameters such as `--exclude-table`, `--table`, etc.
* [Fixed](https://github.com/GreenmaskIO/greenmask/pull/172) zero bytes that were written in the buffer due to the wrong
buffer limit in the `Email` transformer.
* [Fixed](https://github.com/GreenmaskIO/greenmask/pull/175) a case where the overridden type of column via
`columns_type_override` did not work.
* [Fixed](https://github.com/GreenmaskIO/greenmask/pull/177) a case where an unknown option provided in the config was
just ignored instead of throwing an error.
* [Fixed](https://github.com/GreenmaskIO/greenmask/pull/178) a case where `min` and `max` parameter values were ignored
in transformers `NoiseDate`, `NoiseNumeric`, `NoiseFloat`, `NoiseInt`, `RandomNumeric`, `RandomFloat`, and
`RandomInt`.
* [Fixed](https://github.com/GreenmaskIO/greenmask/pull/180) TOC entry COPY restoration statement - added missing
newline and semicolon. The dump is now backward compatible with `pg_restore`: a call such as
`pg_restore 1724504511561 --file 1724504511561.sql` works as expected.
* [Fixed](https://github.com/GreenmaskIO/greenmask/pull/184) a case where dump/restore fails when masking tables with a
generated column.
* [Updated go version (v1.22) and dependencies](https://github.com/GreenmaskIO/greenmask/pull/188)
* [Revised installation section of doc](https://github.com/GreenmaskIO/greenmask/pull/187)
* A bunch of refactoring and code cleanup to make the codebase more maintainable and readable.

#### Full Changelog: [v0.2.0b1...v0.2.0b2](https://github.com/GreenmaskIO/greenmask/compare/v0.2.0b1...v0.2.0b2)

## Playground usage for beta version

If you want to run a Greenmask [playground](../playground.md) for the beta version v0.2.0b2, execute:

```bash
git checkout tags/v0.2.0b2 -b v0.2.0b2
docker-compose run greenmask-from-source
```

## Links

Feel free to reach out to us if you have any questions or need assistance:

* [Greenmask Roadmap](https://github.com/orgs/GreenmaskIO/projects/6)
* [Email](mailto:[email protected])
* [Twitter](https://twitter.com/GreenmaskIO)
* [Telegram](https://t.me/greenmask_community)
* [Discord](https://discord.gg/tAJegUKSTB)
* [DockerHub](https://hub.docker.com/r/greenmask/greenmask)
2 changes: 2 additions & 0 deletions mkdocs.yml
@@ -125,6 +125,8 @@ nav:
- Core custom functions: built_in_transformers/advanced_transformers/custom_functions/core_functions.md
- Faker function: built_in_transformers/advanced_transformers/custom_functions/faker_function.md
- Release notes:
- Greenmask 0.2.0b2: release_notes/greenmask_0_2_0_b2.md
- Greenmask 0.2.0b1: release_notes/greenmask_0_2_0_b1.md
- Greenmask 0.1.14: release_notes/greenmask_0_1_14.md
- Greenmask 0.1.13: release_notes/greenmask_0_1_13.md
- Greenmask 0.1.12: release_notes/greenmask_0_1_12.md