diff --git a/dev/404.html b/dev/404.html
index b0df2f06..0ae275fd 100644
--- a/dev/404.html
+++ b/dev/404.html
@@ -87,7 +87,7 @@
- A new version 0.2.5 (2024.11.16) is released
+ A new version 0.2.6 (2024.12.07) is released
@@ -247,7 +247,7 @@
Introduces --disable-triggers, --use-session-replication-role-replica and --superuser options
+for the restore command. They allow disabling triggers during the data section restore #248.
+Closes feature request #228
+
Fix skipping unknown type when silent is true #251
\ No newline at end of file
diff --git a/dev/search/search_index.json b/dev/search/search_index.json
index a9439856..f591437d 100644
--- a/dev/search/search_index.json
+++ b/dev/search/search_index.json
@@ -1 +1 @@
-{"config":{"lang":["en"],"separator":"[\\s\\-]+","pipeline":["stopWordFilter"]},"docs":[{"location":"","title":"About Greenmask","text":""},{"location":"#dump-anonymization-and-synthetic-data-generation-tool","title":"Dump anonymization and synthetic data generation tool","text":"
Greenmask is a powerful open-source utility designed for logical database backup dumping, anonymization, synthetic data generation, and restoration. It uses ported PostgreSQL libraries, making it reliable. It is stateless and requires no changes to your database schema. It is designed to be highly customizable, backward-compatible with existing PostgreSQL utilities, fast, and reliable.
Deterministic transformers \u2014 a deterministic approach to data transformation based on hash functions. This ensures that the same input data will always produce the same output data. Almost every transformer supports either the random or the hash engine, making it universal for any use case.
Dynamic parameters \u2014 almost every transformer supports dynamic parameters, allowing you to parametrize the transformer dynamically from a table column value. This is helpful for resolving functional dependencies between columns and satisfying constraints.
Transformation validation and easy maintenance \u2014 during the configuration process, Greenmask provides validation warnings, a data transformation diff, and a schema diff, allowing you to monitor and maintain transformations effectively throughout the software lifecycle. The schema diff helps to avoid data leakage when the schema changes.
Partitioned tables transformation inheritance \u2014 Define transformation configurations once and apply them to all partitions within partitioned tables (using apply_for_inherited parameter), simplifying the anonymization process.
Stateless - Greenmask operates as a logical dump and does not impact your existing database schema.
Cross-platform - Can be easily built and executed on any platform, thanks to its Go-based architecture, which eliminates platform dependencies.
Database type safe - Ensures data integrity by validating data and utilizing the database driver for encoding and decoding operations. This approach guarantees the preservation of data formats.
Backward compatible - It fully supports the same features and protocols as existing vanilla PostgreSQL utilities. Dumps created by Greenmask can be successfully restored using the pg_restore utility.
Extensible - Users have the flexibility to implement domain-based transformations in any programming language or use predefined templates.
Integrable - Integrate seamlessly into your CI/CD system for automated database anonymization and restoration.
Parallel execution - Take advantage of parallel dumping and restoration, significantly reducing the time required to deliver results.
Variety of storages \u2014 Greenmask offers a variety of storage options for local and remote data storage, including directories and S3-like storage solutions.
Pgzip support for faster compression \u2014 by setting --pgzip, Greenmask can speed up the dump and restoration processes through parallel compression.
Greenmask is ideal for various scenarios, including:
Backup and restoration. Use Greenmask for your daily routines involving logical backup dumping and restoration. It seamlessly handles tasks like table restoration after truncation. Its functionality closely mirrors that of pg_dump and pg_restore, making it a straightforward replacement.
Anonymization, transformation, and data masking. Employ Greenmask for anonymizing, transforming, and masking backups, especially when setting up a staging environment or for analytical purposes. It simplifies the deployment of a pre-production environment with consistently anonymized data, facilitating faster time-to-market in the development lifecycle.
It is evident that the most appropriate approach for executing logical backup dumping and restoration is by leveraging the core PostgreSQL utilities, specifically pg_dump and pg_restore. Greenmask has been purposefully designed to align with PostgreSQL's native utilities, ensuring compatibility. Greenmask primarily handles data dumping operations independently and delegates the responsibilities of schema dumping and restoration to pg_dump and pg_restore respectively, maintaining seamless integration with PostgreSQL's standard tools.
The process of backing up PostgreSQL databases is divided into three distinct sections:
Pre-data \u2014 this section encompasses the raw schema of tables, excluding primary keys (PK) and foreign keys (FK).
Data \u2014 the data section contains the actual table data in COPY format, including information about sequence current values and Large Objects data.
Post-data \u2014 in this section, you'll find the definitions of indexes, triggers, rules, and constraints (such as PK and FK).
Greenmask focuses exclusively on the data section during runtime. It delegates the handling of the pre-data and post-data sections to the core PostgreSQL utilities, pg_dump and pg_restore.
Greenmask employs the directory format of pg_dump and pg_restore. This format is particularly suitable for parallel execution and partial restoration, and it includes clear metadata files that aid in determining the backup and restoration steps. Greenmask has been optimized to work seamlessly with remote storage systems and anonymization procedures.
When performing data dumping, Greenmask utilizes the COPY command in TEXT format, maintaining reliability and compatibility with the vanilla PostgreSQL utilities.
Additionally, Greenmask supports parallel execution, significantly reducing the time required for the dumping process.
The core PostgreSQL utilities, pg_dump and pg_restore, traditionally operate with files in a directory format, offering no alternative methods. To meet modern backup requirements and provide flexible approaches, Greenmask introduces the concept of storages.
s3 \u2014 this option supports any S3-like storage system, including AWS S3, which makes it versatile and adaptable to various cloud-based storage solutions.
directory \u2014 this is the standard choice, representing the ordinary filesystem directory for local storage.
In the restoration process, Greenmask combines the capabilities of different tools:
For schema restoration, Greenmask uses pg_restore, ensuring that the schema is accurately reconstructed.
For data restoration, Greenmask independently applies the data using the COPY protocol. This allows Greenmask to handle the data efficiently, especially when working with various storage solutions. Greenmask is aware of the restoration metadata, which enables it to download only the necessary data. This feature is particularly useful for partial restoration scenarios, such as restoring a single table from a complete backup.
Greenmask also supports parallel restoration, which can significantly reduce the time required to complete the restoration process. This parallel execution enhances the efficiency of restoring large datasets.
"},{"location":"architecture/#data-anonymization-and-validation","title":"Data anonymization and validation","text":"
Greenmask works with COPY lines, collects schema metadata using the Golang driver, and employs this driver in the encoding and decoding process. The validate command offers a way to assess the impact on both schema (validation warnings) and data (transformation and displaying differences). This command allows you to validate the schema and data transformations, ensuring the desired outcomes during the anonymization process.
If your table schema relies on functional dependencies between columns, you can address this challenge using the TemplateRecord transformer. This transformer enables you to define transformation logic for entire tables, offering type-safe operations when assigning new values.
Greenmask provides a framework for creating your custom transformers, which can be reused efficiently. These transformers can be seamlessly integrated without requiring recompilation, thanks to the PIPE (stdin/stdout) interaction.
Note
Furthermore, Greenmask's architecture is designed to be highly extensible, making it possible to introduce other interaction protocols, such as HTTP or Socket, for conducting anonymization procedures.
"},{"location":"architecture/#postgresql-version-compatibility","title":"PostgreSQL version compatibility","text":"
Greenmask is compatible with PostgreSQL versions 11 and higher.
common \u2014 settings that can be used for both the dump and restore commands
log \u2014 settings for the logging subsystem
storage \u2014 settings for the storage locations where dumps are stored
dump \u2014 settings for the dump command. This section includes pg_dump options and transformation parameters.
restore \u2014 settings for the restore command. It contains pg_restore options and additional restoration scripts.
custom_transformers \u2014 definitions of the custom transformers that interact through stdin and stdout. Once a custom transformer is configured, it becomes accessible via the greenmask list-transformers command.
In the common section of the configuration, you can specify the following settings:
pg_bin_path \u2014 path to the PostgreSQL binaries. Note that the PostgreSQL server version must match the provided binaries.
tmp_dir \u2014 temporary directory for storing the table of contents files. Default value is /tmp
Note
Greenmask exclusively manages data dumping and data restoration processes, delegating schema dumping to the pg_dump utility and schema restoration to the pg_restore utility. Both pg_dump and pg_restore rely on a toc.dat file located in a specific directory, which contains metadata and object definitions. Therefore, the tmp_dir parameter is essential for storing the toc.dat file during the dumping or restoration procedure. Note that all artifacts in this directory are automatically deleted once the Greenmask command completes.
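Putting these settings together, a minimal common section might look like this (the binary path is illustrative):
common:\n  pg_bin_path: \"/usr/bin\"\n  tmp_dir: \"/tmp\"\n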
In the storage section, you can configure the storage driver for storing the dumped data. Currently, two storage type options are supported: directory and s3.
directory options3 option
The directory storage option refers to a filesystem directory where the dump data will be stored.
Parameters include path which specifies the path to the directory in the filesystem where the dumps will be stored.
By choosing the s3 storage option, you can store dump data in an S3-like remote storage service, such as Amazon S3 or Azure Blob Storage. Here are the parameters you can configure for S3 storage:
endpoint \u2014 overrides the default AWS endpoint to a custom one for making requests
bucket \u2014 the name of the bucket where the dump data will be stored
prefix \u2014 a prefix for objects in the bucket, specified in path format
region \u2014 the S3 service region
storage_class \u2014 the storage class for performing object requests
access_key_id \u2014 access key for authentication
secret_access_key \u2014 secret access key for authentication
session_token \u2014 session token for authentication
role_arn \u2014 Amazon resource name for role-based authentication
session_name \u2014 role session name to uniquely identify a session
max_retries \u2014 the number of retries on request failures
cert_file \u2014 the path to the SSL certificate for making requests
max_part_size \u2014 the maximum part length for one request
concurrency \u2014 the number of goroutines to use in parallel for each upload call when sending parts
use_list_objects_v1 \u2014 use the old v1 ListObjects request instead of v2 one
force_path_style \u2014 force the request to use path-style addressing (e.g., http://s3.amazonaws.com/BUCKET/KEY) instead of virtual hosted bucket addressing (e.g., http://BUCKET.s3.amazonaws.com/KEY)
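Combining these parameters, the storage section for each option might be sketched as follows (endpoint, bucket, and credentials are placeholders; verify the exact nesting against your version):
storage:\n  type: \"directory\"\n  directory:\n    path: \"/tmp/dumps\"\n
storage:\n  type: \"s3\"\n  s3:\n    endpoint: \"http://localhost:9000\"\n    bucket: \"greenmask\"\n    prefix: \"dumps\"\n    region: \"us-east-1\"\n    access_key_id: \"<access-key>\"\n    secret_access_key: \"<secret-key>\"\n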
In the dump section of the configuration, you configure the greenmask dump command. It includes the following parameters:
pg_dump_options \u2014 a map of pg_dump options to configure the behavior of the command itself. You can refer to the list of supported pg_dump options in the Greenmask dump command documentation.
transformation \u2014 this section contains configuration for applying transformations to table columns during the dump operation. It includes the following sub-parameters:
schema \u2014 the schema name of the table
name \u2014 the name of the table
subset_conds \u2014 a list of conditions used to filter the rows to be dumped. The conditions are combined with the AND operator. For details, read Database subset
query \u2014 an optional parameter for specifying a custom query to be used in the COPY command. By default, the entire table is dumped, but you can use this parameter to set a custom query.
Warning
Be cautious when using the query parameter, as it may lead to constraint violation errors during restoration, and Greenmask currently cannot handle query validation.
columns_type_override \u2014 allows you to override the column types explicitly. You can associate a column with another type that is supported by your transformer. This is useful when the transformer works strictly with specific types of columns. For example, if a column named post_code is of the TEXT type, but the RandomInt transformer works only with INT family types, you can override it as shown in the example provided. column type overridden example
Change the data type of the post_code column to INT4 (INTEGER)
apply_for_inherited \u2014 an optional parameter to apply the same transformation to all partitions if the table is partitioned. This can save you from defining the transformation for each partition manually.
Warning
It is recommended to use the --load-via-partition-root parameter when dealing with partitioned tables, as the partition key value might change.
transformers \u2014 a list of transformers to apply to the table, along with their parameters. Each transformation item includes the following sub-parameters:
name \u2014 the name of the transformer
params \u2014 a map of the provided transformer parameters
Override the post_code column type to int4 (INTEGER). This is necessary because the post_code column originally has a TEXT type, but it contains values that resemble integers. By explicitly overriding the type to int4, we ensure compatibility with transformers that work with integer types, such as RandomInt.
After the type is overridden, we can apply a compatible transformer.
Database subset condition applied to the aircrafts_data table. The subset condition filters the data based on the model column.
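Putting the pieces above together, a dump section sketch might look like this (schema, table names, and values are illustrative):
dump:\n  transformation:\n    - schema: \"public\"\n      name: \"customers\"\n      columns_type_override:\n        post_code: \"int4\"  # the TEXT column contains integer-like values\n      transformers:\n        - name: \"RandomInt\"\n          params:\n            column: \"post_code\"\n            min: 11111\n            max: 99999\n    - schema: \"bookings\"\n      name: \"aircrafts_data\"\n      subset_conds:\n        - 'bookings.aircrafts_data.model = ''illustrative-model'''\n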
In the validate section of the configuration, you can specify parameters for the greenmask validate command. Here is an example of the validate section configuration:
A list of tables to validate. If this list is not empty, the validation operation will only be performed for the specified tables. Tables can be written with or without the schema name (e.g., \"public.cart\" or \"orders\").
Specifies whether to perform data transformation for a limited set of rows. If set to true, data transformation will be performed, and the number of rows transformed will be limited to the value specified in the rows_limit parameter (default is 10).
Specifies whether to perform diff operations for the transformed data. If set to true, the validation process will find the differences between the original and transformed data. See more details in the validate command documentation.
Limits the number of rows to be transformed during validation. The default limit is 10 rows, but you can change it by modifying this parameter.
A hash list of resolved warnings. These warnings have been addressed and resolved in a previous validation run.
Specifies the format of the transformation output. Possible values are [horizontal|vertical]. The default format is horizontal. You can choose the format that suits your needs. See more details in the validate command documentation.
The output format (json or text)
Specifies whether to compare the current schema with the previous one and print the differences, if any.
If set to true, the transformation output will contain only the transformed columns and primary keys.
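Assuming the parameter names follow the descriptions above (verify them against your Greenmask version), the validate section might be sketched as:
validate:\n  tables:\n    - \"public.cart\"\n    - \"orders\"\n  data: true\n  diff: true\n  rows_limit: 10\n  resolved_warnings:\n    - \"aa808fb574a1359c6606e464833feceb\"  # placeholder warning hash\n  table_format: \"horizontal\"\n  format: \"text\"\n  schema: true\n  transformed_only: true\n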
In the restore section of the configuration, you can specify parameters for the greenmask restore command. It contains pg_restore settings and custom script execution settings. Below you can find the available parameters:
pg_restore_options \u2014 a map of pg_restore options that are used to configure the behavior of the pg_restore utility during the restoration process. You can refer to the list of supported pg_restore options in the Greenmask restore command documentation.
scripts \u2014 a map of custom scripts to be executed during different restoration stages. Each script is associated with a specific restoration stage and includes the following attributes:
[pre-data|data|post-data] \u2014 the name of the restoration stage when the script should be executed; has the following parameters:
name \u2014 the name of the script
when \u2014 specifies when to execute the script, which can be either \"before\" or \"after\" the specified restoration stage
query \u2014 an SQL query string to be executed
query_file \u2014 the path to an SQL query file to be executed
command \u2014 a command with parameters to be executed. It is provided as a list, where the first item is the command name.
insert_error_exclusions \u2014 a list of error codes that should be ignored during the restoration process. This is useful when you want to skip specific errors that are not critical for the restoration process.
As mentioned in the architecture, a backup contains three sections: pre-data, data, and post-data. The custom script execution allows you to customize and control the restoration process by executing scripts or commands at specific stages. The available restoration stages and their corresponding execution conditions are as follows:
pre-data \u2014 scripts or commands can be executed before or after restoring the pre-data section
data \u2014 scripts or commands can be executed before or after restoring the data section
post-data \u2014 scripts or commands can be executed before or after restoring the post-data section
Each stage can have a \"when\" condition with one of the following possible values:
before \u2014 execute the script or SQL command before the mentioned restoration stage
after \u2014 execute the script or SQL command after the mentioned restoration stage
Below you can find one of the possible versions for the scripts part of the restore section:
scripts definition example
scripts:\n pre-data: # (1)\n - name: \"pre-data before script [1] with query\"\n when: \"before\"\n query: \"create table script_test(stage text)\"\n - name: \"pre-data before script [2]\"\n when: \"before\"\n query: \"insert into script_test values('pre-data before')\"\n - name: \"pre-data after test script [1]\"\n when: \"after\"\n query: \"insert into script_test values('pre-data after')\"\n - name: \"pre-data after script with query_file [1]\"\n when: \"after\"\n query_file: \"pre-data-after.sql\"\n data: # (2)\n - name: \"data before script with command [1]\"\n when: \"before\"\n command: # (4)\n - \"data-after.sh\"\n - \"param1\"\n - \"param2\"\n - name: \"data after script [1]\"\n when: \"after\"\n query_file: \"data-after.sql\"\n post-data: # (3)\n - name: \"post-data before script [1]\"\n when: \"before\"\n query: \"insert into script_test values('post-data before')\"\n - name: \"post-data after script with query_file [1]\"\n when: \"after\"\n query_file: \"post-data-after.sql\"\n
List of pre-data stage scripts. This section contains scripts that are executed before or after the restoration of the pre-data section. The scripts include SQL queries and query files.
List of data stage scripts. This section contains scripts that are executed before or after the restoration of the data section. The scripts include shell commands with parameters and SQL query files.
List of post-data stage scripts. This section contains scripts that are executed before or after the restoration of the post-data section. The scripts include SQL queries and query files.
Command in the first argument and the parameters in the rest of the list. When specifying a command to be executed in the scripts section, you provide the command name as the first item in a list, followed by any parameters or arguments for that command. The command and its parameters are provided as a list within the script configuration.
You can configure which errors to ignore during the restoration process by setting the insert_error_exclusions parameter. This parameter can be applied globally or per table. If both global and table-specific settings are defined, the table-specific settings take precedence. Below is an example of how to configure the insert_error_exclusions parameter. You can specify constraint names from your database schema or the error codes returned by PostgreSQL; the error codes are listed in the PostgreSQL documentation.
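A sketch of insert_error_exclusions with both global and table-specific settings (attribute names and values are illustrative; check them against the restore command reference):
restore:\n  insert_error_exclusions:\n    global:\n      error_codes: [\"23505\"]  # unique_violation\n      constraints: [\"fk_t4_t3\"]\n    tables:\n      - schema: \"public\"\n        name: \"cart\"\n        error_codes: [\"23505\"]\n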
It's also possible to configure Greenmask through environment variables.
Greenmask will automatically parse any environment variable that matches the configuration in the config file by substituting the dot (.) separator with an underscore (_) and uppercasing it. For example, the config file fragment below applies the same configuration as defining the LOG_LEVEL=debug environment variable:
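log:\n  level: debug\n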
Additionally, there are some environment variables exposed by the dump and restore commands to facilitate the connection configuration with a PostgreSQL database:
PGHOST - host used to connect to the postgres database
PGPORT - port where postgres is exposed
PGDATABASE - name of the database to dump/restore
PGUSER - username used to connect to the postgres database
PGPASSWORD - password used to authenticate to the postgres database
Greenmask allows you to define a subset condition for filtering data during the dump process. This feature is useful when you need to dump only a part of the database, such as a specific table or a set of tables. It automatically ensures data consistency by including all related data from other tables that are required to maintain the integrity of the subset. The subset condition can be defined using the subset_conds attribute on the table in the transformation section (see examples).
Info
Greenmask generates queries for subset conditions based on the introspected schema, using joins and recursive queries. It cannot be responsible for query optimization. The subset queries might be slow due to the complexity of the queries and/or a lack of indexes. Circular references are resolved using recursive queries.
The subset is a list of SQL conditions that are applied to a table. The conditions are combined with the AND operator. You need to specify the schema, table, and column name when pointing out the column to filter by, to avoid ambiguity. The subset condition must be a valid SQL condition.
Subset condition example
subset_conds:\n - 'person.businessentity.businessentityid IN (274, 290, 721, 852)'\n
Database scale-down - create an anonymized dump for a limited and consistent set of tables
Data migration - migrate only some records from one database to another
Data anonymization - dump and anonymize only specific records in the database
Database catch-up - catch up another instance of your database logically by adding new records. In this case, it is recommended to restore tables in topological order using --restore-in-order.
"},{"location":"database_subset/#references-with-null-values","title":"References with NULL values","text":"
For references that do not have NOT NULL constraints, Greenmask will automatically generate LEFT JOIN queries with the appropriate conditions to ensure integrity checks. You can rely on Greenmask to handle such cases correctly\u2014no special configuration is needed, as it performs this automatically based on the introspected schema.
Greenmask supports circular references between tables. You can define a subset condition for any table, and Greenmask will automatically generate the appropriate queries for the table subset using recursive queries. The subset system ensures data consistency by validating all records found through the recursive queries. If a record does not meet the subset condition, it will be excluded along with its parent records, preventing constraint violations.
Warning
Currently (v0.2b2), Greenmask can resolve multiple cycles in one strongly connected component (SCC), but only for one group of vertexes. If an SCC contains two groups of vertexes, Greenmask will not be able to resolve it. For instance, if there are two cycles over tables A, B, C (first group) and B, C, E (second group), Greenmask cannot resolve it. But if there is only one group of vertexes with one or more cycles over the same group of tables (for instance A, B, C), Greenmask handles it. This will be fixed in the future; see the second example below. In practice, this is a rare situation, and the vast majority of users will not face this issue.
You can read the Wikipedia article about Circular reference here.
During the development process, there are situations where foreign keys need to be removed. The reasons can vary\u2014from improving performance to simplifying the database structure. Additionally, some foreign keys may exist within loosely structured data, such as JSON, where PostgreSQL cannot create foreign keys at all. These limitations could significantly hinder the capabilities of a subset system. Greenmask offers a flexible solution to this problem by allowing the declaration of virtual references in the configuration, enabling the preservation and management of logical relationships between tables, even in the absence of explicit foreign keys. A virtual reference can also be called a virtual foreign key.
The virtual_references list can be defined in the dump section. It contains the list of virtual references. First, you set the table on which you want to define a virtual reference. In the references attribute, define the list of tables that are referenced by this table. In the columns attribute, define the list of columns that are used in the foreign key reference. The not_null attribute is optional and defines whether the FK has a NOT NULL constraint; if true, Greenmask generates an INNER JOIN instead of a LEFT JOIN (by default it is false). The expression attribute is used when you need an expression to derive the value of the column in the referencing table. For instance, if you have a JSONB column in the audit_logs table that contains an order_id field, you can use this field as an FK reference.
Info
You do not need to define the primary key of the referenced table. Greenmask will automatically resolve it and use it in the join condition.
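For instance, a single virtual reference entry using the attributes described above might be sketched like this (tables and columns are illustrative; verify the exact attribute names against your version):
dump:\n  virtual_references:\n    - schema: \"public\"\n      name: \"orders\"\n      references:\n        - schema: \"public\"\n          name: \"customers\"\n          columns:\n            - name: \"customer_id\"\n          not_null: true  # generates an INNER JOIN instead of a LEFT JOIN\n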
Greenmask supports polymorphic references. You can define a virtual reference for a table with polymorphic references using the polymorphic_exprs attribute. The polymorphic_exprs attribute is a list of expressions that are used to make a polymorphic reference. For instance, we might have a table comments that has a polymorphic reference to posts and videos. The comments table might have commentable_id and commentable_type columns, where the commentable_type column contains the type of the table that is referenced by the commentable_id column. A sketch of such a config (the expression syntax is illustrative and should be verified against your version):
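dump:\n  virtual_references:\n    - schema: \"public\"\n      name: \"comments\"\n      references:\n        - schema: \"public\"\n          name: \"posts\"\n          polymorphic_exprs:\n            - 'comments.commentable_type = ''post'''\n          columns:\n            - name: \"commentable_id\"\n        - schema: \"public\"\n          name: \"videos\"\n          polymorphic_exprs:\n            - 'comments.commentable_type = ''video'''\n          columns:\n            - name: \"commentable_id\"\n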
Polymorphic references cannot be not_null, because the commentable_id column can be NULL if the commentable_type is not set or differs from the values defined in the polymorphic_exprs attribute.
"},{"location":"database_subset/#troubleshooting","title":"Troubleshooting","text":""},{"location":"database_subset/#exclude-the-records-that-has-null-values-in-the-referenced-column","title":"Exclude the records that has NULL values in the referenced column","text":"
If you want to exclude records that have NULL values in the referenced column, you can manually add this condition to the subset condition for the table. Greenmask does not automatically exclude records with NULL values because it applies a LEFT OUTER JOIN on nullable foreign keys.
"},{"location":"database_subset/#some-table-is-not-filtered-by-the-subset-condition","title":"Some table is not filtered by the subset condition","text":"
Greenmask builds a table dependency graph based on the introspected schema and existing foreign keys. If a table is not filtered by the subset condition, it means that the table either does not reference another table that is filtered by the subset condition or the table itself does not have a subset condition applied.
If you have a table with a removed foreign key and want to filter it by the subset condition, you need to define a virtual reference. For more information on virtual references, refer to the Virtual References section.
Info
If you find any issues related to the code, or Greenmask is not working as expected, do not hesitate to contact us directly or create an issue in the repository.
"},{"location":"database_subset/#error-column-reference-id-is-ambiguous","title":"ERROR: column reference \"id\" is ambiguous","text":"
If you see the error message ERROR: column reference \"{column name}\" is ambiguous, you have specified the column name without the table and/or schema name. To avoid ambiguity, always specify the schema and table name when pointing out the column to filter by. For instance, if you want to filter employees by the employee_id column, you should use public.employees.employee_id instead of employee_id.
Valid subset condition
public.employees.employee_id IN (1, 2, 3)\n
"},{"location":"database_subset/#the-subset-condition-is-not-working-correctly-how-can-i-verify-it","title":"The subset condition is not working correctly. How can I verify it?","text":"
Run greenmask with --log-level=debug to see the generated SQL queries. You will find the generated SQL queries in the log output. Validate this query in your database client to ensure that the subset condition is working as expected.
For example:
$ greenmask dump --config config.yaml --log-level=debug\n\n2024-08-29T19:06:18+03:00 DBG internal/db/postgres/context/context.go:202 > Debug query Schema=person Table=businessentitycontact pid=1638339\n2024-08-29T19:06:18+03:00 DBG internal/db/postgres/context/context.go:203 > SELECT \"person\".\"businessentitycontact\".* FROM \"person\".\"businessentitycontact\" INNER JOIN \"person\".\"businessentity\" ON \"person\".\"businessentitycontact\".\"businessentityid\" = \"person\".\"businessentity\".\"businessentityid\" AND ( person.businessentity.businessentityid between 400 and 800 OR person.businessentity.businessentityid between 800 and 900 ) INNER JOIN \"person\".\"person\" ON \"person\".\"businessentitycontact\".\"personid\" = \"person\".\"person\".\"businessentityid\" WHERE TRUE AND ((\"person\".\"person\".\"businessentityid\") IN (SELECT \"person\".\"businessentity\".\"businessentityid\" FROM \"person\".\"businessentity\" WHERE ( ( person.businessentity.businessentityid between 400 and 800 OR person.businessentity.businessentityid between 800 and 900 ) )))\n pid=1638339\n
"},{"location":"database_subset/#dump-is-too-slow","title":"Dump is too slow","text":"
If the dump process is too slow, the generated query might be too complex. In this case, you can:
Check if the database has indexes on the columns used in the subset condition. Create them if possible.
Move database dumping to a replica to avoid the performance impact on the primary.
"},{"location":"database_subset/#example-dump-a-subset-of-the-database","title":"Example: Dump a subset of the database","text":"
Info
All examples are based on the playground database. Read more about it in the Playground section.
The following example demonstrates how to dump a subset of the person schema. The subset condition is applied to the businessentity and password tables. The subset condition filters the data based on the businessentityid and passwordsalt columns, respectively.
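A configuration sketch for this subset (the filter values are illustrative):
dump:\n  transformation:\n    - schema: \"person\"\n      name: \"businessentity\"\n      subset_conds:\n        - 'person.businessentity.businessentityid IN (274, 290, 721, 852)'\n    - schema: \"person\"\n      name: \"password\"\n      subset_conds:\n        - 'person.password.passwordsalt = ''<illustrative-value>'''\n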
"},{"location":"database_subset/#example-dump-a-subset-with-circular-reference","title":"Example: Dump a subset with circular reference","text":"Create tables with multi cyles
-- Step 1: Create tables without foreign keys\nDROP TABLE IF EXISTS employees CASCADE;\nCREATE TABLE employees\n(\n employee_id SERIAL PRIMARY KEY,\n name VARCHAR(100) NOT NULL,\n department_id INT -- Will reference departments(department_id)\n);\n\nDROP TABLE IF EXISTS departments CASCADE;\nCREATE TABLE departments\n(\n department_id SERIAL PRIMARY KEY,\n name VARCHAR(100) NOT NULL,\n project_id INT -- Will reference projects(project_id)\n);\n\nDROP TABLE IF EXISTS projects CASCADE;\nCREATE TABLE projects\n(\n project_id SERIAL PRIMARY KEY,\n name VARCHAR(100) NOT NULL,\n lead_employee_id INT, -- Will reference employees(employee_id)\n head_employee_id INT -- Will reference employees(employee_id)\n);\n\n-- Step 2: Alter tables to add foreign key constraints\nALTER TABLE employees\n ADD CONSTRAINT fk_department\n FOREIGN KEY (department_id) REFERENCES departments (department_id);\n\nALTER TABLE departments\n ADD CONSTRAINT fk_project\n FOREIGN KEY (project_id) REFERENCES projects (project_id);\n\nALTER TABLE projects\n ADD CONSTRAINT fk_lead_employee\n FOREIGN KEY (lead_employee_id) REFERENCES employees (employee_id);\n\nALTER TABLE projects\n ADD CONSTRAINT fk_lead_employee2\n FOREIGN KEY (head_employee_id) REFERENCES employees (employee_id);\n\n-- Insert projects\nINSERT INTO projects (name, lead_employee_id)\nSELECT 'Project ' || i, NULL\nFROM generate_series(1, 10) AS s(i);\n\n-- Insert departments\nINSERT INTO departments (name, project_id)\nSELECT 'Department ' || i, i\nFROM generate_series(1, 10) AS s(i);\n\n-- Insert employees and assign 10 of them as project leads\nINSERT INTO employees (name, department_id)\nSELECT 'Employee ' || i, (i / 10) + 1\nFROM generate_series(1, 99) AS s(i);\n\n-- Assign 10 employees as project leads\nUPDATE projects\nSET lead_employee_id = (SELECT employee_id\n FROM employees\n WHERE employees.department_id = projects.project_id\n LIMIT 1),\n head_employee_id = 3\nWHERE project_id <= 10;\n
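A subset condition for these tables might be defined like this (the ID list is illustrative; the original example's exact values may differ):
dump:\n  transformation:\n    - schema: \"public\"\n      name: \"employees\"\n      subset_conds:\n        - 'public.employees.employee_id in (1, 2)'\n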
But this will return an empty result, because the subset condition is not met for all related tables: the project with project_id=1 references the employee with employee_id=3, which does not satisfy the subset condition.
"},{"location":"database_subset/#example-dump-a-subset-with-virtual-references","title":"Example: Dump a subset with virtual references","text":"
In this example, we will create a subset of the tables with virtual references. The subset will include the orders table and its related tables customers and audit_logs. The orders table has a virtual reference to the customers table, and the audit_logs table has a virtual reference to the orders table.
Create tables with virtual references
-- Create customers table\nCREATE TABLE customers\n(\n customer_id SERIAL PRIMARY KEY,\n customer_name VARCHAR(100)\n);\n\n-- Create orders table\nCREATE TABLE orders\n(\n order_id SERIAL PRIMARY KEY,\n customer_id INT, -- This should reference customers.customer_id, but no FK constraint is defined\n order_date DATE\n);\n\n-- Create payments table\nCREATE TABLE payments\n(\n payment_id SERIAL PRIMARY KEY,\n order_id INT, -- This should reference orders.order_id, but no FK constraint is defined\n payment_amount DECIMAL(10, 2),\n payment_date DATE\n);\n\n-- Insert test data into customers table\nINSERT INTO customers (customer_name)\nVALUES ('John Doe'),\n ('Jane Smith'),\n ('Alice Johnson');\n\n-- Insert test data into orders table\nINSERT INTO orders (customer_id, order_date)\nVALUES (1, '2023-08-01'), -- Related to customer John Doe\n (2, '2023-08-05'), -- Related to customer Jane Smith\n (3, '2023-08-07');\n-- Related to customer Alice Johnson\n\n-- Insert test data into payments table\nINSERT INTO payments (order_id, payment_amount, payment_date)\nVALUES (1, 100.00, '2023-08-02'), -- Related to order 1 (John Doe's order)\n (2, 200.50, '2023-08-06'), -- Related to order 2 (Jane Smith's order)\n (3, 300.75, '2023-08-08');\n-- Related to order 3 (Alice Johnson's order)\n\n\n-- Create a table with a multi-key reference (composite key reference)\nCREATE TABLE order_items\n(\n order_id INT, -- Should logically reference orders.order_id\n item_id INT, -- Composite part of the key\n product_name VARCHAR(100),\n quantity INT,\n PRIMARY KEY (order_id, item_id) -- Composite primary key\n);\n\n-- Create a table with a JSONB column that contains a reference value\nCREATE TABLE audit_logs\n(\n log_id SERIAL PRIMARY KEY,\n log_data JSONB -- This JSONB field will contain references to other tables\n);\n\n-- Insert data into order_items table with multi-key reference\nINSERT INTO order_items (order_id, item_id, product_name, quantity)\nVALUES (1, 1, 'Product A', 3), -- Related to order_id = 1 from orders table\n (1, 2, 'Product B', 5), -- Related to order_id = 1 from orders table\n (2, 1, 'Product C', 2), -- Related to order_id = 2 from orders table\n (3, 1, 'Product D', 1);\n-- Related to order_id = 3 from orders table\n\n-- Insert data into audit_logs table with JSONB reference value\nINSERT INTO audit_logs (log_data)\nVALUES ('{\n \"event\": \"order_created\",\n \"order_id\": 1,\n \"details\": {\n \"customer_name\": \"John Doe\",\n \"total\": 100.00\n }\n}'),\n ('{\n \"event\": \"payment_received\",\n \"order_id\": 2,\n \"details\": {\n \"payment_amount\": 200.50,\n \"payment_date\": \"2023-08-06\"\n }\n }'),\n ('{\n \"event\": \"item_added\",\n \"order_id\": 1,\n \"item\": {\n \"item_id\": 2,\n \"product_name\": \"Product B\",\n \"quantity\": 5\n }\n }');\n
The following example demonstrates how to make a subset for keys that do not have FK constraints but where a data relationship exists; a configuration sketch follows the list below.
The orders table has a virtual reference to the customers table, and the audit_logs table has a virtual reference to the orders table.
The payments table has a virtual reference to the orders table.
The order_items table has two keys that reference the orders and products tables.
The audit_logs table has a JSONB column that contains two references to the orders and order_items tables.
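A virtual references configuration covering these relationships, together with the subset condition on customers, might be sketched as follows (the JSONB expression syntax is illustrative; verify it against your version):
dump:\n  virtual_references:\n    - schema: \"public\"\n      name: \"orders\"\n      references:\n        - schema: \"public\"\n          name: \"customers\"\n          columns:\n            - name: \"customer_id\"\n          not_null: true\n    - schema: \"public\"\n      name: \"payments\"\n      references:\n        - schema: \"public\"\n          name: \"orders\"\n          columns:\n            - name: \"order_id\"\n          not_null: true\n    - schema: \"public\"\n      name: \"order_items\"\n      references:\n        - schema: \"public\"\n          name: \"orders\"\n          columns:\n            - name: \"order_id\"\n          not_null: true\n    - schema: \"public\"\n      name: \"audit_logs\"\n      references:\n        - schema: \"public\"\n          name: \"orders\"\n          columns:\n            - expression: \"(audit_logs.log_data ->> 'order_id')::int\"\n  transformation:\n    - schema: \"public\"\n      name: \"customers\"\n      subset_conds:\n        - 'public.customers.customer_id in (1)'\n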
As a result, the customers table will be dumped with the orders table and its related tables payments, order_items, and audit_logs. The subset condition will be applied to the customers table, and the data will be filtered based on the customer_id column.
"},{"location":"database_subset/#example-dump-a-subset-with-polymorphic-references","title":"Example: Dump a subset with polymorphic references","text":"
In this example, we will create a subset of the tables with polymorphic references. This example includes the comments table and its related tables posts and videos.
Create tables with polymorphic references and insert data
-- Create the Posts table\nCREATE TABLE posts\n(\n id SERIAL PRIMARY KEY,\n title VARCHAR(255) NOT NULL,\n content TEXT NOT NULL\n);\n\n-- Create the Videos table\nCREATE TABLE videos\n(\n id SERIAL PRIMARY KEY,\n title VARCHAR(255) NOT NULL,\n url VARCHAR(255) NOT NULL\n);\n\n-- Create the Comments table with a polymorphic reference\nCREATE TABLE comments\n(\n id SERIAL PRIMARY KEY,\n commentable_id INT NOT NULL, -- Will refer to either posts.id or videos.id\n commentable_type VARCHAR(50) NOT NULL, -- Will store the type of the associated record\n body TEXT NOT NULL,\n created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP\n);\n\n\n-- Insert data into the Posts table\nINSERT INTO posts (title, content)\nVALUES ('First Post', 'This is the content of the first post.'),\n ('Second Post', 'This is the content of the second post.');\n\n-- Insert data into the Videos table\nINSERT INTO videos (title, url)\nVALUES ('First Video', 'https://example.com/video1'),\n ('Second Video', 'https://example.com/video2');\n\n-- Insert data into the Comments table, associating some comments with posts and others with videos\n-- For posts:\nINSERT INTO comments (commentable_id, commentable_type, body)\nVALUES (1, 'post', 'This is a comment on the first post.'),\n (2, 'post', 'This is a comment on the second post.');\n\n-- For videos:\nINSERT INTO comments (commentable_id, commentable_type, body)\nVALUES (1, 'video', 'This is a comment on the first video.'),\n (2, 'video', 'This is a comment on the second video.');\n
The comments table has a polymorphic reference to the posts and videos tables. Depending on the value of the commentable_type column, the commentable_id column will reference either the posts.id or videos.id column.
The following example demonstrates how to make a subset for tables with polymorphic references.
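Assuming the polymorphic virtual references for comments are defined as shown earlier, the subset condition might look like this (the ID value is illustrative; the original example's exact filter may differ):
dump:\n  transformation:\n    - schema: \"public\"\n      name: \"posts\"\n      subset_conds:\n        - 'public.posts.id in (1)'\n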
This example selects only the first post from the posts table and its related comments from the comments table. The comments associated with videos are included without filtering, because the subset condition is applied only to the posts table and its related comments.
The resulting records will be:
transformed=# select * from comments;\n id | commentable_id | commentable_type | body | created_at \n----+----------------+------------------+---------------------------------------+----------------------------\n 1 | 1 | post | This is a comment on the first post. | 2024-09-18 05:27:54.217405\n 2 | 2 | post | This is a comment on the second post. | 2024-09-18 05:27:54.217405\n 3 | 1 | video | This is a comment on the first video. | 2024-09-18 05:27:54.229794\n(3 rows)\n
Once the repository is cloned, execute the following command to build Greenmask:
make build\n
After completing the build process, you will find the binary named greenmask in the root directory of the repository. Execute the binary to start using Greenmask.
Greenmask Playground is a sandbox environment for your experiments in Docker with sample databases included to help you try Greenmask without any additional actions. Read the Playground guide to learn more.
Greenmask Playground is a sandbox environment in Docker with sample databases included to help you try Greenmask without any additional actions. It includes the following components:
Original database \u2014 the source database you'll be working with.
Empty database for restoration \u2014 an empty database where the restored data will be placed.
MinIO storage \u2014 used for storage purposes.
Greenmask Utility \u2014 Greenmask itself, ready for use.
Warning
To complete this guide, you must have Docker and docker-compose installed.
"},{"location":"playground/#setting-up-greenmask-playground","title":"Setting up Greenmask Playground","text":"
Clone the greenmask repository and navigate to its directory by running the following commands:
git clone git@github.com:GreenmaskIO/greenmask.git && cd greenmask\n
Once you have cloned the repository, start the environment by running Docker Compose:
docker-compose run greenmask\n
Tip
If you're experiencing problems with pulling images from Docker Hub, you can build the Greenmask image from source by running the following command:
docker-compose run greenmask-from-source\n
Now you have Greenmask Playground up and running with a shell prompt inside the container. All further operations will be carried out within this container's shell.
A configuration file is mandatory for Greenmask to function. The pre-defined configuration file is stored in the repository root directory (./playground/config.yml). It also serves to define transformers, which you can update to your liking in order to use Greenmask Playground more effectively and to get a better understanding of the tool itself. To learn how to customize a configuration file, see Configuration.
The pre-defined configuration file uses the NoiseDate transformer as an example. To learn more about other transformers and how to use them, see Transformers.
Most transformers in Greenmask have dynamic parameters. This functionality is possible because Greenmask utilizes a database driver that can encode and decode raw values into their actual type representations.
This allows you to retrieve parameter values directly from the records. This capability is particularly beneficial when you need to resolve functional dependencies between fields or satisfy constraints. Greenmask processes transformations sequentially. Therefore, when you reference a field that was transformed in a previous step, you will access the transformed value.
column - Specifies the column name. The value from each record in this column will be passed to the transformer as a parameter.
cast_to - Indicates the function used to cast the column value to the desired type. Before being passed to the transformer, the value is cast to this type. For more details, see Cast functions.
template - Defines the template used for casting the column value to the desired type. You can create your own template and incorporate predefined functions and operators to implement the casting logic or other logic required for passing the value to the transformer. For more details, see Template functions.
default_value - Determines the default value used if the column's value is NULL. This value is represented in raw format appropriate to the type specified in the column parameter.
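A sketch of a dynamic parameter definition using these attributes (the dynamic_params key layout should be checked against your version's reference):
transformers:\n  - name: \"RandomDate\"\n    params:\n      column: \"hiredate\"\n    dynamic_params:\n      min:\n        column: \"birthdate\"  # take the min value from the birthdate column\n        default_value: \"1990-01-01\"  # used when birthdate is NULL\n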
"},{"location":"built_in_transformers/dynamic_parameters/#cast-functions","title":"Cast functions","text":"name description input type output type UnixNanoToDate Cast int value as Unix Timestamp in Nano Seconds to date type int2, int4, int8, numeric, float4, float8 date UnixMicroToDate Cast int value as Unix Timestamp in Micro Seconds to date type int2, int4, int8, numeric, float4, float8 date UnixMilliToDate Cast int value as Unix Timestamp in Milli Seconds to date type int2, int4, int8, numeric, float4, float8 date UnixSecToDate Cast int value as Unix Timestamp in Seconds to date type int2, int4, int8, numeric, float4, float8 date UnixNanoToTimestamp Cast int value as Unix Timestamp in Nano Seconds to timestamp type int2, int4, int8, numeric, float4, float8 timestamp UnixMicroToTimestamp Cast int value as Unix Timestamp in Micro Seconds to timestamp type int2, int4, int8, numeric, float4, float8 timestamp UnixMilliToTimestamp Cast int value as Unix Timestamp in Milli Seconds to timestamp type int2, int4, int8, numeric, float4, float8 timestamp UnixSecToTimestamp Cast int value as Unix Timestamp in Seconds to timestamp type int2, int4, int8, numeric, float4, float8 timestamp UnixNanoToTimestampTz Cast int value as Unix Timestamp in Nano Seconds to timestamptz type int2, int4, int8, numeric, float4, float8 timestamptz UnixMicroToTimestampTz Cast int value as Unix Timestamp in Micro Seconds to timestamptz type int2, int4, int8, numeric, float4, float8 timestamptz UnixMilliToTimestampTz Cast int value as Unix Timestamp in Milli Seconds to timestamptz type int2, int4, int8, numeric, float4, float8 timestamptz UnixSecToTimestampTz Cast int value as Unix Timestamp in Seconds to timestamptz type int2, int4, int8, numeric, float4, float8 timestamptz DateToUnixNano Cast date value to int value as a Unix Timestamp in Nano Seconds date int2, int4, int8, numeric, float4, float8 DateToUnixMicro Cast date value to int value as a Unix Timestamp in Micro Seconds date int2, int4, int8, numeric, float4, float8 DateToUnixMilli Cast date value to int value as a Unix Timestamp in Milli Seconds date int2, int4, int8, numeric, float4, float8 DateToUnixSec Cast date value to int value as a Unix Timestamp in Seconds date int2, int4, int8, numeric, float4, float8 TimestampToUnixNano Cast timestamp value to int value as a Unix Timestamp in Nano Seconds timestamp int2, int4, int8, numeric, float4, float8 TimestampToUnixMicro Cast timestamp value to int value as a Unix Timestamp in Micro Seconds timestamp int2, int4, int8, numeric, float4, float8 TimestampToUnixMilli Cast timestamp value to int value as a Unix Timestamp in Milli Seconds timestamp int2, int4, int8, numeric, float4, float8 TimestampToUnixSec Cast timestamp value to int value as a Unix Timestamp in Seconds timestamp int2, int4, int8, numeric, float4, float8 TimestampTzToUnixNano Cast timestamptz value to int value as a Unix Timestamp in Nano Seconds timestamptz int2, int4, int8, numeric, float4, float8 TimestampTzToUnixMicro Cast timestamptz value to int value as a Unix Timestamp in Micro Seconds timestamptz int2, int4, int8, numeric, float4, float8 TimestampTzToUnixMilli Cast timestamptz value to int value as a Unix Timestamp in Milli Seconds timestamptz int2, int4, int8, numeric, float4, float8 TimestampTzToUnixSec Cast timestamptz value to int value as a Unix Timestamp in Seconds timestamptz int2, int4, int8, numeric, float4, float8 FloatToInt Cast float value to one of integer type. 
The fractional part will be discarded numeric, float4, float8 int2, int4, int8, numeric IntToFloat Cast int value to one of integer type int2, int4, int8, numeric numeric, float4, float8 IntToBool Cast int value to boolean. The value with 0 is false, 1 is true int2, int4, int8, numeric, float4, float8 bool BoolToInt Cast boolean value to int. The value false is 0, true is 1 bool int2, int4, int8, numeric, float4, float8"},{"location":"built_in_transformers/dynamic_parameters/#example-functional-dependency-resolution-between-columns","title":"Example: Functional dependency resolution between columns","text":"
Here is a simplified schema of the table humanresources.employee from the playground:
Column | Type \n------------------+-----------------------------\n businessentityid | integer \n jobtitle | character varying(50) \n birthdate | date \n hiredate | date \nCheck constraints:\n CHECK (birthdate >= '1930-01-01'::date AND birthdate <= (now() - '18 years'::interval))\n
As you can see, there is a functional dependency between the birthdate and hiredate columns. Logically, the hiredate should be later than the birthdate. Additionally, the birthdate should range from 1930-01-01 to 18 years prior to the current date.
Imagine that you need to generate random birthdate and hiredate columns. To ensure these dates satisfy the constraints, you can use dynamic parameters in the RandomDate transformer:
First, we generate a RandomDate for the birthdate column. The result of this transformation will be used as the minimum value for the subsequent transformation of the hiredate column.
Apply the template for the static parameter. It takes the current date and subtracts 30 years from it; the result is 1994. The tsModify function returns not raw data but a time.Time object. To get a raw value suitable for the birthdate type, we pass this value to the .EncodeValue function. This value is used as the minimum value for the birthdate column.
The same as the previous step, but we subtract 18 years from the current date; the result is 2006.
Generate a RandomDate for the hiredate column based on the value of birthdate.
Set the maximum value for the hiredate column. The value is the current date.
The min parameter is set to the value of the birthdate column from the previous step.
The template takes the randomly generated birthdate value and adds 18 years to it.
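A configuration sketch reconstructing the steps above (the template bodies and tsModify usage are illustrative; verify them against the template functions reference):
dump:\n  transformation:\n    - schema: \"humanresources\"\n      name: \"employee\"\n      transformers:\n        - name: \"RandomDate\"  # generate birthdate first\n          params:\n            column: \"birthdate\"\n            min: '{{ now | tsModify \"-30 years\" | .EncodeValue }}'  # roughly 1994\n            max: '{{ now | tsModify \"-18 years\" | .EncodeValue }}'  # roughly 2006\n        - name: \"RandomDate\"  # hiredate depends on the generated birthdate\n          params:\n            column: \"hiredate\"\n            max: \"2024-06-01 00:00:00\"  # illustrative current date\n          dynamic_params:\n            min:\n              column: \"birthdate\"\n              template: '{{ .GetValue | tsModify \"18 years\" | .EncodeValue }}'\n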
Below is the result of the transformation:
From the result, you can see that all functional dependencies and constraints are satisfied.
Parameter values can be generated from templates. This is useful when you don't want to write values manually, but instead want to generate and initialize them dynamically.
Here you can find the list of template functions that can be used in the template Custom functions.
You can encode and decode objects using the driver functions below.
"},{"location":"built_in_transformers/parameters_templating/#template-functions","title":"Template functions","text":"Function Description Signature .GetColumnType Returns a string with the column type. .GetColumnType(name string) (typeName string, err error).EncodeValueByColumn Encodes a value of any type into its raw string representation using the specified column name. Encoding is performed through the PostgreSQL driver. Throws an error if types are incompatible. .EncodeValueByColumn(name string, value any) (res any, err error).DecodeValueByColumn Decodes a value from its raw string representation to a Golang type using the specified column name. Decoding is performed through the PostgreSQL driver. Throws an error if types are incompatible. .DecodeValueByColumn(name string, value any) (res any, err error).EncodeValueByType Encodes a value of any type into its string representation using the specified type name. Encoding is performed through the PostgreSQL driver. Throws an error if types are incompatible. .EncodeValueByType(name string, value any) (res any, err error).DecodeValueByType Decodes a value from its raw string representation to a Golang type using the specified type name. Decoding is performed through the PostgreSQL driver. Throws an error if types are incompatible. .DecodeValueByType(name string, value any) (res any, err error).DecodeValue Decodes a value from its raw string representation to a Golang type using the data type assigned to the table column specified in the column parameter. Decoding is performed through the PostgreSQL driver. Throws an error if types are incompatible. .DecodeValueByColumn(value any) (res any, err error).EncodeValue Encodes a value of any type into its string representation using the type assigned to the table column specified in the column parameter. Encoding is performed through the PostgreSQL driver. Throws an error if types are incompatible. .EncodeValue(value any) (res any, err error)
Warning
If the parameter is not linked to a column parameter, the functions .DecodeValue and .EncodeValue will return an error. You can use .DecodeValueByType and .EncodeValueByType, or .DecodeValueByColumn and .EncodeValueByColumn, instead.
In the example below, the min and max values for the birth_date column are generated dynamically using the now template function, which returns the current date and time. The tsModify function is then used to subtract 30 (for min) and 18 (for max) years. Because the parameter type is mapped onto the column parameter type, the EncodeValue function is used to encode the value into the column type.
For example, if the now date is 2024-01-01, the dynamically calculated min value will be 1994-01-01 and the max value will be 2006-01-01.
CREATE TABLE account\n(\n id SERIAL PRIMARY KEY,\n gender VARCHAR(1) NOT NULL,\n email TEXT NOT NULL NOT NULL UNIQUE,\n first_name TEXT NOT NULL,\n last_name TEXT NOT NULL,\n birth_date DATE,\n created_at TIMESTAMP NOT NULL DEFAULT NOW()\n);\n\nINSERT INTO account (first_name, gender, last_name, birth_date, email)\nVALUES ('John', 'M', 'Smith', '1980-01-01', 'john.smith@gmail.com');\n
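A transformation sketch for this table using the templates described above (illustrative):
dump:\n  transformation:\n    - schema: \"public\"\n      name: \"account\"\n      transformers:\n        - name: \"RandomDate\"\n          params:\n            column: \"birth_date\"\n            min: '{{ now | tsModify \"-30 years\" | .EncodeValue }}'\n            max: '{{ now | tsModify \"-18 years\" | .EncodeValue }}'\n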
The transformation condition feature allows you to execute a defined transformation only if a specified condition is met. The condition must be defined as a boolean expression that evaluates to true or false. Greenmask uses expr-lang/expr under the hood. You can use all functions and syntax provided by the expr library.
You can use the same functions that are described in the built-in transformers
The transformers are executed one by one\u2014this helps you create complex transformation pipelines. For instance, depending on the value produced by the previous transformer, you can decide whether to execute the next one.
To improve the user experience, Greenmask offers special namespaces for accessing values in different formats: either the driver-encoded value in its real type or as a raw string.
record: This namespace provides the record value in its actual type.
raw_record: This namespace provides the record value as a string.
You can access a specific column\u2019s value using record.column_name for the real type or raw_record.column_name for the raw string value.
Warning
A record may already have been modified by previous transformers by the time the condition is evaluated. Greenmask does not retain the original record value; it provides the current, possibly modified, value for condition evaluation.
The expression scope can be a table or a specific transformer. If you define the condition at the table scope, it will be evaluated before any transformer is executed. If you define it at the transformer scope, it will be evaluated right before that transformer is executed.
"},{"location":"built_in_transformers/transformation_condition/#int-and-float-value-definition","title":"Int and float value definition","text":"
It is important to write integer and float values in the correct format. To define an integer value, write a number without a decimal point (1, 2, etc.). To define a float value, write a number with a decimal point (1.0, 2.0, etc.).
Warning
You may get a wrong comparison result if you compare an int with a float: for example, 1 == 1.0 will return false.
Greenmask decodes a value only at the moment the condition actually requires it. This lazy evaluation optimizes the performance of the transformation if you have a lot of conditions that use the or (||) and and (&&) operators.
"},{"location":"built_in_transformers/transformation_condition/#example-chose-random-value-and-execute-one-of","title":"Example: Chose random value and execute one of","text":"
In the following example, the RandomChoice transformer is used to choose a random value from the list of values. Depending on the chosen value, the Replace transformer is executed to set the activeflag column to true or false.
In this case, the condition scope is at the transformer level.
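A sketch of this pipeline; the status and activeflag columns and the when attribute placement are illustrative assumptions:
- schema: \"public\"\n  name: \"account\"\n  transformers:\n    - name: \"RandomChoice\"\n      params:\n        column: \"status\"\n        values:\n          - \"active\"\n          - \"inactive\"\n    # transformer-scoped condition: runs only for rows the previous step set to active\n    - name: \"Replace\"\n      when: 'record.status == \"active\"'\n      params:\n        column: \"activeflag\"\n        value: \"true\"\n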
"},{"location":"built_in_transformers/transformation_condition/#example-do-not-transform-specific-columns","title":"Example: Do not transform specific columns","text":"
In the following example, the RandomString transformer is executed only if the businessentityid column value is neither 1492 nor 1.
Greenmask provides two engines: random and hash. Most transformers have an engine parameter, which is set to random by default. Use the hash engine when you need to generate deterministic data - the same input will always produce the same output.
Info
Greenmask employs the SHA-3 algorithm to hash input values. While this function is cryptographically secure, it does exhibit lower performance. We plan to introduce additional hash functions in the future to offer a balance between security and performance. For example, SipHash, which provides a good trade-off between security and performance, is currently in development and is expected to be included in the stable v0.2 release of Greenmask.
Warning
The hash engine does not guarantee the uniqueness of generated values. Although transformers such as Hash, RandomEmail, and RandomUuid typically have a low probability of producing duplicate values, collisions are still possible. The feature to ensure uniqueness is currently under development at Greenmask and is expected to be released in future updates. For the latest status, please visit the Greenmask roadmap.
The random engine serves as the default engine for Greenmask. It operates using a pseudo-random number generator, which is initialized with a random seed sourced from a cryptographically secure random number generator. Employ the random engine when you need to generate random data and do not require reproducibility of the same transformation results with the same input.
The following example demonstrates how to configure the RandomDate transformer to generate random dates.
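A minimal sketch of such a configuration (the table and date bounds are illustrative):
- schema: \"public\"\n  name: \"account\"\n  transformers:\n    - name: \"RandomDate\"\n      params:\n        column: \"created_at\"\n        min: \"2020-01-01 00:00:00\"\n        max: \"2024-01-01 00:00:00\"\n        engine: \"random\"\n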
Keep in mind that the random engine always generates different values for the same input. For instance, if we run the previous example multiple times, we will get different results.
The hash engine is designed to generate deterministic data. It uses the SHA-3 algorithm to hash the input value. The hash engine is particularly useful when you need to generate the same output for the same input. For example, when you want to transform values that are used as primary or foreign keys in a database.
For security reasons, it is recommended to set a global Greenmask salt via the GREENMASK_GLOBAL_SALT environment variable. The salt is added to the hash input to prevent reverse engineering of the original value from the hashed output. The value is hex encoded and may be of any length. For example, GREENMASK_GLOBAL_SALT=a5eddc84e762e810. Generate a strong random salt and keep it secret.
The following example demonstrates how to configure the RandomInt transformer to generate deterministic data using the hash engine. The public.account.id and public.orders.account_id columns will have the same values.
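A sketch of the two-table configuration implied above; with the hash engine and identical parameters, both columns receive the same deterministic values (the min/max bounds are illustrative):
- schema: \"public\"\n  name: \"account\"\n  transformers:\n    - name: \"RandomInt\"\n      params:\n        column: \"id\"\n        min: 1\n        max: 100000\n        engine: \"hash\"\n- schema: \"public\"\n  name: \"orders\"\n  transformers:\n    - name: \"RandomInt\"\n      params:\n        column: \"account_id\"\n        # same parameters and engine as above, so the same input produces the same output\n        min: 1\n        max: 100000\n        engine: \"hash\"\n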
If you have partitioned tables or want to apply a transformation to a primary key and propagate it to all tables referencing that column, you can do so with Greenmask.
"},{"location":"built_in_transformers/transformation_inheritance/#apply-for-inherited","title":"Apply for inherited","text":"
Using apply_for_inherited, you can apply transformations to all partitions of a partitioned table, including any subpartitions.
When a partition has a transformation defined manually via config, and apply_for_inherited is set on the parent table, Greenmask will merge both the inherited and manually defined configurations. The manually defined transformation will execute last, giving it higher priority.
If this situation occurs, you will see the following information in the log:
{\n \"level\": \"info\",\n \"ParentTableSchema\": \"public\",\n \"ParentTableName\": \"sales\",\n \"ChildTableSchema\": \"public\",\n \"ChildTableName\": \"sales_2022_feb\",\n \"ChildTableConfig\": [\n {\n \"name\": \"RandomDate\",\n \"params\": {\n \"column\": \"sale_date\",\n \"engine\": \"random\",\n \"max\": \"2005-01-01\",\n \"min\": \"2001-01-01\"\n }\n }\n ],\n \"time\": \"2024-11-03T22:14:01+02:00\",\n \"message\": \"config will be merged: found manually defined transformers on the partitioned table\"\n}\n
"},{"location":"built_in_transformers/transformation_inheritance/#apply-for-references","title":"Apply for references","text":"
Using apply_for_references, you can apply transformations to columns involved in a primary key or in tables with a foreign key that references that column. This simplifies the transformation process by requiring you to define the transformation only on the primary key column, which will then be applied to all tables referencing that column.
The transformer must be deterministic or support the hash engine, and the hash engine must be set in the configuration file.
List of transformers that support apply_for_references:
End-to-end identifiers in databases are unique identifiers that are consistently used across multiple tables in a relational database schema, allowing for a seamless chain of references from one table to another. These identifiers typically serve as primary keys in one table and are propagated as foreign keys in other tables, creating a direct, traceable link from one end of a data relationship to the other.
Greenmask can detect end-to-end identifiers and apply transformations across the entire sequence of tables. These identifiers are detected when the following condition is met: the foreign key serves as both a primary key and a foreign key in the referencing table.
When a transformation is manually defined via config on the referencing column, and apply_for_references is set on the parent table, the manually defined transformation takes precedence and the inherited transformation is ignored. You will receive an INFO message in the logs.
The transformation condition will not be applied to the referenced column.
Not all transformers support apply_for_references
Warning
We do not recommend using apply_for_references with transformation conditions, as these conditions are not inherited by transformers on the referenced columns. This may lead to inconsistencies in the data.
In this example, we have a partitioned table sales that is partitioned by year and then by month. Each partition contains a subset of data based on the year and month of the sale. The sales table has a primary key sale_id and is partitioned by sale_date. The sale_date column is transformed using the RandomDate transformer.
-- Step 1: Create the parent partitioned table\nCREATE TABLE sales\n(\n    sale_id   SERIAL         NOT NULL,\n    sale_date DATE           NOT NULL,\n    amount    NUMERIC(10, 2) NOT NULL\n) PARTITION BY RANGE (EXTRACT(YEAR FROM sale_date));\n\n-- Step 2: Create first-level partitions by year\nCREATE TABLE sales_2022 PARTITION OF sales\n    FOR VALUES FROM (2022) TO (2023)\n    PARTITION BY LIST (EXTRACT(MONTH FROM sale_date));\n\nCREATE TABLE sales_2023 PARTITION OF sales\n    FOR VALUES FROM (2023) TO (2024)\n    PARTITION BY LIST (EXTRACT(MONTH FROM sale_date));\n\n-- Step 3: Create second-level partitions by month for each year\n\n-- Monthly partitions for 2022\nCREATE TABLE sales_2022_jan PARTITION OF sales_2022 FOR VALUES IN (1)\n    WITH (fillfactor = 70);\nCREATE TABLE sales_2022_feb PARTITION OF sales_2022 FOR VALUES IN (2);\nCREATE TABLE sales_2022_mar PARTITION OF sales_2022 FOR VALUES IN (3);\n-- Continue adding monthly partitions for 2022...\n\n-- Monthly partitions for 2023\nCREATE TABLE sales_2023_jan PARTITION OF sales_2023 FOR VALUES IN (1);\nCREATE TABLE sales_2023_feb PARTITION OF sales_2023 FOR VALUES IN (2);\nCREATE TABLE sales_2023_mar PARTITION OF sales_2023 FOR VALUES IN (3);\n-- Continue adding monthly partitions for 2023...\n\n-- Step 4: Insert sample data\nINSERT INTO sales (sale_date, amount)\nVALUES ('2022-01-15', 100.00);\nINSERT INTO sales (sale_date, amount)\nVALUES ('2022-02-20', 150.00);\nINSERT INTO sales (sale_date, amount)\nVALUES ('2023-03-10', 200.00);\n
To transform the sale_date column in the sales table and all its partitions, you can use the following configuration:
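A sketch of that configuration, reusing the RandomDate parameters that appear in the merge log above:
- schema: \"public\"\n  name: \"sales\"\n  # propagate the transformation to all partitions and subpartitions\n  apply_for_inherited: true\n  transformers:\n    - name: \"RandomDate\"\n      params:\n        column: \"sale_date\"\n        min: \"2001-01-01\"\n        max: \"2005-01-01\"\n        engine: \"random\"\n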
This is an ordinary table reference, where the primary key of the users table is referenced in the orders table.
-- Enable the extension for UUID generation (if not enabled)\nCREATE EXTENSION IF NOT EXISTS \"uuid-ossp\";\n\nCREATE TABLE users\n(\n user_id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),\n username VARCHAR(50) NOT NULL\n);\n\nCREATE TABLE orders\n(\n order_id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),\n user_id UUID REFERENCES users (user_id),\n order_date DATE NOT NULL\n);\n\nINSERT INTO users (username)\nVALUES ('john_doe');\nINSERT INTO users (username)\nVALUES ('jane_smith');\n\nINSERT INTO orders (user_id, order_date)\nVALUES ((SELECT user_id FROM users WHERE username = 'john_doe'), '2024-10-31'),\n ((SELECT user_id FROM users WHERE username = 'jane_smith'), '2024-10-30');\n
To transform the user_id column in the users table and propagate the transformation to the tables that reference it, you can use the following configuration:
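A sketch of this configuration; apply_for_references requires a deterministic setup, hence the hash engine:
- schema: \"public\"\n  name: \"users\"\n  # propagate the transformation to referencing foreign keys\n  apply_for_references: true\n  transformers:\n    - name: \"RandomUuid\"\n      params:\n        column: \"user_id\"\n        engine: \"hash\"\n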
This will apply the RandomUuid transformation to the user_id column in the orders table automatically.
"},{"location":"built_in_transformers/transformation_inheritance/#example-3-references-on-tables-with-end-to-end-identifiers","title":"Example 3. References on tables with end-to-end identifiers","text":"
In this example, we have three tables: tablea, tableb, and tablec. All tables have a composite primary key. In the tables tableb and tablec, the primary key is also a foreign key that references the primary key of tablea. This means that all PKs are end-to-end identifiers.
Change a JSON document using delete and set operations. NULL values are kept.
"},{"location":"built_in_transformers/advanced_transformers/json/#parameters","title":"Parameters","text":"Name Properties Description Default Required Supported DB types column The name of the column to be affected Yes json, jsonb operations A list of operations that contains editing delete and set Yes - \u221f operation Specifies the operation type: set or delete Yes - \u221f path The path to an object to be modified. See path syntax below. Yes - \u221f value A value to be assigned to the provided path No - \u221f value_template A Golang template to be assigned to the provided path. See the list of template functions below. No - \u221f error_not_exist Throws an error if the key does not exist by the provided path. Disabled by default. false No -"},{"location":"built_in_transformers/advanced_transformers/json/#description","title":"Description","text":"
The Json transformer applies a sequence of editing operations (set and/or delete) to a JSON document. The value can be static or dynamic. For the set operation type, a static value is provided in the value parameter, while a dynamic value is provided in the value_template parameter, with the data received after template execution used as the result. Either the value or the value_template parameter is mandatory for the set operation.
The Json transformer is based on tidwall/sjson and supports the same path syntax. See their documentation for syntax rules.
"},{"location":"built_in_transformers/advanced_transformers/json/#template-functions","title":"Template functions","text":"Function Description Signature .GetPath Returns the current path to which the operation is being applied .GetPath() (path string).GetOriginalValue Returns the original value to which the current operation path is pointing. If the value at the specified path does not exist, it returns nil. .GetOriginalValue() (value any).OriginalValueExists Returns a boolean value indicating whether the specified path exists or not. .OriginalValueExists() (exists bool).GetColumnValue Returns an encoded into Golang type value for a specified column or throws an error. A value can be any of int, float, time, string, bool, or slice or map. .GetColumnValue(name string) (value any, err error).GetRawColumnValue Returns a raw value for a specified column as a string or throws an error .GetRawColumnValue(name string) (value string, err error).EncodeValueByColumn Encodes a value of any type into its raw string representation using the specified column name. Encoding is performed through the PostgreSQL driver. Throws an error if types are incompatible. .EncodeValueByColumn(name string, value any) (res any, err error).DecodeValueByColumn Decodes a value from its raw string representation to a Golang type using the specified column name. Decoding is performed through the PostgreSQL driver. Throws an error if types are incompatible. .DecodeValueByColumn(name string, value any) (res any, err error).EncodeValueByType Encodes a value of any type into its string representation using the specified type name. Encoding is performed through the PostgreSQL driver. Throws an error if types are incompatible. .EncodeValueByType(name string, value any) (res any, err error).DecodeValueByType Decodes a value from its raw string representation to a Golang type using the specified type name. Decoding is performed through the PostgreSQL driver. Throws an error if types are incompatible. .DecodeValueByType(name string, value any) (res any, err error)"},{"location":"built_in_transformers/advanced_transformers/json/#example-changing-json-document","title":"Example: Changing JSON document","text":"Json transformer example
Execute a Go template and automatically apply the result to a specified column.
"},{"location":"built_in_transformers/advanced_transformers/template/#parameters","title":"Parameters","text":"Name Description Default Required Supported DB types column The name of the column to be affected Yes any template A Go template string Yes - validate Validates the template result using the PostgreSQL driver decoding procedure. Throws an error if a custom type does not have an encode-decoder implementation. false No -"},{"location":"built_in_transformers/advanced_transformers/template/#description","title":"Description","text":"
The Template transformer executes Go templates and automatically applies the template result to a specified column. The Go template system is designed to be extensible, enabling developers to access data objects and incorporate custom functions programmatically. For more information, you can refer to the official Go Template documentation.
With the Template transformer, you can implement complicated transformation logic using basic or custom template functions. Below you can get familiar with the basic template functions for the Template transformer. For more information about available custom template functions, see Custom functions.
Warning
Pay attention to whitespace in templates. Use dash-wrapped brackets {{- -}} to trim spaces. For example, the value \"2023-12-19\" is not the same as \" 2023-12-19 \", and the latter may throw an error when restoring.
"},{"location":"built_in_transformers/advanced_transformers/template/#template-functions","title":"Template functions","text":"Function Description Signature .GetColumnType Returns a string with the column type. .GetColumnType(name string) (typeName string, err error).GetValue Returns the column value for column assigned in the column parameter, encoded by the PostgreSQL driver into any type along with any associated error. Supported types include int, float, time, string, bool, as well as slice or map of any type. .GetValue() (value any, err error).GetRawValue Returns a raw value as a string for column assigned in the column parameter. .GetRawColumnValue(name string) (value string, err error).GetColumnValue Returns an encoded value for a specified column or throws an error. A value can be any of int, float, time, string, bool, or slice or map. .GetColumnValue(name string) (value any, err error).GetRawColumnValue Returns a raw value for a specified column as a string or throws an error .GetRawColumnValue(name string) (value string, err error).EncodeValue Encodes a value of any type into its string representation using the type assigned to the table column specified in the column parameter. Encoding is performed through the PostgreSQL driver. Throws an error if types are incompatible. .EncodeValue(value any) (res any, err error).DecodeValue Decodes a value from its raw string representation to a Golang type using the data type assigned to the table column specified in the column parameter. Decoding is performed through the PostgreSQL driver. Throws an error if types are incompatible. .DecodeValueByColumn(value any) (res any, err error).EncodeValueByColumn Encodes a value of any type into its raw string representation using the specified column name. Encoding is performed through the PostgreSQL driver. Throws an error if types are incompatible. .EncodeValueByColumn(name string, value any) (res any, err error).DecodeValueByColumn Decodes a value from its raw string representation to a Golang type using the specified column name. Decoding is performed through the PostgreSQL driver. Throws an error if types are incompatible. .DecodeValueByColumn(name string, value any) (res any, err error).EncodeValueByType Encodes a value of any type into its string representation using the specified type name. Encoding is performed through the PostgreSQL driver. Throws an error if types are incompatible. .EncodeValueByType(name string, value any) (res any, err error).DecodeValueByType Decodes a value from its raw string representation to a Golang type using the specified type name. Decoding is performed through the PostgreSQL driver. Throws an error if types are incompatible. .DecodeValueByType(name string, value any) (res any, err error)"},{"location":"built_in_transformers/advanced_transformers/template/#example-update-the-firstname-column","title":"Example: Update the firstname column","text":"
Value = Terri: column name firstname, original value Terri, transformed Mary. Value != Terri: column name firstname, original value Ken Jr, transformed Mike"},{"location":"built_in_transformers/advanced_transformers/template_record/","title":"TemplateRecord","text":"
Modify records using a Go template and apply changes by using the PostgreSQL driver functions. This transformer provides a way to implement custom transformation logic.
"},{"location":"built_in_transformers/advanced_transformers/template_record/#parameters","title":"Parameters","text":"Name Description Default Required Supported DB types columns A list of columns to be affected by the template. The list of columns will be checked for constraint violations. No any template A Go template string Yes - validate Validate the template result via PostgreSQL driver decoding procedure. Throws an error if a custom type does not have an encode-decoder implementation. false No -"},{"location":"built_in_transformers/advanced_transformers/template_record/#description","title":"Description","text":"
TemplateRecord uses Go templates to change data. However, while the Template transformer operates with a single column and automatically applies results, the TemplateRecord transformer can make changes to a set of columns in the record, and using the driver functions .SetColumnValue or .SetRawColumnValue is mandatory to apply them.
With the TemplateRecord transformer, you can implement complicated transformation logic using basic or custom template functions. Below you can get familiar with the basic template functions for the TemplateRecord transformer. For more information about available custom template functions, see Custom functions.
"},{"location":"built_in_transformers/advanced_transformers/template_record/#template-functions","title":"Template functions","text":"Function Description Signature .GetColumnType Returns a string with the column type. .GetColumnType(name string) (typeName string, err error).GetColumnValue Returns an encoded value for a specified column or throws an error. A value can be any of int, float, time, string, bool, or slice or map. .GetColumnValue(name string) (value any, err error).GetRawColumnValue Returns a raw value for a specified column as a string or throws an error .GetRawColumnValue(name string) (value string, err error).SetColumnValue Sets a new value of a specific data type to the column. The value assigned must be compatible with the PostgreSQL data type of the column. For example, it is allowed to assign an int value to an INTEGER column, but you cannot assign a float value to a timestamptz column. SetColumnValue(name string, v any) (bool, error).SetRawColumnValue Sets a new raw value for a column, inheriting the column's existing data type, without performing data type validation. This can lead to errors when restoring the dump if the assigned value is not compatible with the column type. To ensure compatibility, consider using the .DecodeValueByColumn function followed by .SetColumnValue, for example, {{ \"13\" \\| .DecodeValueByColumn \"items_amount\" \\| .SetColumnValue \"items_amount\" }}. .SetRawColumnValue(name string, value any) (err error).EncodeValueByColumn Encodes a value of any type into its raw string representation using the specified column name. Encoding is performed through the PostgreSQL driver. Throws an error if types are incompatible. .EncodeValueByColumn(name string, value any) (res any, err error).DecodeValueByColumn Decodes a value from its raw string representation to a Golang type using the specified column name. Decoding is performed through the PostgreSQL driver. Throws an error if types are incompatible. .DecodeValueByColumn(name string, value any) (res any, err error).EncodeValueByType Encodes a value of any type into its string representation using the specified type name. Encoding is performed through the PostgreSQL driver. Throws an error if types are incompatible. .EncodeValueByType(name string, value any) (res any, err error).DecodeValueByType Decodes a value from its raw string representation to a Golang type using the specified type name. Decoding is performed through the PostgreSQL driver. Throws an error if types are incompatible. .DecodeValueByType(name string, value any) (res any, err error)"},{"location":"built_in_transformers/advanced_transformers/template_record/#example-generate-a-random-created_at-and-updated_at-dates","title":"Example: Generate a random created_at and updated_at dates","text":"
Below you can see the table structure:
The goal is to modify the \"created_at\" and \"updated_at\" columns based on the following rules:
Do not change the value if the created_at is Null.
If the created_at is not Null, generate the current time and use it as the minimum threshold for randomly generating the updated_at value.
Assign all generated values using the .SetColumnValue function.
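A sketch of such a TemplateRecord configuration; for simplicity it derives updated_at from the current time with a fixed tsModify offset instead of a random date, and the table name is illustrative:
- schema: \"public\"\n  name: \"orders\"\n  transformers:\n    - name: \"TemplateRecord\"\n      params:\n        columns:\n          - \"created_at\"\n          - \"updated_at\"\n        template: >\n          {{- $created := .GetColumnValue \"created_at\" -}}\n          {{- if isNotNull $created -}}\n            {{- $min := now -}}\n            {{- .SetColumnValue \"created_at\" $min -}}\n            {{- .SetColumnValue \"updated_at\" ($min | tsModify \"12 hours\") -}}\n          {{- end -}}\n        validate: true\n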
column name original value transformed created_at 2021-01-20 07:01:00.513325+00 2023-12-17 19:37:29.910054Z updated_at 2021-08-09 21:27:00.513325+00 2023-12-18 10:05:25.828498Z"},{"location":"built_in_transformers/advanced_transformers/custom_functions/","title":"Template custom functions","text":"
Within Greenmask, custom functions play a crucial role, providing a wide array of options for implementing diverse logic. Under the hood, the custom functions are based on sprig, the Go template functions library. Greenmask enhances this capability by introducing additional functions and transformation functions. These extensions mirror the logic found in the standard transformers but offer you the flexibility to implement intricate and comprehensive logic tailored to your specific needs.
Currently, you can use template custom functions for the advanced transformers:
Json
Template
TemplateRecord
and for the Transformation condition feature as well.
Custom functions are arbitrarily divided into two groups:
Core functions \u2014 custom functions that vary in purpose and include PostgreSQL driver, JSON output, testing, and transformation functions.
Faker functions \u2014 custom functions of the faker type, which generate synthetic data.
Below you can find custom core functions which are divided into categories based on the transformation purpose.
"},{"location":"built_in_transformers/advanced_transformers/custom_functions/core_functions/#postgresql-driver-functions","title":"PostgreSQL driver functions","text":"Function Description null Returns the NULL value that can be used for the driver encoding-decoding operations isNull Returns true if the checked value is NULLisNotNull Returns true if the checked value is not NULLsqlCoalesce Works as a standard SQL coalesce function. It allows you to choose the first non-NULL argument from the list."},{"location":"built_in_transformers/advanced_transformers/custom_functions/core_functions/#json-output-function","title":"JSON output function","text":"Function Description jsonExists Checks if the path value exists in JSON. Returns true if the path exists. mustJsonGet Gets the JSON attribute value by path and throws an error if the path does not exist mustJsonGetRaw Gets the JSON attribute raw value by path and throws an error if the path does not exist jsonGet Gets the JSON attribute value by path and returns nil if the path does not exist jsonGetRaw Gets the JSON attribute raw value by path and returns nil if the path does not exist jsonSet Sets the value for the JSON document by path jsonSetRaw Sets the raw value for the JSON document by path jsonDelete Deletes an attribute from the JSON document by path jsonValidate Validates the JSON document syntax and throws an error if there are any issues jsonIsValid Checks the JSON document for validity and returns true if it is valid toJsonRawValue Casts any type of value to the raw JSON value"},{"location":"built_in_transformers/advanced_transformers/custom_functions/core_functions/#testing-functions","title":"Testing functions","text":"Function Description isInt Checks if the value of an integer type isFloat Checks if the value of a float type isNil Checks if the value is nil isString Checks if the value of a string type isMap Checks if the value of a map type isSlice Checks if the value of a slice type isBool Checks if the value of a boolean type"},{"location":"built_in_transformers/advanced_transformers/custom_functions/core_functions/#transformation-and-generators","title":"Transformation and generators","text":""},{"location":"built_in_transformers/advanced_transformers/custom_functions/core_functions/#masking","title":"masking","text":"
Replaces characters with asterisk * symbols depending on the provided masking rule. If the value is NULL, it is kept unchanged. This function is based on ggwhite/go-masker.
Masking rulesSignatureParametersReturn values Rule Description Example input Example output default Returns the sequence of * symbols of the same length test1234********name Masks the second and the third letters ABCDA**Dpassword Always returns a sequence of *address Keeps first 6 letters, masks the rest Larnaca, makarios stLarnac*************email Keeps a domain and the first 3 letters, masks the rest ggw.chang@gmail.comggw****@gmail.commobile Masks 3 digits starting from the 4th digit 09876543210987***321telephone Removes (, ), , - symbols, masks last 4 digits of a telephone number, and formats it to (??)????-????0227993078(02)2799-****id Masks last 4 digits of an ID A123456789A12345****credit_card Masks 6 digits starting from the 7th digit 1234567890123456123456******3456url Masks the password part of the URL (if applicable) http://admin:mysecretpassword@localhost:1234/urihttp://admin:xxxxx@localhost:1234/uri
masking(dataType string, value string) (res string, err error)
dataType \u2014 one of the masking rules (see previous tab)
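As an illustration, the masking function can be combined with a Template transformer; a sketch assuming a text-typed phone column:
- schema: \"public\"\n  name: \"account\"\n  transformers:\n    - name: \"Template\"\n      params:\n        column: \"phone\"\n        # pipe the decoded value into masking with the mobile rule\n        template: '{{ .GetValue | masking \"mobile\" }}'\n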
Adds or subtracts a random fraction to or from the original float value. Multiplies the original float value by a provided random value that is not higher than the ratio parameter and adds it to the original value with the option to specify the decimal via the decimal parameter.
SignatureParametersReturn values
noiseFloat(ratio float, decimal int, value float) (res float64, err error)
ratio \u2014 the maximum multiplier value in the interval (0:1). The value will be randomly generated up to ratio, multiplied by the original value, and the result will be added to the original value.
Adds or subtracts a random fraction to or from the original integer value. Multiplies the original integer value by a provided random value that is not higher than the ratio parameter and adds it to the original value.
SignatureParametersReturn values
noiseInt(ratio float, value float) (res int, err error)
ratio \u2014 the max multiplier value in the interval (0:1). The value will be generated randomly up to ratio, multiplied by the original value, and the result will be added to the original value.
Greenmask uses go-faker/faker under the hood for generating synthetic data.
"},{"location":"built_in_transformers/advanced_transformers/custom_functions/faker_function/#faker-functions-address","title":"Faker functions: Address","text":"Function Description Signature fakerRealAddress Generates a random real-world address that includes: city, state, postal code, latitude, and longitude fakerRealAddress() (res ReadAddress)fakerLatitude Generates random fake latitude fakerLatitude() (res float64)fakerLongitude Generates random fake longitude fakerLongitude() (res float64)"},{"location":"built_in_transformers/advanced_transformers/custom_functions/faker_function/#faker-functions-datetime","title":"Faker functions: Datetime","text":"Function Description Signature fakerUnixTime Generates random Unix time in seconds fakerLongitude() (res int64)fakerDate Generates random date with the pattern of YYYY-MM-DDfakerDate() (res string)fakerTimeString Generates random time fakerTimeString() (res string)fakerMonthName Generates a random month fakerMonthName() (res string)fakerYearString Generates a random year fakerYearString() (res string)fakerDayOfWeek Generates a random day of a week fakerDayOfWeek() (res string)fakerDayOfMonth Generates a random day of a month fakerDayOfMonth() (res string)fakerTimestamp Generates a random timestamp with the pattern of YYYY-MM-DD HH:MM:SSfakerTimestamp() (res string)fakerCentury Generates a random century fakerCentury() (res string)fakerTimezone Generates a random timezone name fakerTimezone() (res string)fakerTimeperiod Generates a random time period with the patter of either AM or PMfakerTimeperiod() (res string)"},{"location":"built_in_transformers/advanced_transformers/custom_functions/faker_function/#faker-functions-internet","title":"Faker functions: Internet","text":"Function Description Signature fakerEmail Generates a random email fakerEmail() (res string)fakerMacAddress Generates a random MAC address fakerMacAddress() (res string)fakerDomainName Generates a random domain name fakerDomainName() (res string)fakerURL Generates a random URL with the pattern of https://www.domainname.some/somepathfakerURL() (res string)fakerUsername Generates a random username fakerUsername() (res string)fakerIPv4 Generates a random IPv4 address fakerIPv4() (res string)fakerIPv6 Generates a random IPv6 address fakerIPv6() (res string)fakerPassword Generates a random password fakerPassword() (res string)"},{"location":"built_in_transformers/advanced_transformers/custom_functions/faker_function/#faker-functions-words-and-sentences","title":"Faker functions: words and sentences","text":"Function Description Signature fakerWord Generates a random word fakerWord() (res string)fakerSentence Generates a random sentence fakerSentence() (res string)fakerParagraph Generates a random sequence of sentences as a paragraph fakerParagraph() (res string)"},{"location":"built_in_transformers/advanced_transformers/custom_functions/faker_function/#faker-functions-payment","title":"Faker functions: Payment","text":"Function Description Signature fakerCCType Generates a random credit card type, e.g. VISA, MasterCard, etc. 
fakerCCType() (res string)fakerCCNumber Generates a random credit card number fakerCCNumber() (res string)fakerCurrency Generates a random currency name fakerCurrency() (res string)fakerAmountWithCurrency Generates random amount preceded with random currency fakerAmountWithCurrency() (res string)"},{"location":"built_in_transformers/advanced_transformers/custom_functions/faker_function/#faker-functions-person","title":"Faker functions: Person","text":"Function Description Signature fakerTitleMale Generates a random male title from the predefined list fakerTitleMale() (res string)fakerTitleFemale Generates a random female title from the predefined list fakerTitleFemale() (res string)fakerFirstName Generates a random first name fakerFirstName() (res string)fakerFirstNameMale Generates a random male first name fakerFirstNameMale() (res string)fakerFirstNameFemale Generates a random female first name fakerFirstNameFemale() (res string)fakerFirstLastName Generates a random last name fakerFirstLastName() (res string)fakerName Generates a random full name preceded with a title fakerName() (res string)"},{"location":"built_in_transformers/advanced_transformers/custom_functions/faker_function/#faker-functions-phone","title":"Faker functions: Phone","text":"Function Description Signature fakerPhoneNumber Generates a random phone number fakerPhoneNumber() (res string)fakerTollFreePhoneNumber Generates a random phone number with the pattern of (123) 456-7890fakerTollFreePhoneNumber() (res string)fakerE164PhoneNumber Generates a random phone number with the pattern of +12345678900fakerE164PhoneNumber() (res string)"},{"location":"built_in_transformers/advanced_transformers/custom_functions/faker_function/#faker-functions-uuid","title":"Faker functions: UUID","text":"Function Description Signature fakerUUIDHyphenated Generates a random unique user ID separated by hyphens fakerUUID() (res string)fakerUUIDDigit Generates a random unique user ID in the HEX format fakerUUIDDigit() (res string)"},{"location":"built_in_transformers/standard_transformers/","title":"Standard transformers","text":"
Standard transformers are ready-to-use methods that require no customization and work with minimal parameter input. Below you can find an index of all standard transformers currently available in Greenmask.
Cmd \u2014 transforms data via external program using stdin and stdout interaction.
Dict \u2014 replaces values matched by dictionary keys.
Hash \u2014 generates a hash of the text value.
Masking \u2014 masks a value using one of the masking behaviors depending on your domain.
NoiseDate \u2014 randomly adds or subtracts a duration within the provided ratio interval to the original date value.
NoiseFloat \u2014 adds or subtracts a random fraction to or from the original float value.
NoiseNumeric \u2014 adds or subtracts a random fraction to the original numeric value.
NoiseInt \u2014 adds or subtracts a random fraction to the original integer value.
RandomBool \u2014 generates random boolean values.
RandomChoice \u2014 replaces values randomly chosen from a provided list.
RandomDate \u2014 generates a random date in a specified interval.
RandomFloat \u2014 generates a random float within the provided interval.
RandomInt \u2014 generates a random integer within the provided interval.
RandomString \u2014 generates a random string using the provided characters within the specified length range.
RandomUuid \u2014 generates a random unique user ID.
RandomLatitude \u2014 generates a random latitude value.
RandomLongitude \u2014 generates a random longitude value.
RandomUnixTimestamp \u2014 generates a random Unix timestamp.
RandomDayOfWeek \u2014 generates a random day of the week.
RandomDayOfMonth \u2014 generates a random day of the month.
RandomMonthName \u2014 generates the name of a random month.
RandomYearString \u2014 generates a random year as a string.
RandomCentury \u2014 generates a random century.
RandomTimezone \u2014 generates a random timezone.
RandomEmail \u2014 generates a random email address.
RandomUsername \u2014 generates a random username.
RandomPassword \u2014 generates a random password.
RandomDomainName \u2014 generates a random domain name.
RandomURL \u2014 generates a random URL.
RandomMac \u2014 generates a random MAC address.
RandomIP \u2014 generates a random IPv4 or IPv6 address.
RandomWord \u2014 generates a random word.
RandomSentence \u2014 generates a random sentence.
RandomParagraph \u2014 generates a random paragraph.
RandomCCType \u2014 generates a random credit card type.
RandomCCNumber \u2014 generates a random credit card number.
RandomCurrency \u2014 generates a random currency code.
RandomAmountWithCurrency \u2014 generates a random monetary amount with currency.
RandomPerson \u2014 generates random person data (first name, last name, etc.).
RandomPhoneNumber \u2014 generates a random phone number.
RandomTollFreePhoneNumber \u2014 generates a random toll-free phone number.
RandomE164PhoneNumber \u2014 generates a random phone number in E.164 format.
RealAddress \u2014 generates a real address.
RegexpReplace \u2014 replaces a string using a regular expression.
Replace \u2014 replaces the original value with the provided one.
Transform data via external program using stdin and stdout interaction.
"},{"location":"built_in_transformers/standard_transformers/cmd/#parameters","title":"Parameters","text":"Name Description Default Required Supported DB types columns A list of column names to be affected. If empty, the entire tuple is used. Read about the structure further. Yes Any executable The path to the executable parameter file Yes - args A list of parameters for the executable No - driver The row driver with parameters that is used for interacting with cmd. See details below. {\"name\": \"csv\"} No - validate Performs a decoding operation using the PostgreSQL driver for data received from the command to ensure the data format is correct false No - timeout Timeout for sending and receiving data from the external command 2s No - expected_exit_code The expected exit code on SIGTERM signal. If the exit code is unexpected, the transformation exits with an error. 0 No - skip_on_behaviour Skips transformation call if one of the provided columns has a null value (any) or each of the provided columns has null values (all). This option works together with the skip_on_null_input parameter on columns. Possible values: all, any. all No -
Warning
The parameter validate=true may cause an error if the type does not have a PostgreSQL driver decoder implementation. Most of the types, such as int, float, text, varchar, date, timestamp, etc., have encoders and decoders, as well as inherited types like domain types based on them.
The Cmd transformer allows you to send original data to an external program via stdin and receive transformed data from stdout. It supports various interaction formats such as json, csv, or plain text for one-column transformations. The interaction is performed line by line, so at the end of each sent data, a new line symbol \\n must be included.
"},{"location":"built_in_transformers/standard_transformers/cmd/#types-of-interaction-modes","title":"Types of interaction modes","text":""},{"location":"built_in_transformers/standard_transformers/cmd/#text","title":"text","text":"
A textual driver that is used only for single-column transformations, so you cannot provide more than one column here. The value is encoded as a string literal, for example, 2023-01-03 01:00:00.0+03.
JSON line driver. It has two formats that can be passed through driver.json_data_format: [text|bytes]. Use the bytes format for binary datatypes. Use the text format for non-binary datatypes and for those that can be represented as string literals. The default json_data_format is text.
Each line is a JSON line with a map of attribute numbers to their values
d \u2014 the raw data represented as base64 encoding for the bytes format or Unicode text for the text format. The base64 encoding is needed because data can be binary.
n \u2014 a boolean flag indicating whether the value is NULL (see the JSON example below).
CSV driver (comma-separated). The number of attributes is the same as the number of table columns, but the columns that were not mentioned in the columns list are empty. The NULL value is represented as \\N. Each attribute is escaped by a quote (\"). For example, if the transformed table has attributes id, title, and created_at, and only id and created_at require transformation, then the CSV line will look as follows:
name \u2014 the name of the column. This value is required. Depending on the attributes that follow, this column may be used just as a value and not be affected in any way.
not_affected \u2014 indicates whether the column is affected in the transformation. This attribute is required for the validation procedure when Greenmask is called with greenmask dump --validate. Setting not_affected=true can be helpful when the command transformer transforms data depending on the value of another column. For example, if you want to generate an updated_at column value depending on the created_at column value, you can set created_at to not_affected=true. The default value is false.
skip_original_data \u2014 indicates whether the original data is required for the transformer. This attribute can be helpful for decreasing the interaction time. One use case is when the command works as a generator and returns the value without relying on the original data. The default value is false.
skip_on_null_input \u2014 specifies whether to skip transformation when the original value is null. This attribute works in conjunction with the skip_on_behaviour parameter. For example, if you have two affected columns with skip_on_null_input=true and one column is null, then, if skip_on_behaviour=any, the transformation will be skipped, or, if skip_on_behaviour=all, the transformation will be performed. The default is false.
"},{"location":"built_in_transformers/standard_transformers/cmd/#example-apply-transformation-performed-by-external-command-in-text-format","title":"Example: Apply transformation performed by external command in TEXT format","text":"
In the following example, the jobtitle column is transformed via the external command transformer.
External transformer in python example
#!/usr/bin/env python3\nimport signal\nimport sys\n\nsignal.signal(signal.SIGTERM, lambda sig, frame: exit(0))\n\n\n# To implement a simple generator, we read a line from stdin and write the result to stdout\nfor _ in sys.stdin:\n    # Writing the result to stdout with a new line and flushing the buffer\n    sys.stdout.write(\"New Job Title\")\n    sys.stdout.write(\"\\n\")\n    sys.stdout.flush()\n
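A sketch of the Cmd configuration that might drive this script, assuming it is saved as an executable file and uses the text driver on a single column:
- schema: \"humanresources\"\n  name: \"employee\"\n  transformers:\n    - name: \"Cmd\"\n      params:\n        executable: \"/path/to/transformer.py\"  # illustrative path\n        driver:\n          name: \"text\"\n        timeout: \"2s\"\n        columns:\n          - name: \"jobtitle\"\n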
"},{"location":"built_in_transformers/standard_transformers/cmd/#example-apply-transformation-performed-by-external-command-in-json-format","title":"Example: Apply transformation performed by external command in JSON format","text":"
In the following example, jobtitle and loginid columns are transformed via external command transformer.
External transformer in python example
#!/usr/bin/env python3\nimport json\nimport signal\nimport sys\n\nsignal.signal(signal.SIGTERM, lambda sig, frame: exit(0))\n\nfor line in sys.stdin:\n res = json.loads(line)\n # Setting dummy values\n res[\"jobtitle\"] = {\"d\": \"New Job Title\", \"n\": False}\n res[\"loginid\"][\"d\"] = \"123\"\n\n # Writing the result to stdout with new line and flushing the buffer\n sys.stdout.write(json.dumps(res))\n sys.stdout.write(\"\\n\")\n sys.stdout.flush()\n
Validate the received data via decode procedure using the PostgreSQL driver. Note that this may cause an error if the type is not supported in the PostgreSQL driver.
Skip transformation (keep the values) if one of the affected columns (not_affected=false) has a null value.
If a column has a null value, then skip it. This works in conjunction with skip_on_behaviour. Since it has the value any, if one of the columns (jobtitle or loginid) has a null value, then skip the transformation call.
The format of JSON can be either text or bytes. The default value is text.
When the skip_original_data attribute is set to true, the data will not be transferred to the command, and the column will contain empty original data.
"},{"location":"built_in_transformers/standard_transformers/dict/#parameters","title":"Parameters","text":"Name Description Default Required Supported DB types column The name of the column to be affected Yes any values Value replace mapping as in: {\"string\": \"string\"}. The string with value \"\\N\" is considered NULL. No - default Shown if no value has been matched with dict. The string with value \"\\N\" is considered NULL. By default is empty. No - fail_not_matched When no value is matched with the dict, fails the replacement process if set to true, or keeps the current value, if set to false. true No - validate Performs the encode-decode procedure using column type to ensure that values have correct type true No -"},{"location":"built_in_transformers/standard_transformers/dict/#description","title":"Description","text":"
The Dict transformer uses a user-provided key-value dictionary to replace values based on matches specified in the values parameter mapping. These provided values must align with the PostgreSQL type format. To validate the values format before application, you can utilize the validate parameter, triggering a decoding procedure via the PostgreSQL driver.
If there is no match by key, an error will be raised according to the default fail_not_matched: true parameter. You can change this behaviour by providing the default parameter, whose value will be used in case of a missing match.
In certain cases where the driver type does not support the validation operation, an error may occur. For setting or matching a NULL value, use a string with the \\N sequence.
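A minimal sketch of a Dict configuration (the column and mapping are illustrative):
- schema: \"public\"\n  name: \"account\"\n  transformers:\n    - name: \"Dict\"\n      params:\n        column: \"gender\"\n        values:\n          \"M\": \"F\"\n          \"F\": \"M\"\n        # used when no key matches instead of raising an error\n        default: \"U\"\n        fail_not_matched: false\n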
Generate a hash of the text value using one of the supported hash functions. NULL values are kept.
"},{"location":"built_in_transformers/standard_transformers/hash/#parameters","title":"Parameters","text":"Name Description Default Required Supported DB types column The name of the column to be affected Yes text, varchar salt Hex encoded salt string. This value may be provided via environment variable GREENMASK_GLOBAL_SALT Yes text, varchar function Hash algorithm to anonymize data. Can be any of md5, sha1, sha256, sha512, sha3-224, sha3-254, sha3-384, sha3-512. sha1 No - max_length Indicates whether to truncate the hash tail and specifies at what length. Can be any integer number, where 0 means \"no truncation\". 0 No -"},{"location":"built_in_transformers/standard_transformers/hash/#example-generate-hash-from-job-title","title":"Example: Generate hash from job title","text":"
The following example generates a sha1 hash of the jobtitle and truncates the result after the 10th character.
We can set the salt via the environment variable GREENMASK_GLOBAL_SALT:
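A sketch of that configuration; the salt itself is supplied through the environment as described above:
# assumes GREENMASK_GLOBAL_SALT is set in the environment\n- schema: \"humanresources\"\n  name: \"employee\"\n  transformers:\n    - name: \"Hash\"\n      params:\n        column: \"jobtitle\"\n        function: \"sha1\"\n        max_length: 10\n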
| column name | original value | transformed |\n|-------------|----------------------------------|-------------|\n| jobtitle | Research and Development Manager | 3a456da5c5 |\n
Mask a value using one of the masking rules depending on your domain. NULL values are kept.
"},{"location":"built_in_transformers/standard_transformers/masking/#parameters","title":"Parameters","text":"Name Description Default Required Supported DB types column The name of the column to be affected Yes text, varchar type Data type of attribute (default, password, name, addr, email, mobile, tel, id, credit, url) default No -"},{"location":"built_in_transformers/standard_transformers/masking/#description","title":"Description","text":"
The Masking transformer replaces characters with asterisk * symbols depending on the provided data type. If the value is NULL, it is kept unchanged. It is based on ggwhite/go-masker and supports the following masking rules:
Type Description default Returns * symbols with the same length, e.g. input: test1234 output: ******** name Masks the second and third letters in a word, e. g. input: ABCD output: A**D password Always returns ************ address Keeps first 6 letters, masks the rest, e. g. input: Larnaca, makarios st output: Larnac************* email Keeps a domain and the first 3 letters, masks the rest, e. g. input: ggw.chang@gmail.com output: ggw****@gmail.com mobile Masks 3 digits starting from the 4th digit, e. g. input: 0987654321 output: 0987***321 telephone Removes the (, ), , and - characters, masks the last 4 digits of the telephone number, then formats it to (??)????-????, e. g. input: 0227993078 output: (02)2799-**** id Masks last 4 digits of ID number, e. g. input: A123456789 output: A12345**** credit_card Masks 6 digits starting from the 7th digit, e. g. input 1234567890123456 output 123456******3456 url Masks the password part of the URL, if applicable, e. g. http://admin:mysecretpassword@localhost:1234/uri output: http://admin:xxxxx@localhost:1234/uri"},{"location":"built_in_transformers/standard_transformers/masking/#example-masking-employee-national-id-number","title":"Example: Masking employee national ID number","text":"
In the following example, the national ID number of an employee is masked.
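A sketch of this configuration, assuming the column is named nationalidnumber:
- schema: \"humanresources\"\n  name: \"employee\"\n  transformers:\n    - name: \"Masking\"\n      params:\n        column: \"nationalidnumber\"\n        # the id rule masks the last 4 characters of the ID\n        type: \"id\"\n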
Randomly add or subtract a duration within the provided ratio interval to the original date value.
"},{"location":"built_in_transformers/standard_transformers/noise_date/#parameters","title":"Parameters","text":"Name Description Default Required Supported DB types column The name of the column to be affected Yes date, timestamp, timestamptz min_ratio The minimum random value for noise. The value must be in PostgreSQL interval format, e. g. 1 year 2 mons 3 day 04:05:06.07 5% from max_ration parameter No - max_ratio The maximum random value for noise. The value must be in PostgreSQL interval format, e. g. 1 year 2 mons 3 day 04:05:06.07 Yes - min Min threshold date (and/or time) of value. The value has the same format as column parameter No - max Max threshold date (and/or time) of value. The value has the same format as column parameter No - truncate Truncate the date to the specified part (nanosecond, microsecond, millisecond, second, minute, hour, day, month, year). The truncate operation is not applied by default. No - engine The engine used for generating the values [random, hash]. Use hash for deterministic generation random No -"},{"location":"built_in_transformers/standard_transformers/noise_date/#dynamic-parameters","title":"Dynamic parameters","text":"Parameter Supported types min date, timestamp, timestamptz max date, timestamp, timestamptz"},{"location":"built_in_transformers/standard_transformers/noise_date/#description","title":"Description","text":"
The NoiseDate transformer randomly generates a duration between the min_ratio and max_ratio parameters and adds it to or subtracts it from the original date value. The min_ratio and max_ratio parameters must be written in the PostgreSQL interval format. You can also truncate the resulting date up to a specified part by setting the truncate parameter.
In case you have constraints on the date range, you can set the min and max parameters to specify the threshold values. The values for min and max must have the same format as the column parameter. Parameters min and max support dynamic mode.
Info
If the noised value exceeds the max threshold, the transformer will set the value to max. If the noised value is lower than the min threshold, the transformer will set the value to min.
The engine parameter allows you to choose between random and hash engines for generating values. Read more about the engines in the Transformation engines section.
"},{"location":"built_in_transformers/standard_transformers/noise_date/#example-adding-noise-to-the-modified-date","title":"Example: Adding noise to the modified date","text":"
In the following example, the original timestamp value of modifieddate will be noised up to 1 year 2 months 3 days 4 hours 5 minutes 6 seconds and 7 milliseconds with truncation up to the month part.
NoiseDate transformer example
- schema: \"humanresources\"\n name: \"jobcandidate\"\n transformers:\n - name: \"NoiseDate\"\n params:\n column: \"hiredate\"\n max_ratio: \"1 year 2 mons 3 day 04:05:06.07\"\n truncate: \"month\"\n max: \"2020-01-01 00:00:00\"\n
"},{"location":"built_in_transformers/standard_transformers/noise_date/#example-adding-noise-to-the-modified-date-with-dynamic-min-parameter-with-hash-engine","title":"Example: Adding noise to the modified date with dynamic min parameter with hash engine","text":"
In the following example, the original timestamp value of hiredate will be noised up to 1 year 2 months 3 days 4 hours 5 minutes 6 seconds and 7 milliseconds with truncation up to the month part. The max threshold is set to 2020-01-01 00:00:00, and the min threshold is set to the birthdate column. If the birthdate column is NULL, the default value 1990-01-01 will be used. The hash engine is used for deterministic generation - the same input will always produce the same output.
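A sketch of that configuration; the dynamic_params block wiring min to the birthdate column reflects an assumption about the exact syntax:
- schema: \"humanresources\"\n  name: \"jobcandidate\"\n  transformers:\n    - name: \"NoiseDate\"\n      params:\n        column: \"hiredate\"\n        max_ratio: \"1 year 2 mons 3 day 04:05:06.07\"\n        truncate: \"month\"\n        max: \"2020-01-01 00:00:00\"\n        engine: \"hash\"\n      dynamic_params:\n        min:\n          column: \"birthdate\"\n          # fallback when birthdate is NULL\n          default: \"1990-01-01\"\n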
Add or subtract a random fraction to the original float value.
"},{"location":"built_in_transformers/standard_transformers/noise_float/#parameters","title":"Parameters","text":"Name Description Default Required Supported DB types column The name of the column to be affected Yes float4, float8 decimal The decimal of the noised float value (number of digits after the decimal point) 4 No - min_ratio The minimum random percentage for noise, from 0 to 1, e. g. 0.1 means \"add noise up to 10%\" 0.05 No - max_ratio The maximum random percentage for noise, from 0 to 1, e. g. 0.1 means \"add noise up to 10%\" Yes - min Min threshold of noised value No - max Max threshold of noised value No - engine The engine used for generating the values [random, hash]. Use hash for deterministic generation random No -"},{"location":"built_in_transformers/standard_transformers/noise_float/#dynamic-parameters","title":"Dynamic parameters","text":"Parameter Supported types min float4, float8, int2, int4, int8 max float4, float8, int2, int4, int8"},{"location":"built_in_transformers/standard_transformers/noise_float/#description","title":"Description","text":"
The NoiseFloat transformer multiplies the original float value by a randomly generated value that is not higher than the max_ratio parameter and not less than the min_ratio parameter, and adds it to or subtracts it from the original value. Additionally, you can specify the number of decimal digits by using the decimal parameter.
In case you have constraints on the float range, you can set the min and max parameters to specify the threshold values. The values for min and max must have the same format as the column parameter. Parameters min and max support dynamic mode.
Info
If the noised value exceeds the max threshold, the transformer will set the value to max. If the noised value is lower than the min threshold, the transformer will set the value to min.
The engine parameter allows you to choose between random and hash engines for generating values. Read more about the engines in the Transformation engines section.
"},{"location":"built_in_transformers/standard_transformers/noise_float/#example-adding-noise-to-the-purchase-price","title":"Example: Adding noise to the purchase price","text":"
In this example, the original value of standardprice will be noised up to 50% and rounded up to 2 decimals.
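A sketch of this configuration (the schema and table are illustrative):
- schema: \"purchasing\"\n  name: \"productvendor\"\n  transformers:\n    - name: \"NoiseFloat\"\n      params:\n        column: \"standardprice\"\n        # up to 50% noise, rounded to 2 decimal digits\n        max_ratio: 0.5\n        decimal: 2\n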
Add or subtract a random fraction to the original integer value.
"},{"location":"built_in_transformers/standard_transformers/noise_int/#parameters","title":"Parameters","text":"Name Description Default Required Supported DB types column The name of the column to be affected Yes int2, int4, int8 min_ratio The minimum random percentage for noise, from 0 to 1, e. g. 0.1 means \"add noise up to 10%\" 0.05 No - max_ratio The maximum random percentage for noise, from 0 to 1, e. g. 0.1 means \"add noise up to 10%\" Yes - min Min threshold of noised value No - max Min threshold of noised value No - engine The engine used for generating the values [random, hash]. Use hash for deterministic generation random No -"},{"location":"built_in_transformers/standard_transformers/noise_int/#dynamic-parameters","title":"Dynamic parameters","text":"Parameter Supported types min int2, int4, int8 max int2, int4, int8"},{"location":"built_in_transformers/standard_transformers/noise_int/#description","title":"Description","text":"
The NoiseInt transformer multiplies the original integer value by a randomly generated ratio that is not higher than the max_ratio parameter and not lower than the min_ratio parameter, and adds the result to or subtracts it from the original value.
In case you have constraints on the integer range, you can set the min and max parameters to specify the threshold values. The values for min and max must have the same format as the column parameter. Parameters min and max support dynamic mode.
Info
If the noised value exceeds the max threshold, the transformer will set the value to max. If the noised value is lower than the min threshold, the transformer will set the value to min.
The engine parameter allows you to choose between random and hash engines for generating values. Read more about the engines in the Transformation engines section.
"},{"location":"built_in_transformers/standard_transformers/noise_int/#example-noise-vacation-hours-of-an-employee","title":"Example: Noise vacation hours of an employee","text":"
In the following example, the original value of vacationhours will be noised by up to 40%. The transformer will set the value to 10 if the noised value is lower than 10, and to 1000 if the noised value exceeds 1000.
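A config sketch matching this description might look as follows (schema and table names are assumed for illustration):

- schema: "humanresources"   # assumed schema/table
  name: "employee"
  transformers:
    - name: "NoiseInt"
      params:
        column: "vacationhours"
        max_ratio: 0.4
        min: 10
        max: 1000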
Add or subtract a random fraction of the original numeric value.
"},{"location":"built_in_transformers/standard_transformers/noise_numeric/#parameters","title":"Parameters","text":"Name Description Default Required Supported DB types column The name of the column to be affected Yes numeric, decimal decimal The decimal of the noised float value (number of digits after the decimal point) 4 No - min_ratio The minimum random percentage for noise, from 0 to 1, e. g. 0.1 means \"add noise up to 10%\" 0.05 No - max_ratio The maximum random percentage for noise, from 0 to 1, e. g. 0.1 means \"add noise up to 10%\" Yes - min Min threshold of noised value No - max Max threshold of noised value No - engine The engine used for generating the values [random, hash]. Use hash for deterministic generation random No -"},{"location":"built_in_transformers/standard_transformers/noise_numeric/#dynamic-parameters","title":"Dynamic parameters","text":"Parameter Supported types min numeric, decimal, float4, float8, int2, int4, int8 max numeric, decimal, float4, float8, int2, int4, int8"},{"location":"built_in_transformers/standard_transformers/noise_numeric/#description","title":"Description","text":"
The NoiseNumeric transformer multiplies the original numeric (or decimal) value by a randomly generated ratio that is not higher than the max_ratio parameter and not lower than the min_ratio parameter, and adds the result to or subtracts it from the original value. Additionally, you can specify the number of decimal digits by using the decimal parameter.
In case you have constraints on the numeric range, you can set the min and max parameters to specify the threshold values. The values for min and max must have the same format as the column parameter. The min and max parameters support dynamic mode.
Info
If the noised value exceeds the max threshold, the transformer will set the value to max. If the noised value is lower than the min threshold, the transformer will set the value to min.
The engine parameter allows you to choose between random and hash engines for generating values. Read more about the engines in the Transformation engines section.
Warning
Greenmask cannot parse the numeric type settings, for instance NUMERIC(10, 2). You should set the min and max thresholds manually, as well as the allowed decimal. This behaviour will be changed in later versions: Greenmask will be able to determine the precision and scale of the column and set the min and max thresholds automatically if they were not set.
"},{"location":"built_in_transformers/standard_transformers/noise_numeric/#example-adding-noise-to-the-purchase-price","title":"Example: Adding noise to the purchase price","text":"
In this example, the original value of standardprice will be noised by up to 50% and rounded to 2 decimal places.
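Given the warning above, a sketch for this example would set min, max, and decimal explicitly (the threshold values and schema/table names are illustrative assumptions):

- schema: "purchasing"   # assumed schema/table
  name: "productvendor"
  transformers:
    - name: "NoiseNumeric"
      params:
        column: "standardprice"
        max_ratio: 0.5
        decimal: 2
        min: 0        # assumed lower threshold
        max: 100000   # assumed upper threshold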
The RandomAmountWithCurrency transformer is specifically designed to populate specified database columns with random financial amounts accompanied by currency codes. Ideal for applications requiring the simulation of financial transactions, this utility enhances the realism of financial datasets by introducing variability in amounts and currencies.
"},{"location":"built_in_transformers/standard_transformers/random_amount_with_currency/#parameters","title":"Parameters","text":"Name Description Default Required Supported DB types column The name of the column to be affected Yes text, varchar keep_null Indicates whether NULL values should be preserved false No -"},{"location":"built_in_transformers/standard_transformers/random_amount_with_currency/#description","title":"Description","text":"
This transformer automatically generates random financial amounts along with corresponding global currency codes (e. g., 250.00 USD, 300.00 EUR), injecting them into the designated database column. It provides a straightforward solution for populating financial records with varied and realistic data, suitable for testing payment systems, data anonymization, and simulation of economic models.
"},{"location":"built_in_transformers/standard_transformers/random_amount_with_currency/#example-populate-the-payments-table-with-random-amounts-and-currencies","title":"Example: Populate the payments table with random amounts and currencies","text":"
This example shows how to configure the RandomAmountWithCurrency transformer to populate the payment_details column in the payments table with random amounts and currencies. It is an effective approach to simulating a diverse range of payment transactions.
In this setup, the payment_details column will be updated with random financial amounts and currency codes for each entry, replacing any existing non-NULL values. The keep_null parameter, when set to true, ensures that existing NULL values in the column remain unchanged, preserving the integrity of records without specified payment details.
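A minimal config sketch for this setup might be (schema name assumed):

- schema: "public"   # assumed schema
  name: "payments"
  transformers:
    - name: "RandomAmountWithCurrency"
      params:
        column: "payment_details"
        keep_null: true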
"},{"location":"built_in_transformers/standard_transformers/random_bool/#parameters","title":"Parameters","text":"Name Description Default Required Supported DB types column The name of the column to be affected Yes bool keep_null Indicates whether NULL values should be replaced with transformed values or not true No - engine The engine used for generating the values [random, hash]. Use hash for deterministic generation random No -"},{"location":"built_in_transformers/standard_transformers/random_bool/#description","title":"Description","text":"
The RandomBool transformer generates a random boolean value. The behaviour for NULL values can be configured using the keep_null parameter. The engine parameter allows you to choose between random and hash engines for generating values. Read more about the engines in the Transformation engines section.
"},{"location":"built_in_transformers/standard_transformers/random_bool/#example-generate-a-random-boolean-for-a-column","title":"Example: Generate a random boolean for a column","text":"
In the following example, the RandomBool transformer generates a random boolean value for the salariedflag column.
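A config sketch for this example, using the hash engine for deterministic output (the schema and table names are assumptions for illustration):

- schema: "humanresources"   # assumed schema/table
  name: "employee"
  transformers:
    - name: "RandomBool"
      params:
        column: "salariedflag"
        keep_null: true
        engine: "hash"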
The RandomCCNumber transformer is specifically designed to populate specified database columns with random credit card numbers. This utility is crucial for applications that involve simulating financial data, testing payment systems, or anonymizing real credit card numbers in datasets.
"},{"location":"built_in_transformers/standard_transformers/random_cc_number/#parameters","title":"Parameters","text":"Name Description Default Required Supported DB types column The name of the column to be affected Yes text, varchar keep_null Indicates whether NULL values should be preserved false No -"},{"location":"built_in_transformers/standard_transformers/random_cc_number/#description","title":"Description","text":"
By leveraging algorithms capable of generating plausible credit card numbers that adhere to standard credit card validation rules (such as the Luhn algorithm), the RandomCCNumber transformer injects random credit card numbers into the designated database column. This approach ensures the generation of credit card numbers that are realistic for testing and development purposes, without compromising real-world applicability and security.
"},{"location":"built_in_transformers/standard_transformers/random_cc_number/#example-populate-random-credit-card-numbers-for-the-payment_information-table","title":"Example: Populate random credit card numbers for the payment_information table","text":"
This example demonstrates configuring the RandomCCNumber transformer to populate the cc_number column in the payment_information table with random credit card numbers. It is an effective strategy for creating a realistic set of payment data for application testing or data anonymization.
With this setup, the cc_number column will be updated with random credit card numbers for each entry, replacing any existing non-NULL values. If the keep_null parameter is set to true, it will ensure that existing NULL values in the column are preserved, maintaining the integrity of records where credit card information is not applicable or available.
The RandomCCType transformer is designed to populate specified database columns with random credit card types. This tool is essential for applications that require the simulation of financial transaction data, testing payment processing systems, or anonymizing credit card type information in datasets.
"},{"location":"built_in_transformers/standard_transformers/random_cc_type/#parameters","title":"Parameters","text":"Name Description Default Required Supported DB types column The name of the column to be affected Yes text, varchar keep_null Indicates whether NULL values should be preserved false No -"},{"location":"built_in_transformers/standard_transformers/random_cc_type/#description","title":"Description","text":"
Utilizing a predefined list of credit card types (e.g., VISA, MasterCard, American Express, Discover), the RandomCCType transformer injects random credit card type names into the designated database column. This feature allows for the creation of realistic and varied financial transaction datasets by simulating a range of credit card types without using real card data.
"},{"location":"built_in_transformers/standard_transformers/random_cc_type/#example-populate-random-credit-card-types-for-the-transactions-table","title":"Example: Populate random credit card types for the transactions table","text":"
This example shows how to configure the RandomCCType transformer to populate the card_type column in the transactions table with random credit card types. It is a straightforward method for simulating diverse payment methods across transactions.
In this configuration, the card_type column will be updated with random credit card types for each entry, replacing any existing non-NULL values. If the keep_null parameter is set to true, existing NULL values in the column will be preserved, maintaining the integrity of records where card type information is not applicable.
The RandomCentury transformer is crafted to populate specified database columns with random century values. It is ideal for applications that require historical data simulation, such as generating random years within specific centuries for historical databases, testing datasets with temporal dimensions, or anonymizing dates in historical research data.
"},{"location":"built_in_transformers/standard_transformers/random_century/#parameters","title":"Parameters","text":"Name Description Default Required Supported DB types column The name of the column to be affected Yes text, varchar keep_null Indicates whether NULL values should be preserved false No -"},{"location":"built_in_transformers/standard_transformers/random_century/#description","title":"Description","text":"
The RandomCentury transformer utilizes an algorithm or a library function (hypothetical in this context) to generate random century values. Each value represents a century (e.g., 19th, 20th, 21st), providing a broad temporal range that can be used to enhance datasets requiring a distribution across different historical periods without the need for precise date information.
"},{"location":"built_in_transformers/standard_transformers/random_century/#example-populate-random-centuries-for-the-historical_artifacts-table","title":"Example: Populate random centuries for the historical_artifacts table","text":"
This example shows how to configure the RandomCentury transformer to populate the century column in a historical_artifacts table with random century values, adding an element of variability and historical context to the dataset.
In this setup, the century column will be filled with random century values, replacing any existing non-NULL values. If the keep_null parameter is set to true, then existing NULL values in the column will remain untouched, preserving the original dataset's integrity where no temporal data is available.
Replace values with ones randomly chosen from a provided list.
"},{"location":"built_in_transformers/standard_transformers/random_choice/#parameters","title":"Parameters","text":"Name Description Default Required Supported DB types column The name of the column to be affected Yes any values A list of values in any format. The string with value \\N is considered NULL. Yes - validate Performs a decoding procedure via the PostgreSQL driver using the column type to ensure that values have correct type true No keep_null Indicates whether NULL values should be replaced with transformed values or not true No engine The engine used for generating the values [random, hash]. Use hash for deterministic generation random No -"},{"location":"built_in_transformers/standard_transformers/random_choice/#description","title":"Description","text":"
The RandomChoice transformer replaces the original value with one randomly chosen from the list provided in the values parameter. You can use the validate parameter to ensure that values are correct before applying the transformation. The behaviour for NULL values can be configured using the keep_null parameter.
The engine parameter allows you to choose between random and hash engines for generating values. Read more about the engines in the Transformation engines section.
"},{"location":"built_in_transformers/standard_transformers/random_choice/#example-choosing-randomly-from-provided-dates","title":"Example: Choosing randomly from provided dates","text":"
In this example, the provided values undergo validation through PostgreSQL driver decoding, and one value is randomly chosen from the list.
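A sketch of such a configuration might look like this (the column name and the dates in the values list are illustrative; the \N entry denotes NULL as described above):

- schema: "humanresources"   # assumed schema/table
  name: "jobcandidate"
  transformers:
    - name: "RandomChoice"
      params:
        column: "modifieddate"
        validate: true
        values:
          - "2023-06-30 00:00:00"
          - "2023-07-27 00:00:00"
          - "\N"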
The RandomCurrency transformer is tailored to populate specified database columns with random currency codes. This tool is highly beneficial for applications involving the simulation of international financial data, testing currency conversion features, or anonymizing currency information in datasets.
"},{"location":"built_in_transformers/standard_transformers/random_currency/#parameters","title":"Parameters","text":"Name Description Default Required Supported DB types column The name of the column to be affected Yes text, varchar keep_null Indicates whether NULL values should be preserved false No -"},{"location":"built_in_transformers/standard_transformers/random_currency/#description","title":"Description","text":"
Utilizing a comprehensive list of global currency codes (e.g., USD, EUR, JPY), the RandomCurrency transformer injects random currency codes into the designated database column. This feature allows for the creation of diverse and realistic financial transaction datasets by simulating a variety of currencies without relying on actual financial data.
"},{"location":"built_in_transformers/standard_transformers/random_currency/#example-populate-random-currency-codes-for-the-transactions-table","title":"Example: Populate random currency codes for the transactions table","text":"
This example outlines configuring the RandomCurrency transformer to populate the currency_code column in a transactions table with random currency codes. It is an effective way to simulate international transactions across multiple currencies.
In this configuration, the currency_code column will be updated with random currency codes for each entry, replacing any existing non-NULL values. If the keep_null parameter is set to true, existing NULL values in the column will be preserved, ensuring the integrity of records where currency data may not be applicable.
"},{"location":"built_in_transformers/standard_transformers/random_date/#parameters","title":"Parameters","text":"Name Description Default Required Supported DB types column Name of the column to be affected Yes date, timestamp, timestamptz min The minimum threshold date for the random value. The format depends on the column type. Yes - max The maximum threshold date for the random value. The format depends on the column type. Yes - truncate Truncate the date to the specified part (nanosecond, microsecond, millisecond, second, minute, hour, day, month, year). The truncate operation is not applied by default. No - keep_null Indicates whether NULL values should be replaced with transformed values or not true No - engine The engine used for generating the values [random, hash]. Use hash for deterministic generation random No -"},{"location":"built_in_transformers/standard_transformers/random_date/#dynamic-parameters","title":"Dynamic parameters","text":"Parameter Supported types min date, timestamp, timestamptz max date, timestamp, timestamptz"},{"location":"built_in_transformers/standard_transformers/random_date/#description","title":"Description","text":"
The RandomDate transformer generates a random date within the provided interval, starting from min to max. It can also perform date truncation up to the specified part of the date. The format of dates in the min and max parameters must adhere to PostgreSQL types, including DATE, TIMESTAMP WITHOUT TIME ZONE, or TIMESTAMP WITH TIME ZONE.
Note
The value of min and max parameters depends on the column type. For example, for the date column, the value should be in the format YYYY-MM-DD, while for the timestamp column, the value should be in the format YYYY-MM-DD HH:MM:SS or YYYY-MM-DD HH:MM:SS.SSSSSS. The timestamptz column requires the value to be in the format YYYY-MM-DD HH:MM:SS.SSSSSS+HH:MM. Read more about date/time formats in the PostgreSQL documentation.
The behaviour for NULL values can be configured using the keep_null parameter. The engine parameter allows you to choose between random and hash engines for generating values. Read more about the engines in the Transformation engines section.
In the following example, a random timestamp without timezone is generated for the modifieddate column within the range from 2011-05-31 00:00:00 to 2013-05-31 00:00:00, and the part of the random value after day is truncated.
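A config sketch matching this example (schema and table names assumed):

- schema: "sales"   # assumed schema/table
  name: "salesorderdetail"
  transformers:
    - name: "RandomDate"
      params:
        column: "modifieddate"
        min: "2011-05-31 00:00:00"
        max: "2013-05-31 00:00:00"
        truncate: "day"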
Column: modifieddate. Original value: 2014-06-30 00:00:00. Transformed value: 2012-07-27 00:00:00."},{"location":"built_in_transformers/standard_transformers/random_date/#example-generate-hiredate-based-on-birthdate-using-two-transformations","title":"Example: Generate hiredate based on birthdate using two transformations","text":"
In this example, the RandomDate transformer generates a random date for the birthdate column within the range now - 50 years to now - 18 years. The hire date is generated based on the birthdate, ensuring that the employee is at least 18 years old when hired.
The RandomDayOfMonth transformer is designed to populate specified database columns with random day-of-the-month values. It is particularly useful for scenarios requiring the simulation of dates, such as generating random event dates, user sign-up dates, or any situation where the specific day of the month is needed without reference to the actual month or year.
"},{"location":"built_in_transformers/standard_transformers/random_day_of_month/#parameters","title":"Parameters","text":"Name Description Default Required Supported DB types column The name of the column to be affected Yes text, varchar, int2, int4, int8, numeric keep_null Indicates whether NULL values should be preserved false No -"},{"location":"built_in_transformers/standard_transformers/random_day_of_month/#description","title":"Description","text":"
Utilizing the faker library, the RandomDayOfMonth transformer generates random numerical values representing days of the month, ranging from 1 to 31. This allows for the easy insertion of random but plausible day-of-the-month data into a database, enhancing realism or anonymizing actual dates.
"},{"location":"built_in_transformers/standard_transformers/random_day_of_month/#example-populate-random-days-of-the-month-for-the-events-table","title":"Example: Populate random days of the month for the events table","text":"
This example illustrates how to configure the RandomDayOfMonth transformer to fill the event_day column in the events table with random day-of-the-month values, facilitating the simulation of varied event scheduling.
With this setup, the event_day column will be updated with random day-of-the-month values, replacing any existing non-NULL values. Setting keep_null to true ensures that NULL values in the column are left unchanged, maintaining any existing gaps in the data.
The RandomDayOfWeek transformer is specifically designed to fill specified database columns with random day-of-the-week names. It is particularly useful for applications that require simulated weekly schedules, random event planning, or any scenario where the day of the week is relevant but the specific date is not.
"},{"location":"built_in_transformers/standard_transformers/random_day_of_week/#parameters","title":"Parameters","text":"Name Description Default Required Supported DB types column The name of the column to be affected Yes text, varchar keep_null Indicates whether NULL values should be preserved false No -"},{"location":"built_in_transformers/standard_transformers/random_day_of_week/#description","title":"Description","text":"
Utilizing the faker library, the RandomDayOfWeek transformer generates names of days (e. g., Monday, Tuesday) at random. This transformer can be applied to any text or varchar column in a database, introducing variability and realism into data sets that need to represent days of the week in a non-specific manner.
"},{"location":"built_in_transformers/standard_transformers/random_day_of_week/#example-populate-random-days-of-the-week-for-the-work_schedule-table","title":"Example: Populate random days of the week for the work_schedule table","text":"
This example demonstrates configuring the RandomDayOfWeek transformer to populate the work_day column in the work_schedule table with random days of the week. This setup can help simulate a diverse range of work schedules without tying them to specific dates.
In this configuration, every entry in the work_day column will be updated with a random day of the week, replacing any existing non-NULL values. If the keep_null parameter is set to true, then existing NULL values within the column will remain unchanged.
The RandomDomainName transformer is designed to populate specified database columns with random domain names. This tool is invaluable for simulating web data, testing applications that interact with domain names, or anonymizing real domain information in datasets.
"},{"location":"built_in_transformers/standard_transformers/random_domain_name/#parameters","title":"Parameters","text":"Name Description Default Required Supported DB types column The name of the column to be affected Yes text, varchar keep_null Indicates whether NULL values should be preserved false No -"},{"location":"built_in_transformers/standard_transformers/random_domain_name/#description","title":"Description","text":"
By leveraging an algorithm or library capable of generating believable domain names, the RandomDomainName transformer introduces random domain names into the specified database column. Each generated domain name includes a second-level domain (SLD) and a top-level domain (TLD), such as \"example.com\" or \"website.org,\" providing a wide range of plausible web addresses for database enrichment.
"},{"location":"built_in_transformers/standard_transformers/random_domain_name/#example-populate-random-domain-names-for-the-websites-table","title":"Example: Populate random domain names for the websites table","text":"
This example demonstrates configuring the RandomDomainName transformer to populate the domain column in the websites table with random domain names. This approach facilitates the creation of a diverse and realistic set of web addresses for testing, simulation, or data anonymization purposes.
In this setup, the domain column will be updated with random domain names for each entry, replacing any existing non-NULL values. If keep_null is set to true, the transformer will preserve existing NULL values in the column, maintaining the integrity of data where domain information is not applicable.
The RandomE164PhoneNumber transformer is developed to populate specified database columns with random E.164 phone numbers. This tool is essential for applications requiring the simulation of contact information, testing phone number validation systems, or anonymizing phone number data in datasets while focusing on E.164 numbers.
"},{"location":"built_in_transformers/standard_transformers/random_e164_phone_number/#parameters","title":"Parameters","text":"Name Description Default Required Supported DB types column The name of the column to be affected Yes text, varchar keep_null Indicates whether NULL values should be preserved false No -"},{"location":"built_in_transformers/standard_transformers/random_e164_phone_number/#description","title":"Description","text":"
The RandomE164PhoneNumber transformer utilizes algorithms capable of generating random E.164 phone numbers with the standard international format and injects them into the designated database column. This feature allows for the creation of diverse and realistic contact information in datasets for development, testing, or data anonymization purposes.
"},{"location":"built_in_transformers/standard_transformers/random_e164_phone_number/#example-populate-random-e164-phone-numbers-for-the-contact_information-table","title":"Example: Populate random E.164 phone numbers for the contact_information table","text":"
This example demonstrates configuring the RandomE164PhoneNumber transformer to populate the phone_number column in the contact_information table with random E.164 phone numbers. It is an effective method for simulating a variety of contact information entries with E.164 numbers.
In this configuration, the phone_number column will be updated with random E.164 phone numbers for each contact information entry, replacing any existing non-NULL values. If the keep_null parameter is set to true, existing NULL values in the column will be preserved, ensuring the integrity of records where E.164 phone number information is not applicable or provided.
"},{"location":"built_in_transformers/standard_transformers/random_email/#parameters","title":"Parameters","text":"Name Description Default Required Supported DB types column The name of the column to be affected Yes text, varchar keep_original_domain Keep original of the original address false No - local_part_template The template for local part of email No - domain_part_template The template for domain part of email No - domains List of domains for new email [\"gmail.com\", \"yahoo.com\", \"outlook.com\", \"hotmail.com\", \"aol.com\", \"icloud.com\", \"mail.com\", \"zoho.com\", \"yandex.com\", \"protonmail.com\", \"gmx.com\", \"fastmail.com\"] No - validate Validate generated email if using template false No - max_random_length Max length of randomly generated part of the email 32 No - keep_null Indicates whether NULL values should be preserved false No - engine The engine used for generating the values [random, hash]. Use hash for deterministic generation random No -"},{"location":"built_in_transformers/standard_transformers/random_email/#description","title":"Description","text":"
The RandomEmail transformer generates random email addresses for the specified database column. By default, the transformer generates random email addresses with a maximum length of 32 characters. The keep_original_domain parameter allows you to preserve the original domain part of the email address. The local_part_template and domain_part_template parameters enable you to specify templates for the local and domain parts of the email address, respectively. If the validate parameter is set to true, the transformer will validate the generated email addresses against the specified templates. The keep_null parameter allows you to preserve existing NULL values in the column.
The engine parameter allows you to choose between random and hash engines for generating values. Read more about the engines in the Transformation engines section.
In each template, you have access to the columns of the table by using the {{ .column_name }} syntax. Note that all values are strings. For example, you can assemble the email address from the first_name and last_name columns: {{ .first_name | lower }}.{{ .last_name | lower }}.
The transformer always generates a random sequence for the email, which you can use by accessing the {{ .random_string }} variable. For example, you can append a random string to the end of the local part: {{ .first_name | lower }}.{{ .last_name | lower }}.{{ .random_string }}.
Read more about template functions in the Template functions section.
"},{"location":"built_in_transformers/standard_transformers/random_email/#random-email-generation-using-first-name-and-last-name","title":"Random email generation using first name and last name","text":"
In this example, the RandomEmail transformer generates random email addresses for the email column in the account table. The transformer generates email addresses using the first_name and last_name columns as the local part of the email address and adds a random string to the end of the local part with length 10 characters. The original domain part of the email address is preserved.
CREATE TABLE account\n(\n id SERIAL PRIMARY KEY,\n gender VARCHAR(1) NOT NULL,\n email TEXT NOT NULL UNIQUE,\n first_name TEXT NOT NULL,\n last_name TEXT NOT NULL,\n birth_date DATE,\n created_at TIMESTAMP NOT NULL DEFAULT NOW()\n);\n\nINSERT INTO account (first_name, gender, last_name, birth_date, email)\nVALUES ('John', 'M', 'Smith', '1980-01-01', 'john.smith@gmail.com');\n
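A config sketch for this example might be as follows; the parameter values follow the behaviour described above, while the schema name is an assumption:

- schema: "public"   # assumed schema
  name: "account"
  transformers:
    - name: "RandomEmail"
      params:
        column: "email"
        keep_original_domain: true
        max_random_length: 10
        local_part_template: "{{ .first_name | lower }}.{{ .last_name | lower }}.{{ .random_string }}"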
Column: email. Original value: john.smith@gmail.com. Transformed value: john.smith.a075d99e2d@gmail.com."},{"location":"built_in_transformers/standard_transformers/random_email/#simple-random-email-generation","title":"Simple random email generation","text":"
In this example, the RandomEmail transformer generates random email addresses for the email column in the account table. The transformer generates random email addresses with a maximum length of 10 characters.
Generate a random float within the provided interval.
"},{"location":"built_in_transformers/standard_transformers/random_float/#parameters","title":"Parameters","text":"Name Description Default Required Supported DB types column The name of the column to be affected Yes float4, float8 min The minimum threshold for the random value. The value range depends on the column type. Yes - max The maximum threshold for the random value. The value range depends on the column type. Yes - decimal The decimal of the random float value (number of digits after the decimal point) 4 No - keep_null Indicates whether NULL values should be replaced with transformed values or not true No - engine The engine used for generating the values [random, hash]. Use hash for deterministic generation random No -"},{"location":"built_in_transformers/standard_transformers/random_float/#dynamic-parameters","title":"Dynamic parameters","text":"Parameter Supported types min float4, float8 max float4, float8"},{"location":"built_in_transformers/standard_transformers/random_float/#description","title":"Description","text":"
The RandomFloat transformer generates a random float value within the provided interval, starting from min to max, with the option to specify the number of decimal digits by using the decimal parameter. The behaviour for NULL values can be configured using the keep_null parameter.
The engine parameter allows you to choose between random and hash engines for generating values. Read more about the engines in the Transformation engines section.
"},{"location":"built_in_transformers/standard_transformers/random_float/#example-generate-random-price","title":"Example: Generate random price","text":"
In this example, the RandomFloat transformer generates random prices in the range from 0.1 to 7000 while keeping up to 2 decimal digits.
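A sketch of this configuration (the schema, table, and column names are assumptions for illustration):

- schema: "sales"   # assumed schema/table/column
  name: "product"
  transformers:
    - name: "RandomFloat"
      params:
        column: "price"
        min: 0.1
        max: 7000
        decimal: 2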
Generate a random integer within the provided interval.
"},{"location":"built_in_transformers/standard_transformers/random_int/#parameters","title":"Parameters","text":"Name Description Default Required Supported DB types column The name of the column to be affected Yes int2, int4, int8 min The minimum threshold for the random value Yes - max The maximum threshold for the random value Yes - keep_null Indicates whether NULL values should be replaced with transformed values or not true No - engine The engine used for generating the values [random, hash]. Use hash for deterministic generation random No -"},{"location":"built_in_transformers/standard_transformers/random_int/#dynamic-parameters","title":"Dynamic parameters","text":"Parameter Supported types min int2, int4, int8 max int2, int4, int8"},{"location":"built_in_transformers/standard_transformers/random_int/#description","title":"Description","text":"
The RandomInt transformer generates a random integer within the specified min and max thresholds. The behaviour for NULL values can be configured using the keep_null parameter.
The engine parameter allows you to choose between random and hash engines for generating values. Read more about the engines in the Transformation engines section.
"},{"location":"built_in_transformers/standard_transformers/random_int/#example-generate-random-item-quantity","title":"Example: Generate random item quantity","text":"
In the following example, the RandomInt transformer generates a random value in the range from 1 to 30 and assigns it to the orderqty column.
Generate random orderqty in the range from 1 to 30
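A minimal sketch for this configuration (schema and table names assumed):

- schema: "sales"   # assumed schema/table
  name: "salesorderdetail"
  transformers:
    - name: "RandomInt"
      params:
        column: "orderqty"
        min: 1
        max: 30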
Column: orderqty. Original value: 1. Transformed value: 29."},{"location":"built_in_transformers/standard_transformers/random_int/#example-generate-random-sick-leave-hours-based-on-vacation-hours","title":"Example: Generate random sick leave hours based on vacation hours","text":"
In the following example, the RandomInt transformer generates a random value in the range from 1 to the value of the vacationhours column and assigns it to the sickleavehours column. This configuration allows for the simulation of sick leave hours based on the number of vacation hours.
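A sketch of this dynamic setup; the dynamic_params block that points max at another column is an assumption to be verified against the dynamic parameters reference:

- schema: "humanresources"   # assumed schema/table
  name: "employee"
  transformers:
    - name: "RandomInt"
      params:
        column: "sickleavehours"
        min: 1
      dynamic_params:
        max:
          column: "vacationhours"   # max threshold read from this column per row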
The RandomIp transformer is designed to populate specified database columns with random IPv4 or IPv6 addresses. This utility is essential for applications requiring the simulation of network data, testing systems that utilize IP addresses, or anonymizing real IP addresses in datasets.
"},{"location":"built_in_transformers/standard_transformers/random_ip/#parameters","title":"Parameters","text":"Name Description Default Required Supported DB types column The name of the column to be affected Yes text, varchar, inet subnet Subnet for generating random ip in V4 or V6 format Yes - engine The engine used for generating the values [random, hash]. Use hash for deterministic generation random No -"},{"location":"built_in_transformers/standard_transformers/random_ip/#dynamic-parameters","title":"Dynamic parameters","text":"Name Supported types subnet cidr, text, varchar"},{"location":"built_in_transformers/standard_transformers/random_ip/#description","title":"Description","text":"
Utilizing a robust algorithm or library for generating IP addresses, the RandomIp transformer injects random IPv4 or IPv6 addresses into the designated database column, depending on the provided subnet. The transformer automatically detects whether to generate an IPv4 or IPv6 address based on the subnet version specified.
"},{"location":"built_in_transformers/standard_transformers/random_ip/#example-generate-a-random-ipv4-address-for-a-1921681024-subnet","title":"Example: Generate a Random IPv4 Address for a 192.168.1.0/24 Subnet","text":"
This example demonstrates how to configure the RandomIp transformer to inject a random IPv4 address into the ip_address column for entries in the 192.168.1.0/24 subnet:
Create table ip_networks and insert data
CREATE TABLE ip_networks\n(\n id SERIAL PRIMARY KEY,\n ip_address INET,\n network CIDR\n);\n\nINSERT INTO ip_networks (ip_address, network)\nVALUES ('192.168.1.10', '192.168.1.0/24'),\n ('10.0.0.5', '10.0.0.0/16'),\n ('172.16.254.3', '172.16.0.0/12'),\n ('192.168.100.14', '192.168.100.0/24'),\n ('2001:0db8:85a3:0000:0000:8a2e:0370:7334', '2001:0db8:85a3::/64'); -- An IPv6 address and network\n
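A config sketch injecting addresses from the 192.168.1.0/24 subnet into the table created above (schema assumed to be public):

- schema: "public"   # assumed schema
  name: "ip_networks"
  transformers:
    - name: "RandomIp"
      params:
        column: "ip_address"
        subnet: "192.168.1.0/24"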
Column: ip_address. Original value: 192.168.1.10. Transformed value: 192.168.1.28."},{"location":"built_in_transformers/standard_transformers/random_ip/#example-generate-a-random-ip-based-on-the-dynamic-subnet-parameter","title":"Example: Generate a Random IP Based on the Dynamic Subnet Parameter","text":"
This configuration illustrates how to use the RandomIp transformer dynamically, where it reads the subnet information from the network column of the database and generates a corresponding random IP address:
RandomIp transformer example with dynamic mode
The RandomLatitude transformer generates random latitude values for specified database columns. It is designed to support geographical data enhancements, particularly useful for applications requiring randomized but plausible geographical coordinates.
"},{"location":"built_in_transformers/standard_transformers/random_latitude/#parameters","title":"Parameters","text":"Name Description Default Required Supported DB types column The name of the column to be affected Yes float4, float8, numeric keep_null Indicates whether NULL values should be preserved false No -"},{"location":"built_in_transformers/standard_transformers/random_latitude/#description","title":"Description","text":"
The RandomLatitude transformer utilizes the faker library to produce random latitude values within the range of -90 to +90 degrees. This transformer can be applied to columns designated to store geographical latitude information, enhancing data sets with randomized latitude coordinates.
"},{"location":"built_in_transformers/standard_transformers/random_latitude/#example-populate-random-latitude-for-the-locations-table","title":"Example: Populate random latitude for the locations table","text":"
This example demonstrates configuring the RandomLatitude transformer to populate the latitude column in the locations table with random latitude values.
With this configuration, the latitude column will be filled with random latitude values, replacing any existing non-NULL values. If keep_null is set to true, existing NULL values will be preserved.
The RandomLongitude transformer is designed to generate random longitude values for specified database columns, enhancing datasets with realistic geographic coordinates suitable for a wide range of applications, from testing location-based services to anonymizing real geographic data.
"},{"location":"built_in_transformers/standard_transformers/random_longitude/#parameters","title":"Parameters","text":"Name Description Default Required Supported DB types column The name of the column to be affected Yes float4, float8, numeric keep_null Indicates whether NULL values should be preserved false No -"},{"location":"built_in_transformers/standard_transformers/random_longitude/#description","title":"Description","text":"
The RandomLongitude transformer leverages the faker library to produce random longitude values within the globally accepted range of -180 to +180 degrees. This flexibility allows the transformer to be applied to any column intended for storing longitude data, providing a simple yet powerful tool for introducing randomized longitude coordinates into a database.
"},{"location":"built_in_transformers/standard_transformers/random_longitude/#example-populate-random-longitude-for-the-locations-table","title":"Example: Populate random longitude for the locations table","text":"
This example shows how to use the RandomLongitude transformer to fill the longitude column in the locations table with random longitude values.
This setup ensures that all entries in the longitude column receive a random longitude value, replacing any existing non-NULL values. If keep_null is set to true, then existing NULL values in the column will remain unchanged.
The RandomMac transformer is designed to populate specified database columns with random MAC addresses.
"},{"location":"built_in_transformers/standard_transformers/random_mac/#parameters","title":"Parameters","text":"Name Description Default Required Supported DB types column The name of the column to be affected Yes text, varchar, macaddr keep_original_vendor Should the Individual/Group (I/G) and Universal/Local (U/L) bits be preserved from the original MAC address. false No - cast_type Param which allow to set Individual/Group (I/G) bit in MAC Address. Allowed values [any, individual, group]. If this value is individual, the address is meant for a single device (unicast). If it is group, the address is for a group of devices, which can include multicast and broadcast addresses. any No management_type Param which allow to set Universal/Local (U/L) bit in MAC Address. Allowed values [any, universal, local]. If this bit is universal, the address is universally administered (globally unique). If it is local, the address is locally administered (such as when set manually or programmatically on a network device). any No engine The engine used for generating the values [random, hash]. Use hash for deterministic generation random No -"},{"location":"built_in_transformers/standard_transformers/random_mac/#description","title":"Description","text":"
The RandomMac transformer generates a random MAC address and injects it into the specified database column. The transformer can be configured to preserve the Individual/Group (I/G) and Universal/Local (U/L) bits from the original MAC address. You can also keep the original vendor bits in the generated MAC address by setting the keep_original_vendor parameter to true.
The engine parameter allows you to choose between random and hash engines for generating values. Read more about the engines in the Transformation engines section.
"},{"location":"built_in_transformers/standard_transformers/random_mac/#example-generate-a-random-mac-address","title":"Example: Generate a Random MAC Address","text":"
This example demonstrates how to configure the RandomMac transformer to inject a random MAC address into the mac_address column:
Create table mac_addresses and insert data
CREATE TABLE mac_addresses\n(\n id SERIAL PRIMARY KEY,\n device_name VARCHAR(50),\n mac_address MACADDR,\n description TEXT\n);\n\nINSERT INTO mac_addresses (device_name, mac_address, description)\nVALUES ('Device A', '00:1A:2B:3C:4D:5E', 'Description for Device A'),\n ('Device B', '01:2B:3C:4D:5E:6F', 'Description for Device B'),\n ('Device C', '02:3C:4D:5E:6F:70', 'Description for Device C'),\n ('Device D', '03:4D:5E:6F:70:71', 'Description for Device D'),\n ('Device E', '04:5E:6F:70:71:72', 'Description for Device E');\n
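A config sketch for the table above; the cast_type and management_type values are illustrative choices, and the schema name is assumed:

- schema: "public"   # assumed schema
  name: "mac_addresses"
  transformers:
    - name: "RandomMac"
      params:
        column: "mac_address"
        keep_original_vendor: true
        cast_type: "individual"      # illustrative choice
        management_type: "universal" # illustrative choice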
The RandomMonthName transformer is crafted to populate specified database columns with random month names. This transformer is especially useful for scenarios requiring the simulation of time-related data, such as user birth months or event months, without relying on specific date values.
"},{"location":"built_in_transformers/standard_transformers/random_month_name/#parameters","title":"Parameters","text":"Name Description Default Required Supported DB types column The name of the column to be affected Yes text, varchar keep_null Indicates whether NULL values should be preserved false No -"},{"location":"built_in_transformers/standard_transformers/random_month_name/#description","title":"Description","text":"
The RandomMonthName transformer utilizes the faker library to generate the names of months at random. It can be applied to any textual column in a database to introduce variety and realism into data sets that require representations of months without the need for specific calendar dates.
"},{"location":"built_in_transformers/standard_transformers/random_month_name/#example-populate-random-month-names-for-the-user_profiles-table","title":"Example: Populate random month names for the user_profiles table","text":"
This example demonstrates how to configure the RandomMonthName transformer to fill the birth_month column in the user_profiles table with random month names, adding a layer of diversity to user data without using actual birthdates.
With this setup, the birth_month column will be updated with random month names, replacing any existing non-NULL values. If the keep_null parameter is set to true, then existing NULL values within the column will remain untouched.
Generate a random numeric within the provided interval.
"},{"location":"built_in_transformers/standard_transformers/random_numeric/#parameters","title":"Parameters","text":"Name Description Default Required Supported DB types column The name of the column to be affected Yes numeric, decimal min The minimum threshold for the random value. The value range depends on the column type. Yes - max The maximum threshold for the random value. The value range depends on the column type. Yes - decimal The decimal of the random numeric value (number of digits after the decimal point) 4 No - keep_null Indicates whether NULL values should be replaced with transformed values or not true No - engine The engine used for generating the values [random, hash]. Use hash for deterministic generation random No -"},{"location":"built_in_transformers/standard_transformers/random_numeric/#dynamic-parameters","title":"Dynamic parameters","text":"Parameter Supported types min int2, int4, int8, float4, float8, numeric, decimal max int2, int4, int8, float4, float8, numeric, decimal"},{"location":"built_in_transformers/standard_transformers/random_numeric/#description","title":"Description","text":"
The RandomNumeric transformer generates a random numeric value within the provided interval, starting from min to max, with the option to specify the number of decimal digits by using the decimal parameter. The behaviour for NULL values can be configured using the keep_null parameter.
The engine parameter allows you to choose between random and hash engines for generating values. Read more about the engines in the Transformation engines section.
"},{"location":"built_in_transformers/standard_transformers/random_numeric/#example-generate-random-price","title":"Example: Generate random price","text":"
In this example, the RandomNumeric transformer generates random prices in the range from 0.1 to 7000 while keeping up to 2 decimal digits.
The RandomParagraph transformer is crafted to populate specified database columns with random paragraphs. This utility is indispensable for applications that require the generation of extensive textual content, such as simulating articles, enhancing textual datasets for NLP systems, or anonymizing textual content in databases.
"},{"location":"built_in_transformers/standard_transformers/random_paragraph/#parameters","title":"Parameters","text":"Name Description Default Required Supported DB types column The name of the column to be affected Yes text, varchar keep_null Indicates whether NULL values should be preserved false No -"},{"location":"built_in_transformers/standard_transformers/random_paragraph/#description","title":"Description","text":"
Employing sophisticated text generation algorithms or libraries, the RandomParagraph transformer generates random paragraphs, injecting them into the designated database column. This transformer is designed to create varied and plausible paragraphs that simulate real-world textual content, providing a valuable tool for database enrichment, testing, and anonymization.
"},{"location":"built_in_transformers/standard_transformers/random_paragraph/#example-populate-random-paragraphs-for-the-articles-table","title":"Example: Populate random paragraphs for the articles table","text":"
This example illustrates configuring the RandomParagraph transformer to populate the body column in an articles table with random paragraphs. It is an effective way to simulate diverse article content for development, testing, or demonstration purposes.
With this setup, the body column will receive random paragraphs for each entry, replacing any existing non-NULL values. Setting the keep_null parameter to true allows for the preservation of existing NULL values within the column, maintaining the integrity of records where article content is not applicable or provided.
The RandomPassword transformer is designed to populate specified database columns with random passwords. This utility is vital for applications that require the simulation of secure user data, testing systems with authentication mechanisms, or anonymizing real passwords in datasets.
"},{"location":"built_in_transformers/standard_transformers/random_password/#parameters","title":"Parameters","text":"Name Description Default Required Supported DB types column The name of the column to be affected Yes text, varchar keep_null Indicates whether NULL values should be preserved false No -"},{"location":"built_in_transformers/standard_transformers/random_password/#description","title":"Description","text":"
Employing sophisticated password generation algorithms or libraries, the RandomPassword transformer injects random passwords into the designated database column. This feature is particularly useful for creating realistic and secure user password datasets for development, testing, or demonstration purposes.
"},{"location":"built_in_transformers/standard_transformers/random_password/#example-populate-random-passwords-for-the-user_accounts-table","title":"Example: Populate random passwords for the user_accounts table","text":"
This example demonstrates how to configure the RandomPassword transformer to populate the password column in the user_accounts table with random passwords.
In this configuration, every entry in the password column will be updated with a random password. Setting the keep_null parameter to true will preserve existing NULL values in the column, accommodating scenarios where password data may not be applicable.
The RandomPerson transformer is designed to populate specified database columns with personal attributes such as first name, last name, title and gender.
"},{"location":"built_in_transformers/standard_transformers/random_person/#parameters","title":"Parameters","text":"Name Description Default Required Supported DB types columns The name of the column to be affected Yes text, varchar gender set specific gender (possible values: Male, Female, Any) Any No - gender_mapping Specify gender name to possible values when using dynamic mode in \"gender\" parameter Any No - fallback_gender Specify fallback gender if not mapped when using dynamic mode in \"gender\" parameter Any No - engine The engine used for generating the values [random, hash]. Use hash for deterministic generation random No -"},{"location":"built_in_transformers/standard_transformers/random_person/#description","title":"Description","text":"
The RandomPerson transformer utilizes a comprehensive list of names to inject random personal attributes (first name, last name, title) into the designated database columns. This feature allows for the creation of diverse and realistic user profiles by simulating a variety of names without using real user data. Each entry in the columns parameter accepts the following attributes:
name \u2014 the name of the column where the personal attributes will be stored. This value is required.
template - the template for the column value. You can use the following attributes: .FirstName, .LastName or .Title. For example, if you want to generate a full name, you can use the following template: \"{{ .FirstName }} {{ .LastName }}\"
hashing - the bool value. Indicates whether the column value must be passed through the hashing function. The default value is false. If all columns have hashing set to false (the default), then all columns will be hashed.
keep_null - the bool value. Indicates whether NULL values should be preserved. The default value is true
The gender that will be used if no gender_mapping match is found. This parameter is optional and applies only when the gender parameter is in dynamic mode. The default value is Any.
"},{"location":"built_in_transformers/standard_transformers/random_person/#example-populate-random-first-name-and-last-name-for-table-user_profiles-in-static-mode","title":"Example: Populate random first name and last name for table user_profiles in static mode","text":"
This example demonstrates how to use the RandomPerson transformer to populate the name and surname columns in the user_profiles table with random first names and last names, respectively.
Create table personal_data and insert data
CREATE TABLE personal_data\n(\n id SERIAL PRIMARY KEY,\n name VARCHAR(100),\n surname VARCHAR(100),\n sex CHAR(1) CHECK (sex IN ('M', 'F'))\n);\n\n-- Insert sample data into the table\nINSERT INTO personal_data (name, surname, sex)\nVALUES ('John', 'Doe', 'M'),\n ('Jane', 'Smith', 'F'),\n ('Alice', 'Johnson', 'F'),\n ('Bob', 'Lee', 'M');\n
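A config sketch for the static-mode example above; the template values follow the .FirstName/.LastName attributes described earlier, and the schema name is assumed:

- schema: "public"   # assumed schema
  name: "personal_data"
  transformers:
    - name: "RandomPerson"
      params:
        gender: "Any"
        columns:
          - name: "name"
            template: "{{ .FirstName }}"
          - name: "surname"
            template: "{{ .LastName }}"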
Column: name. Original value: John. Transformed value: Zane. Column: surname. Original value: Doe. Transformed value: McCullough."},{"location":"built_in_transformers/standard_transformers/random_person/#example-populate-random-first-name-and-last-name-for-table-user_profiles-in-dynamic-mode","title":"Example: Populate random first name and last name for table user_profiles in dynamic mode","text":"
This example demonstrates how to use the RandomPerson transformer to populate the name and surname columns with the gender parameter supplied in dynamic mode.
RandomPerson transformer example with dynamic mode
The RandomPhoneNumber transformer is developed to populate specified database columns with random phone numbers. This tool is essential for applications requiring the simulation of contact information, testing phone number validation systems, or anonymizing phone number data in datasets.
"},{"location":"built_in_transformers/standard_transformers/random_phone_number/#parameters","title":"Parameters","text":"Name Description Default Required Supported DB types column The name of the column to be affected Yes text, varchar keep_null Indicates whether NULL values should be preserved false No -"},{"location":"built_in_transformers/standard_transformers/random_phone_number/#description","title":"Description","text":"
The RandomPhoneNumber transformer utilizes algorithms capable of generating random phone numbers with various formats and injects them into the designated database column. This feature allows for the creation of diverse and realistic contact information in datasets for development, testing, or data anonymization purposes.
"},{"location":"built_in_transformers/standard_transformers/random_phone_number/#example-populate-random-phone-numbers-for-the-contact_information-table","title":"Example: Populate random phone numbers for the contact_information table","text":"
This example demonstrates configuring the RandomPhoneNumber transformer to populate the phone_number column in the contact_information table with random phone numbers. It is an effective method for simulating a variety of contact information entries.
In this configuration, the phone_number column will be updated with random phone numbers for each contact information entry, replacing any existing non-NULL values. If the keep_null parameter is set to true, existing NULL values in the column will be preserved, ensuring the integrity of records where phone number information is not applicable or provided.
The RandomSentence transformer is designed to populate specified database columns with random sentences. Ideal for simulating natural language text for user comments, testing NLP systems, or anonymizing textual data in databases.
"},{"location":"built_in_transformers/standard_transformers/random_sentence/#parameters","title":"Parameters","text":"Name Description Default Required Supported DB types column The name of the column to be affected Yes text, varchar keep_null Indicates whether NULL values should be preserved false No -"},{"location":"built_in_transformers/standard_transformers/random_sentence/#description","title":"Description","text":"
The RandomSentence transformer employs complex text generation algorithms or libraries to generate random sentences, injecting them into a designated database column without the need for specifying sentence length. This flexibility ensures the creation of varied and plausible text for a wide range of applications.
"},{"location":"built_in_transformers/standard_transformers/random_sentence/#example-populate-random-sentences-for-the-comments-table","title":"Example: Populate random sentences for the comments table","text":"
This example shows how to configure the RandomSentence transformer to populate the comment column in the comments table with random sentences. It is a straightforward method for simulating diverse user-generated content.
In this configuration, the comment column will be updated with random sentences for each entry, replacing any existing non-NULL values. If keep_null is set to true, existing NULL values in the column will be preserved, maintaining the integrity of records where comments are not applicable.
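A minimal configuration sketch for this example; the public schema is an assumption:
- schema: public # assumed schema\n  name: comments\n  transformers:\n    - name: RandomSentence\n      params:\n        column: comment\n        keep_null: false # set to true to preserve existing NULL values\n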
Generate a random string using the provided characters within the specified length range.
"},{"location":"built_in_transformers/standard_transformers/random_string/#parameters","title":"Parameters","text":"Name Description Default Required Supported DB types column The name of the column to be affected Yes text, varchar min_length The minimum length of the generated string Yes - max_length The maximum length of the generated string Yes - symbols The range of characters that can be used in the random string abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ No - keep_null Indicates whether NULL values should be replaced with transformed values or not true No - engine The engine used for generating the values [random, hash]. Use hash for deterministic generation random No -"},{"location":"built_in_transformers/standard_transformers/random_string/#description","title":"Description","text":"
The RandomString transformer generates a random string with a length between min_length and max_length using the characters specified in the symbols string as the possible set of characters. The behaviour for NULL values can be configured using the keep_null parameter.
The engine parameter allows you to choose between random and hash engines for generating values. Read more about the engines in the Transformation engines section.
"},{"location":"built_in_transformers/standard_transformers/random_string/#example-generate-a-random-string-for-accountnumber","title":"Example: Generate a random string for accountnumber","text":"
In the following example, a random string is generated for the accountnumber column with a length range from 9 to 12. The character set used for generation includes 1234567890ABCDEFGHIJKLMNOPQRSTUVWXYZ.
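A configuration sketch for this example; the schema and table names are assumptions, while the parameters are taken from the description above:
- schema: purchasing # assumed schema\n  name: vendor # assumed table\n  transformers:\n    - name: RandomString\n      params:\n        column: accountnumber\n        min_length: 9\n        max_length: 12\n        symbols: \"1234567890ABCDEFGHIJKLMNOPQRSTUVWXYZ\"\n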
The RandomTimezone transformer is designed to populate specified database columns with random timezone strings. This transformer is particularly useful for applications that require the simulation of global user data, testing of timezone-related functionalities, or anonymizing real user timezone information in datasets.
"},{"location":"built_in_transformers/standard_transformers/random_timezone/#parameters","title":"Parameters","text":"Name Description Default Required Supported DB types column The name of the column to be affected Yes text, varchar keep_null Indicates whether NULL values should be preserved false No -"},{"location":"built_in_transformers/standard_transformers/random_timezone/#description","title":"Description","text":"
Utilizing a comprehensive library or algorithm for generating timezone data, the RandomTimezone transformer provides random timezone strings (e.g., \"America/New_York\", \"Europe/London\") for database columns. This feature enables the creation of diverse and realistic datasets by simulating timezone information for user profiles, event timings, or any other data requiring timezone context.
"},{"location":"built_in_transformers/standard_transformers/random_timezone/#example-populate-random-timezone-strings-for-the-user_accounts-table","title":"Example: Populate random timezone strings for the user_accounts table","text":"
This example demonstrates how to configure the RandomTimezone transformer to populate the timezone column in the user_accounts table with random timezone strings, enhancing the dataset with varied global user representations.
With this configuration, every entry in the timezone column will be updated with a random timezone string, replacing any existing non-NULL values. If the keep_null parameter is set to true, existing NULL values within the column will remain unchanged, preserving the integrity of rows without specified timezone data.
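A minimal configuration sketch for this example; the public schema is an assumption:
- schema: public # assumed schema\n  name: user_accounts\n  transformers:\n    - name: RandomTimezone\n      params:\n        column: timezone\n        keep_null: false # set to true to preserve existing NULL values\n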
The RandomTollFreePhoneNumber transformer is designed to populate specified database columns with random toll-free phone numbers. This tool is essential for applications requiring the simulation of contact information, testing phone number validation systems, or anonymizing phone number data in datasets while focusing on toll-free numbers.
"},{"location":"built_in_transformers/standard_transformers/random_toll_free_phone_number/#parameters","title":"Parameters","text":"Name Description Default Required Supported DB types column The name of the column to be affected Yes text, varchar keep_null Indicates whether NULL values should be preserved false No -"},{"location":"built_in_transformers/standard_transformers/random_toll_free_phone_number/#description","title":"Description","text":"
The RandomTollFreePhoneNumber transformer utilizes algorithms capable of generating random toll-free phone numbers with various formats and injects them into the designated database column. This feature allows for the creation of diverse and realistic toll-free contact information in datasets for development, testing, or data anonymization purposes.
"},{"location":"built_in_transformers/standard_transformers/random_toll_free_phone_number/#example-populate-random-toll-free-phone-numbers-for-the-contact_information-table","title":"Example: Populate random toll-free phone numbers for the contact_information table","text":"
This example demonstrates configuring the RandomTollFreePhoneNumber transformer to populate the phone_number column in the contact_information table with random toll-free phone numbers. It is an effective method for simulating a variety of contact information entries with toll-free numbers.
In this configuration, the phone_number column will be updated with random toll-free phone numbers for each contact information entry, replacing any existing non-NULL values. If the keep_null parameter is set to true, existing NULL values in the column will be preserved, ensuring the integrity of records where toll-free phone number information is not applicable or provided.
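A minimal configuration sketch for this example; the public schema is an assumption:
- schema: public # assumed schema\n  name: contact_information\n  transformers:\n    - name: RandomTollFreePhoneNumber\n      params:\n        column: phone_number\n        keep_null: false # set to true to preserve existing NULL values\n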
The RandomUnixTimestamp transformer generates random Unix time values (timestamps) for specified database columns. It is particularly useful for populating columns with timestamp data, simulating time-related data, or anonymizing actual timestamps in a dataset.
"},{"location":"built_in_transformers/standard_transformers/random_unix_timestamp/#parameters","title":"Parameters","text":"Name Description Default Required Supported DB types column The name of the column to be affected Yes int2, int4, int8 min The minimum threshold date for the random value in unix timestamp format (integer) with sec unit by default Yes - max The maximum threshold date for the random value in unix timestamp format (integer) with sec unit by default Yes - unit Generated unix timestamp value unit. Possible values [second, millisecond, microsecond, nanosecond] second Yes - min_unit Min unix timestamp threshold date unit. Possible values [second, millisecond, microsecond, nanosecond] second Yes - max_unit Min unix timestamp threshold date unit. Possible values [second, millisecond, microsecond, nanosecond] second Yes - keep_null Indicates whether NULL values should be preserved false No - truncate Truncate the date to the specified part (nanosecond, microsecond, millisecond, second, minute, hour, day, month, year). The truncate operation is not applied by default. No - engine The engine used for generating the values [random, hash]. Use hash for deterministic generation random No -"},{"location":"built_in_transformers/standard_transformers/random_unix_timestamp/#description","title":"Description","text":"
The RandomUnixTimestamp transformer generates random Unix timestamps within the provided interval, starting from min to max. The min and max parameters are expected to be in Unix timestamp format. The min_unit and max_unit parameters specify the unit of the Unix timestamp threshold date. The truncate parameter allows you to truncate the date to the specified part of the date. The keep_null parameter allows you to specify whether NULL values should be preserved or replaced with transformed values.
The engine parameter allows you to choose between random and hash engines for generating values. Read more about the engines in the Transformation engines section.
"},{"location":"built_in_transformers/standard_transformers/random_unix_timestamp/#example-generate-random-unix-timestamps-with-dynamic-parameters","title":"Example: Generate random Unix timestamps with dynamic parameters","text":"
In this example, the RandomUnixTimestamp transformer generates random Unix timestamps using dynamic parameters. The min parameter is taken from the created_at column, which is converted to Unix seconds using the TimestampToUnixSec cast function, while the max parameter is set to a fixed value. The paid_at column is populated with random Unix timestamps in the range from created_at to 1715934239 (the Unix timestamp for 2024-05-17 12:03:59). The unit parameter is set to millisecond because the paid_at column stores timestamps in milliseconds. A configuration sketch follows the table definition below.
CREATE TABLE transactions\n(\n id SERIAL PRIMARY KEY,\n kind VARCHAR(255),\n total DECIMAL(10, 2),\n created_at TIMESTAMP,\n paid_at BIGINT -- stores milliseconds since the epoch\n);\n\n-- Inserting data with milliseconds timestamp\nINSERT INTO transactions (kind, total, created_at, paid_at)\nVALUES ('Sale', 199.99, '2023-05-17 12:00:00', (EXTRACT(EPOCH FROM TIMESTAMP '2023-05-17 12:05:00') * 1000)),\n ('Refund', 50.00, '2023-05-18 15:00:00', (EXTRACT(EPOCH FROM TIMESTAMP '2023-05-18 15:10:00') * 1000)),\n ('Sale', 129.99, '2023-05-19 10:30:00', (EXTRACT(EPOCH FROM TIMESTAMP '2023-05-19 10:35:00') * 1000));\n
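A configuration sketch matching this example, with min bound dynamically to created_at; the public schema is an assumption:
- schema: public # assumed schema\n  name: transactions\n  transformers:\n    - name: RandomUnixTimestamp\n      params:\n        column: paid_at\n        max: 1715934239 # fixed upper bound (2024-05-17 12:03:59)\n        unit: millisecond # paid_at stores milliseconds\n      dynamic_params:\n        min:\n          column: created_at\n          cast_to: TimestampToUnixSec # convert timestamp to Unix seconds\n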
| column name | original value | transformed |\n|-------------|----------------|-------------|\n| paid_at | 1684325100000 | 1708919030732 |\n"},{"location":"built_in_transformers/standard_transformers/random_unix_timestamp/#example-generate-simple-random-unix-timestamps","title":"Example: Generate simple random Unix timestamps","text":"
In this example, the RandomUnixTimestamp transformer generates random Unix timestamps for the paid_at column in the range from 1615934239 (Unix timestamp for 2021-03-16 12:03:59) to 1715934239 (Unix timestamp for 2024-05-17 12:03:59). The unit parameter is set to millisecond because the paid_at column stores timestamps in milliseconds.
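A configuration sketch for this simpler case, using only static bounds from the description above; the public schema is an assumption:
- schema: public # assumed schema\n  name: transactions\n  transformers:\n    - name: RandomUnixTimestamp\n      params:\n        column: paid_at\n        min: 1615934239 # 2021-03-16 12:03:59\n        max: 1715934239 # 2024-05-17 12:03:59\n        unit: millisecond # paid_at stores milliseconds\n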
The RandomURL transformer is designed to populate specified database columns with random URL (Uniform Resource Locator) addresses. This tool is highly beneficial for simulating web content, testing applications that require URL input, or anonymizing real web addresses in datasets.
"},{"location":"built_in_transformers/standard_transformers/random_url/#parameters","title":"Parameters","text":"Name Description Default Required Supported DB types column The name of the column to be affected Yes text, varchar keep_null Indicates whether NULL values should be preserved false No -"},{"location":"built_in_transformers/standard_transformers/random_url/#description","title":"Description","text":"
Utilizing advanced algorithms or libraries for generating URL strings, the RandomURL transformer injects random, plausible URLs into the designated database column. Each generated URL is structured to include the protocol (e.g., \"http://\", \"https://\"), domain name, and path, offering a realistic range of web addresses for various applications.
"},{"location":"built_in_transformers/standard_transformers/random_url/#example-populate-random-urls-for-the-webpages-table","title":"Example: Populate random URLs for the webpages table","text":"
This example illustrates how to configure the RandomURL transformer to populate the page_url column in a webpages table with random URLs, providing a broad spectrum of web addresses for testing or data simulation purposes.
With this configuration, the page_url column will be filled with random URLs for each entry, replacing any existing non-NULL values. Setting the keep_null parameter to true allows for the preservation of existing NULL values within the column, accommodating scenarios where URL data may be intentionally omitted.
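A minimal configuration sketch for this example; the public schema is an assumption:
- schema: public # assumed schema\n  name: webpages\n  transformers:\n    - name: RandomURL\n      params:\n        column: page_url\n        keep_null: false # set to true to preserve existing NULL values\n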
The RandomUsername transformer is crafted to populate specified database columns with random usernames. This utility is crucial for applications that require the simulation of user data, testing systems with user login functionality, or anonymizing real usernames in datasets.
"},{"location":"built_in_transformers/standard_transformers/random_username/#parameters","title":"Parameters","text":"Name Description Default Required Supported DB types column The name of the column to be affected Yes text, varchar keep_null Indicates whether NULL values should be preserved false No -"},{"location":"built_in_transformers/standard_transformers/random_username/#description","title":"Description","text":"
By employing sophisticated algorithms or libraries capable of generating believable usernames, the RandomUsername transformer introduces random usernames into the specified database column. Each generated username is designed to be unique and plausible, incorporating a mix of letters, numbers, and possibly special characters, depending on the generation logic used.
"},{"location":"built_in_transformers/standard_transformers/random_username/#example-populate-random-usernames-for-the-user_accounts-table","title":"Example: Populate random usernames for the user_accounts table","text":"
This example demonstrates configuring the RandomUsername transformer to populate the username column in a user_accounts table with random usernames. This setup is ideal for creating a diverse and realistic user base for development, testing, or demonstration purposes.
In this configuration, every entry in the username column will be updated with a random username, replacing any existing non-NULL values. If the keep_null parameter is set to true, then the transformer will preserve existing NULL values within the column, maintaining data integrity where usernames are not applicable or available.
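A minimal configuration sketch for this example; the public schema is an assumption:
- schema: public # assumed schema\n  name: user_accounts\n  transformers:\n    - name: RandomUsername\n      params:\n        column: username\n        keep_null: false # set to true to preserve existing NULL values\n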
"},{"location":"built_in_transformers/standard_transformers/random_uuid/#parameters","title":"Parameters","text":"Name Description Default Required Supported DB types column The name of the column to be affected Yes text, varchar, uuid keep_null Indicates whether NULL values should be replaced with transformed values or not true No - engine The engine used for generating the values [random, hash]. Use hash for deterministic generation random No -"},{"location":"built_in_transformers/standard_transformers/random_uuid/#description","title":"Description","text":"
The RandomUuid transformer generates a random UUID. The behaviour for NULL values can be configured using the keep_null parameter.
The engine parameter allows you to choose between random and hash engines for generating values. Read more about the engines in the Transformation engines section.
"},{"location":"built_in_transformers/standard_transformers/random_uuid/#example-updating-the-rowguid-column","title":"Example: Updating the rowguid column","text":"
The following example replaces the original UUID values of the rowguid column with randomly generated ones.
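A configuration sketch for this example; the humanresources.employee table is an assumption based on the column name:
- schema: humanresources # assumed schema\n  name: employee # assumed table\n  transformers:\n    - name: RandomUuid\n      params:\n        column: rowguid\n        keep_null: false # replace NULLs too; set true to preserve them\n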
The RandomWord transformer populates specified database columns with random words. Ideal for simulating textual content, enhancing linguistic datasets, or anonymizing text in databases.
"},{"location":"built_in_transformers/standard_transformers/random_word/#parameters","title":"Parameters","text":"Name Description Default Required Supported DB types column The name of the column to be affected Yes text, varchar keep_null Indicates whether NULL values should be preserved false No -"},{"location":"built_in_transformers/standard_transformers/random_word/#description","title":"Description","text":"
The RandomWord transformer employs a mechanism to inject random words into a designated database column, supporting the generation of linguistically plausible and contextually diverse text. This transformer is particularly beneficial for creating rich text datasets for development, testing, or educational purposes without specifying the language, focusing on versatility and ease of use.
"},{"location":"built_in_transformers/standard_transformers/random_word/#example-populate-random-words-for-the-content-table","title":"Example: Populate random words for the content table","text":"
This example demonstrates configuring the RandomWord transformer to populate the tag column in the content table with random words. It is a straightforward approach to adding varied textual data for tagging or content categorization.
In this setup, the tag column will be updated with random words for each entry, replacing any existing non-NULL values. If keep_null is set to true, existing NULL values in the column will remain unchanged, maintaining data integrity for records where textual data is not applicable.
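A minimal configuration sketch for this example; the public schema is an assumption:
- schema: public # assumed schema\n  name: content\n  transformers:\n    - name: RandomWord\n      params:\n        column: tag\n        keep_null: false # set to true to preserve existing NULL values\n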
The RandomYearString transformer is designed to populate specified database columns with random year strings. It is ideal for scenarios that require the representation of years without specific dates, such as manufacturing years of products, birth years of users, or any other context where only the year is relevant.
"},{"location":"built_in_transformers/standard_transformers/random_year_string/#parameters","title":"Parameters","text":"Name Description Default Required Supported DB types column The name of the column to be affected Yes text, varchar, int2, int4, int8, numeric keep_null Indicates whether NULL values should be preserved false No -"},{"location":"built_in_transformers/standard_transformers/random_year_string/#description","title":"Description","text":"
The RandomYearString transformer leverages the faker library to generate strings representing random years. This allows for the easy generation of year data in a string format, adding versatility and realism to datasets that need to simulate or anonymize year-related information.
"},{"location":"built_in_transformers/standard_transformers/random_year_string/#example-populate-random-year-strings-for-the-products-table","title":"Example: Populate random year strings for the products table","text":"
This example shows how to use the RandomYearString transformer to fill the manufacturing_year column in the products table with random year strings, simulating the diversity of manufacturing dates.
In this configuration, the manufacturing_year column will be populated with random year strings, replacing any existing non-NULL values. If keep_null is set to true, then existing NULL values in the column will be preserved.
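A minimal configuration sketch for this example; the public schema is an assumption:
- schema: public # assumed schema\n  name: products\n  transformers:\n    - name: RandomYearString\n      params:\n        column: manufacturing_year\n        keep_null: false # set to true to preserve existing NULL values\n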
Generates real addresses for specified database columns using the faker library. It supports customization of the generated address format through Go templates.
"},{"location":"built_in_transformers/standard_transformers/real_address/#parameters","title":"Parameters","text":"Name Properties Description Default Required Supported DB types columns Specifies the affected column names along with additional properties for each column Yes Various \u221f name The name of the column to be affected Yes string \u221f template A Go template string for formatting real address attributes Yes string \u221f keep_null Indicates whether NULL values should be preserved No bool"},{"location":"built_in_transformers/standard_transformers/real_address/#template-value-descriptions","title":"Template value descriptions","text":"
The template parameter allows for the injection of real address attributes into a customizable template. The following values can be included in your template:
{{.Address}} \u2014 street address or equivalent
{{.City}} \u2014 city name
{{.State}} \u2014 state, province, or equivalent region name
{{.PostalCode}} \u2014 postal or ZIP code
{{.Latitude}} \u2014 geographic latitude
{{.Longitude}} \u2014 geographic longitude
These placeholders can be combined and formatted as desired within the template string to generate custom address formats.
The RealAddress transformer uses the faker library to generate realistic addresses, which can then be formatted according to a specified template and applied to selected columns in a database. It allows for the generated addresses to replace existing values or to preserve NULL values, based on the transformer's configuration.
"},{"location":"built_in_transformers/standard_transformers/real_address/#example-generate-real-addresses-for-the-employee-table","title":"Example: Generate Real addresses for the employee table","text":"
This example shows how to configure the RealAddress transformer to generate real addresses for the address column in the employee table, using a custom format.
This configuration will generate real addresses with the format \"Street address, city, state postal code\" and apply them to the address column, replacing any existing non-NULL values.
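A configuration sketch producing the \"Street address, city, state postal code\" format described above; the schema and table names are assumptions:
- schema: public # assumed schema\n  name: employee # assumed table\n  transformers:\n    - name: RealAddress\n      params:\n        columns:\n          - name: address\n            template: \"{{ .Address }}, {{ .City }}, {{ .State }} {{ .PostalCode }}\"\n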
"},{"location":"built_in_transformers/standard_transformers/regexp_replace/#parameters","title":"Parameters","text":"Name Description Default Required Supported DB types column The name of the column to be affected Yes text, varchar regexp The regular expression pattern to search for in the column's value Yes - replace The replacement value. This value may be replaced with a captured group from the regexp parameter. Yes -"},{"location":"built_in_transformers/standard_transformers/regexp_replace/#description","title":"Description","text":"
The RegexpReplace transformer replaces a string according to the applied regular expression. The valid regular expression syntax is the same as the general syntax used by Perl, Python, and other languages. To be precise, it is the syntax accepted by RE2 and described in the Golang documentation, except for \\C.
"},{"location":"built_in_transformers/standard_transformers/regexp_replace/#example-removing-leading-prefix-from-loginid-column-value","title":"Example: Removing leading prefix from loginid column value","text":"
In the following example, the original values from loginid matching the adventure-works\\{{ id_name }} format are replaced with {{ id_name }}.
| column name | original value | transformed |\n|-------------|----------------------|-------------|\n| loginid | adventure-works\\ken0 | ken0 |\n
Note
YAML has control characters, and using them without escaping may result in an error. In the example above, the id prefix is separated by the \\ character. Since this is a control character in YAML, it must be escaped as \\\\. However, \\ is also a control character in regular expressions, which is why it needs to be double-escaped as \\\\\\\\.
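A configuration sketch illustrating the double-escaping described in this note; the humanresources.employee table is an assumption based on the loginid column:
- schema: humanresources # assumed schema\n  name: employee # assumed table\n  transformers:\n    - name: RegexpReplace\n      params:\n        column: loginid\n        regexp: \"adventure-works\\\\\\\\(.*)\" # YAML-escaped, then regexp-escaped backslash\n        replace: \"$1\" # keep only the captured group after the backslash\n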
"},{"location":"built_in_transformers/standard_transformers/replace/#parameters","title":"Parameters","text":"Name Description Default Required Supported DB types column The name of the column to be affected Yes any replace The value to replace Yes - keep_null Indicates whether NULL values should be replaced with transformed values or not true No - validate Performs a decoding procedure via the PostgreSQL driver using the column type to ensure that values have correct type true No -"},{"location":"built_in_transformers/standard_transformers/replace/#description","title":"Description","text":"
The Replace transformer replaces the original value in the specified column with the provided one. It can optionally run a validation check via the validate parameter to ensure that the value has the correct type before the transformation starts. The behaviour for NULL values can be configured using the keep_null parameter.
"},{"location":"built_in_transformers/standard_transformers/replace/#example-updating-the-jobtitle-column","title":"Example: Updating the jobtitle column","text":"
In the following example, the provided value: \"programmer\" is first validated through driver decoding. If the current value of the jobtitle column is not NULL, it will be replaced with programmer. If the current value is NULL, it will remain NULL.
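A configuration sketch for this example, using the parameter names from the table above; the humanresources.employee table is an assumption:
- schema: humanresources # assumed schema\n  name: employee # assumed table\n  transformers:\n    - name: Replace\n      params:\n        column: jobtitle\n        replace: programmer # value is validated via driver decoding\n        validate: true\n        keep_null: true # NULL values stay NULL\n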
"},{"location":"built_in_transformers/standard_transformers/set_null/#parameters","title":"Parameters","text":"Name Description Default Required Supported DB types column The name of the column to be affected Yes any"},{"location":"built_in_transformers/standard_transformers/set_null/#description","title":"Description","text":"
The SetNull transformer assigns a NULL value to a column. This transformer generates a warning if the affected column has a NOT NULL constraint.
NULL constraint violation warning
{\n \"hash\": \"5a229ee964a4ba674a41a4d63dab5a8c\",\n \"meta\": {\n \"ColumnName\": \"jobtitle\",\n \"ConstraintType\": \"NotNull\",\n \"ParameterName\": \"column\",\n \"SchemaName\": \"humanresources\",\n \"TableName\": \"employee\",\n \"TransformerName\": \"SetNull\"\n },\n \"msg\": \"transformer may produce NULL values but column has NOT NULL constraint\",\n \"severity\": \"warning\"\n}\n
"},{"location":"built_in_transformers/standard_transformers/set_null/#example-set-null-value-to-updated_at-column","title":"Example: Set NULL value to updated_at column","text":"SetNull transformer example
| column name | original value | transformed |\n|-------------|-------------------------|-------------|\n| jobtitle | Chief Executive Officer | NULL |\n
"},{"location":"commands/","title":"Commands","text":""},{"location":"commands/#introduction","title":"Introduction","text":"Greenmask available commands
You can use the following commands within Greenmask:
list-transformers \u2014 displays a list of available transformers along with their documentation
show-transformer \u2014 displays information about the specified transformer
validate - performs a validation procedure by testing config, comparing transformed data, identifying potential issues, and checking for schema changes.
dump \u2014 initiates the data dumping process
restore \u2014 restores data to the target database either by specifying a dumpId or using the latest available dump
list-dumps \u2014 lists all available dumps stored in the system
show-dump \u2014 provides metadata information about a particular dump, offering insights into its structure and attributes
delete \u2014 deletes a specific dump from the storage
For any of the commands mentioned above, you can include the following common flags:
--log-format \u2014 specifies the desired format for log output, which can be either json or text. This parameter is optional, with the default format set to text.
--log-level \u2014 sets the desired level for log output, which can be one of debug, info, or error. This parameter is optional, with the default log level being info.
--config \u2014 requires the specification of a configuration file in YAML format. This configuration file is mandatory for Greenmask to operate correctly.
--help \u2014 displays comprehensive help information for Greenmask, providing guidance on its usage and available commands.
Usage:\n greenmask delete [flags] [dumpId]\n\nFlags:\n --before-date string delete dumps older than the specified date in RFC3339Nano format: 2021-01-01T00:00:00.0Z\n --dry-run do not delete anything, just show what would be deleted\n --prune-failed prune failed dumps\n --prune-unsafe prune dumps with \"unknown-or-failed\" statuses. Works only with --prune-failed\n --retain-for string retain dumps for the specified duration in format: 1w2d3h4m5s6ms7us8ns\n --retain-recent int retain the most recent N completed dumps (default -1)\n
Stores the transformed data in the specified storage location.
Note that the dump command shares the same parameters and environment variables as pg_dump, allowing you to configure the dump process as needed.
It supports most of the same flags as the pg_dump utility, with some extra flags for Greenmask-specific features.
Supported flags
-b, --blobs include large objects in dump\n -c, --clean clean (drop) database objects before recreating\n -Z, --compress int compression level for compressed formats (default -1)\n -C, --create include commands to create database in dump\n -a, --data-only dump only the data, not the schema\n -d, --dbname string database to dump (default \"postgres\")\n --disable-dollar-quoting disable dollar quoting, use SQL standard quoting\n --enable-row-security enable row security (dump only content user has access to)\n -E, --encoding string dump the data in encoding ENCODING\n -N, --exclude-schema strings do NOT dump the specified schema(s)\n -T, --exclude-table strings do NOT dump the specified table(s)\n --exclude-table-data strings do NOT dump data for the specified table(s)\n -e, --extension strings dump the specified extension(s) only\n --extra-float-digits string override default setting for extra_float_digits\n -f, --file string output file or directory name\n -h, --host string database server host or socket directory (default \"/var/run/postgres\")\n --if-exists use IF EXISTS when dropping objects\n --include-foreign-data strings include data of foreign tables on foreign servers matching PATTERN\n -j, --jobs int use this many parallel jobs to dump (default 1)\n --load-via-partition-root load partitions via the root table\n --lock-wait-timeout int fail after waiting TIMEOUT for a table lock (default -1)\n -B, --no-blobs exclude large objects in dump\n --no-comments do not dump comments\n -O, --no-owner skip restoration of object ownership in plain-text format\n -X, --no-privileges do not dump privileges (grant/revoke)\n --no-publications do not dump publications\n --no-security-labels do not dump security label assignments\n --no-subscriptions do not dump subscriptions\n --no-sync do not wait for changes to be written safely to disk\n --no-synchronized-snapshots do not use synchronized snapshots in parallel jobs\n --no-tablespaces do not dump tablespace assignments\n --no-toast-compression do not dump TOAST compression methods\n --no-unlogged-table-data do not dump unlogged table data\n --pgzip use pgzip compression instead of gzip\n -p, --port int database server port number (default 5432)\n --quote-all-identifiers quote all identifiers, even if not key words\n -n, --schema strings dump the specified schema(s) only\n -s, --schema-only dump only the schema, no data\n --section string dump named section (pre-data, data, or post-data)\n --serializable-deferrable wait until the dump can run without anomalies\n --snapshot string use given snapshot for the dump\n --strict-names require table and/or schema include patterns to match at least one entity each\n -t, --table strings dump the specified table(s) only\n --test string connect as specified database user (default \"postgres\")\n --use-set-session-authorization use SET SESSION AUTHORIZATION commands instead of ALTER OWNER commands to set ownership\n -U, --username string connect as specified database user (default \"postgres\")\n -v, --verbose string verbose mode\n
By default, Greenmask uses gzip compression when dumping data. In most cases it is quite slow, does not utilize all available resources, and becomes a bottleneck for IO operations. To speed up the process, you can use the --pgzip flag to use pgzip compression instead of gzip. This method splits the data into blocks, which are compressed in parallel, making it ideal for handling large volumes of data. The output remains a standard gzip file.
The list-dumps command provides a list of all dumps stored in the storage. The list includes the following attributes:
ID \u2014 the unique identifier of the dump, used for operations like restore, delete, and show-dump
DATE \u2014 the date when the snapshot was created
DATABASE \u2014 the name of the database associated with the dump
SIZE \u2014 the original size of the dump
COMPRESSED SIZE \u2014 the size of the dump after compression
DURATION \u2014 the duration of the dump procedure
TRANSFORMED \u2014 indicates whether the dump has been transformed
STATUS \u2014 the status of the dump, which can be one of the following:
done \u2014 the dump was completed successfully
in progress \u2014 the dump is currently being created
failed \u2014 the dump creation process failed
unknown or failed \u2014 the deprecated status of the dump that is used for failed dumps or dumps in progress for version v0.1.14 and earlier
Example of list-dumps output:
Info
Greenmask uses a heartbeat mechanism to determine the status of a dump. A dump is considered failed if it lacks a \"done\" heartbeat or if its last heartbeat timestamp is more than 30 minutes old. Heartbeats are recorded every 15 minutes by the dump command while it is in progress. If Greenmask fails unexpectedly, the heartbeat stops being updated, and after 30 minutes (twice the heartbeat interval) the dump is classified as failed. The in progress status indicates that a dump is still ongoing.
The list-transformers command provides a list of all the allowed transformers, including both standard and advanced transformers. This list can be helpful for searching for an appropriate transformer for your data transformation needs.
To show a list of available transformers, use the following command:
greenmask --config=config.yml list-transformers\n
Supported flags:
--format \u2014 allows you to select the output format. There are two options available: text or json. The default setting is text.
Example of list-transformers output:
When using the list-transformers command, you receive a list of available transformers with essential information about each of them. Below are the key parameters for each transformer:
NAME \u2014 the name of the transformer
DESCRIPTION \u2014 a brief description of what the transformer does
COLUMN PARAMETER NAME \u2014 name of a column or columns affected by transformation
SUPPORTED TYPES \u2014 list the supported value types
The JSON call greenmask --config=config.yml list-transformers --format=json has the same attributes:
JSON format output
[\n {\n \"name\": \"Cmd\",\n \"description\": \"Transform data via external program using stdin and stdout interaction\",\n \"parameters\": [\n {\n \"name\": \"columns\",\n \"supported_types\": [\n \"any\"\n ]\n }\n ]\n },\n {\n \"name\": \"Dict\",\n \"description\": \"Replace values matched by dictionary keys\",\n \"parameters\": [\n {\n \"name\": \"column\",\n \"supported_types\": [\n \"any\"\n ]\n }\n ]\n }\n]\n
The restore command is used to restore a database from a previously created dump. You can specify the dump to restore by providing the dump ID or use the latest keyword to restore the latest completed dump.
greenmask --config=config.yml restore DUMP_ID\n
Alternatively, to restore the latest completed dump, use the following command:
greenmask --config=config.yml restore latest\n
Note that the restore command shares the same parameters and environment variables as pg_restore, allowing you to configure the restoration process as needed.
It supports most of the same flags as the pg_restore utility, with some extra flags for Greenmask-specific features.
Supported flags
--batch-size int the number of rows to insert in a single batch during the COPY command (0 - all rows will be inserted in a single batch)\n -c, --clean clean (drop) database objects before recreating\n -C, --create create the target database\n -a, --data-only restore only the data, no schema\n -d, --dbname string connect to database name (default \"postgres\")\n --disable-triggers disable triggers during data section restore\n --enable-row-security enable row security\n -N, --exclude-schema strings do not restore objects in this schema\n -e, --exit-on-error exit on error, default is to continue\n -f, --file string output file name (- for stdout)\n -P, --function strings restore named function\n -h, --host string database server host or socket directory (default \"/var/run/postgres\")\n --if-exists use IF EXISTS when dropping objects\n -i, --index strings restore named index\n --inserts restore data as INSERT commands, rather than COPY\n -j, --jobs int use this many parallel jobs to restore (default 1)\n --list-format string use table of contents in format of text, json or yaml (default \"text\")\n --no-comments do not restore comments\n --no-data-for-failed-tables do not restore data of tables that could not be created\n -O, --no-owner skip restoration of object ownership\n -X, --no-privileges skip restoration of access privileges (grant/revoke)\n --no-publications do not restore publications\n --no-security-labels do not restore security labels\n --no-subscriptions do not restore subscriptions\n --no-table-access-method do not restore table access methods\n --no-tablespaces do not restore tablespace assignments\n --on-conflict-do-nothing add ON CONFLICT DO NOTHING to INSERT commands\n --overriding-system-value use OVERRIDING SYSTEM VALUE clause for INSERTs\n --pgzip use pgzip decompression instead of gzip\n -p, --port int database server port number (default 5432)\n --restore-in-order restore tables in topological order, ensuring that dependent tables are not restored until the tables they depend on have been restored\n -n, --schema strings restore only objects in this schema\n -s, --schema-only restore only the schema, no data\n --section string restore named section (pre-data, data, or post-data)\n -1, --single-transaction restore as a single transaction\n --strict-names require table and/or schema include patterns to match at least one entity each\n -S, --superuser string superuser user name to use for disabling triggers\n -t, --table strings restore named relation (table, view, etc.)\n -T, --trigger strings restore named trigger\n -L, --use-list string use table of contents from this file for selecting/ordering output\n --use-session-replication-role-replica use SET session_replication_role = 'replica' to disable triggers during data section restore (alternative for --disable-triggers)\n --use-set-session-authorization use SET SESSION AUTHORIZATION commands instead of ALTER OWNER commands to set ownership\n -U, --username string connect as specified database user (default \"postgres\")\n -v, --verbose string verbose mode\n
"},{"location":"commands/restore/#extra-features","title":"Extra features","text":""},{"location":"commands/restore/#inserts-and-error-handling","title":"Inserts and error handling","text":"
Warning
Insert commands are a lot slower than COPY commands. Use this feature only when necessary.
By default, Greenmask restores data using the COPY command. If you prefer to restore data using INSERT commands, you can use the --inserts flag. This flag allows you to manage errors that occur during the execution of INSERT commands. By configuring an error and constraint exclusion list in the config, you can skip certain errors and continue inserting subsequent rows from the dump.
This can be useful when adding new records to an existing dump, but you don't want the process to stop if some records already exist in the database or violate certain constraints.
Adding the --on-conflict-do-nothing flag generates INSERT statements with the ON CONFLICT DO NOTHING clause, similar to the original pg_dump option. However, this approach only works for unique or exclusion constraints. If a foreign key is missing in the referenced table or any other constraint is violated, the insertion will still fail. To handle these issues, you can define an exclusion list in the config.
example with inserts and error handling
greenmask --config=config.yml restore DUMP_ID --inserts --on-conflict-do-nothing\n
Adding the --overriding-system-value flag generates INSERT statements with the OVERRIDING SYSTEM VALUE clause, which allows you to insert data into identity columns.
example of GENERATED ALWAYS AS IDENTITY column
CREATE TABLE people (\n id integer GENERATED ALWAYS AS IDENTITY PRIMARY KEY,\n generated text GENERATED ALWAYS AS (id || first_name) STORED,\n first_name text\n);\n
"},{"location":"commands/restore/#restoration-in-topological-order","title":"Restoration in topological order","text":"
By default, Greenmask restores tables in the order they are listed in the dump file. To restore tables in topological order, use the --restore-in-order flag. This flag ensures that dependent tables are not restored until the tables they depend on have been restored.
This is useful when you have the schema already created with foreign keys and other constraints, and you want to insert data into the tables in the correct order or catch up the target database with new data.
Warning
Greenmask cannot guarantee restoration in topological order when the schema contains cycles. The only way to restore tables with cyclic dependencies is to temporarily remove the foreign key constraint (to break the cycle), restore the data, and then re-add the foreign key constraint once the data restoration is complete.
If your database has cyclic dependencies, you will be notified about it, but the restoration will continue.
2024-08-16T21:39:50+03:00 WRN cycle between tables is detected: cannot guarantee the order of restoration within cycle cycle=[\"public.employees\",\"public.departments\",\"public.projects\",\"public.employees\"]\n
By default, Greenmask uses gzip decompression to restore data. In most cases it is quite slow, does not utilize all available resources, and becomes a bottleneck for IO operations. To speed up the restoration process, you can use the --pgzip flag to use pgzip decompression instead of gzip. This method splits the data into blocks, which are decompressed in parallel, making it ideal for handling large volumes of data.
"},{"location":"commands/restore/#restore-data-batching","title":"Restore data batching","text":"
The COPY command returns the error only on transaction commit. This means that if you have a large dump and an error occurs, you will have to wait until the end of the transaction to see the error message. To avoid this, you can use the --batch-size flag to specify the number of rows to insert in a single batch during the COPY command. If an error occurs during the batch insertion, the error message will be displayed immediately. The data will be committed only if all batches are inserted successfully.
This is useful when you want to be notified of errors as immediately as possible without waiting for the entire table to be restored.
Warning
The batch size should be chosen carefully. If the batch size is too small, the restoration process will be slow. If the batch size is too large, you may not be able to identify the error row.
In the example below, the batch size is set to 1000 rows. This means that 1000 rows will be inserted in a single batch, so you will be notified of any errors immediately after each batch is inserted.
This command provides details about all objects and data that can be restored, similar to the pg_restore -l command in PostgreSQL. It helps you inspect the contents of the dump before performing the actual restoration.
Parameters:
--format \u2014 format of printing. Can be text or json.
To display metadata information about a dump, use the following command:
The date when the backup was initiated, which also indicates the snapshot date.
The date when the backup process was successfully completed.
The original size of the backup in bytes.
The size of the backup after compression in bytes.
A list of tables that underwent transformation during the backup.
The schema name of the table.
The name of the table.
Custom query override, if applicable.
A list of transformers that were applied during the backup.
The name of the transformer.
The parameters provided for the transformer.
A mapping of overridden column types.
The header information in the table of contents file. This provides the same details as the --format=text output in the previous snippet.
The list of restoration entries. This offers the same information as the --format=text output in the previous snippet.
Note
The json format provides more detailed information compared to the text format. The text format is primarily used for backward compatibility and for generating a restoration list that can be used with pg_restore -L listfile. On the other hand, the json format provides comprehensive metadata about the dump, including information about the applied transformers and their parameters. The json format is especially useful for detailed dump introspection.
This command prints out detailed information about a transformer by a provided name, including specific attributes to help you understand and configure the transformer effectively.
To show detailed information about a transformer, use the following command:
--format \u2014 allows you to select the output format. There are two options available: text or json. The default setting is text.
Example of show-transformer output:
When using the show-transformer command, you receive detailed information about the transformer and its parameters and their possible attributes. Below are the key parameters for each transformer:
Name \u2014 the name of the transformer
Description \u2014 a brief description of what the transformer does
Parameters \u2014 a list of transformer parameters, each with its own set of attributes. Possible attributes include:
description \u2014 a brief description of the parameter's purpose
required \u2014 a flag indicating whether the parameter is required when configuring the transformer
link_parameter \u2014 specifies whether the value of the parameter will be encoded using a specific parameter type encoder. For example, if a parameter named column is linked to another parameter start, the start parameter's value will be encoded according to the column type when the transformer is initialized.
cast_db_type \u2014 indicates that the value should be encoded according to the database type. For example, when dealing with the INTERVAL data type, you must provide the interval value in PostgreSQL format.
default_value \u2014 the default value assigned to the parameter if it's not provided during configuration.
column_properties \u2014 if a parameter represents the name of a column, it may contain additional properties, including:
nullable \u2014 indicates whether the transformer may produce NULL values, potentially violating the NOT NULL constraint
unique \u2014 specifies whether the transformer guarantees unique values for each call. If set to true, it means that the transformer cannot produce duplicate values, ensuring compliance with the UNIQUE constraint.
affected \u2014 indicates whether the column is affected during the transformation process. If not affected, the column's value might still be required for transforming another column.
allowed_types \u2014 a list of data types that are compatible with this parameter
skip_original_data \u2014 specifies whether the original value of the column, before transformation, is relevant for the transformation process
skip_on_null \u2014 indicates whether the transformer should skip the transformation when the input column value is NULL. If the column value is NULL, interaction with the transformer is unnecessary.
Warning
The default value in JSON format is base64 encoded. This might change in a later version of Greenmask.
The validate command allows you to perform a validation procedure and compare transformed data.
Below is a list of all supported flags for the validate command:
Supported flags
Usage:\n greenmask validate [flags]\n\nFlags:\n --data Perform test dump for --rows-limit rows and print it pretty\n --diff Find difference between original and transformed data\n --format string Format of output. possible values [text|json] (default \"text\")\n --rows-limit uint Limit the number of rows to dump and validate per table (default 10)\n --schema Make a schema diff between previous dump and the current state\n --table strings Check tables dump only for specific tables\n --table-format string Format of table output (only for --format=text). Possible values [vertical|horizontal] (default \"vertical\")\n --transformed-only Print only transformed column and primary key\n --warnings Print warnings\n
The validate command can exit with a non-zero code when:
Any error occurred
Validate was called with --warnings flag and there are warnings
Validate was called with --schema flag and there are schema differences
Any of these cases can be used in CI/CD pipelines to stop the process when something goes wrong. This is especially useful when the --schema flag is used, as it helps avoid data leakage when the schema has changed.
You can use the --table flag multiple times to specify the tables you want to check. Tables can be written with or without schema names (e.g., public.table_name or table_name). If you specify multiple tables from different schemas, an error will be thrown.
2024-03-15T19:46:12+02:00 WRN ValidationWarning={\"hash\":\"aa808fb574a1359c6606e464833feceb\",\"meta\":{\"ColumnName\":\"birthdate\",\"ConstraintDef\":\"CHECK (birthdate \\u003e= '1930-01-01'::date AND birthdate \\u003c= (now() - '18 years'::interval))\",\"ConstraintName\":\"humanresources\",\"ConstraintSchema\":\"humanresources\",\"ConstraintType\":\"Check\",\"ParameterName\":\"column\",\"SchemaName\":\"humanresources\",\"TableName\":\"employee\",\"TransformerName\":\"NoiseDate\"},\"msg\":\"possible constraint violation: column has Check constraint\",\"severity\":\"warning\"}\n
The validation output will provide detailed information about potential constraint violations and schema issues. Each line contains nested JSON data under the ValidationWarning key, offering insights into the affected part of the configuration and potential constraint violations.
Table schema name specifies the schema name of the affected table.
Table name identifies the name of the table where the problem occurs.
Transformer name indicates the name of the transformer responsible for the transformation.
Name of affected parameter typically, this is the name of the column parameter that is relevant to the validation warning.
Validation warning description provides a detailed description of the validation warning and the reason behind it.
Severity of validation warning indicates the severity level of the validation warning and can be one of the following:
* error\n* warning\n* info\n* debug\n
Hash is a unique identifier of the validation warning. It is used to resolve the warning in the config file.
Note
A validation warning with a severity level of \"error\" is considered critical and must be addressed before the dump operation can proceed. Failure to resolve such warnings will prevent the dump operation from being executed.
Schema diff changed output example
2024-03-15T19:46:12+02:00 WRN Database schema has been changed Hint=\"Check schema changes before making new dump\" PreviousDumpId=1710520855501\n2024-03-15T19:46:12+02:00 WRN Column renamed Event=ColumnRenamed Signature={\"CurrentColumnName\":\"id1\",\"PreviousColumnName\":\"id\",\"TableName\":\"test\",\"TableSchema\":\"public\"}\n2024-03-15T19:46:12+02:00 WRN Column type changed Event=ColumnTypeChanged Signature={\"ColumnName\":\"id\",\"CurrentColumnType\":\"bigint\",\"CurrentColumnTypeOid\":\"20\",\"PreviousColumnType\":\"integer\",\"PreviousColumnTypeOid\":\"23\",\"TableName\":\"test\",\"TableSchema\":\"public\"}\n2024-03-15T19:46:12+02:00 WRN Column created Event=ColumnCreated Signature={\"ColumnName\":\"name\",\"ColumnType\":\"text\",\"TableName\":\"test\",\"TableSchema\":\"public\"}\n2024-03-15T19:46:12+02:00 WRN Table created Event=TableCreated Signature={\"SchemaName\":\"public\",\"TableName\":\"test1\",\"TableOid\":\"20563\"}\n
Example of validation diff:
The validation diff is presented in a neatly formatted table. In this table:
Columns that are affected by the transformation are highlighted with a red background.
The pre-transformation values are displayed in green.
The post-transformation values are shown in red.
The result in --format=text can be displayed in either horizontal (--table-format=horizontal) or vertical (--table-format=vertical) format, making it easy to visualize and understand the differences between the original and transformed data.
The whole validate command may be run in json format, including logging, making it easy to parse the structure.
We are excited to announce the release of Greenmask v0.1.0, marking the first production-ready version. This release addresses various bug fixes, introduces improvements, and includes documentation refactoring for enhanced clarity.
Added positional arguments for the list-transformers command, allowing specific transformer information retrieval (e.g., greenmask list-transformers RandomDate).
Added a version parameter --version that prints Greenmask version.
Added numeric parameters support for -Int and -Float transformers.
Improved verbosity in custom transformer interaction, accumulating stderr data and forwarding it in batches instead of writing it one by one.
Updated dependencies to newer versions.
Enhanced the stability of the JSON line interaction protocol by utilizing the stdlib JSON encoder/decoder.
Modified the method for sending table metadata to custom transformers; now, it is sent via stdin in the first line in JSON format instead of providing it via command arguments.
Refactored template functions naming.
Refactored NoiseDate transformer implementation for improved stability and predictability.
Changed the default value of the Dict transformer's fail_not_matched parameter to true.
Refactored the Hash transformer to provide a salt parameter and receive a base64 encoded salt. If salt is not provided, it generates one randomly.
Added validation for the truncate parameter of NoiseDate and RandomDate transformers that issues a warning if the provided value is invalid.
Increased verbosity of parameter validation warnings, now properly forwarding warnings to stdout.
We are excited to announce the beta release of Greenmask, a versatile and open-source utility for PostgreSQL logical backup dumping, anonymization, and restoration. Greenmask is perfect for routine backup and restoration tasks. It facilitates anonymization and data masking for staging environments and analytics.
This release introduces a range of features aimed at enhancing database management and security.
| Transformer | Description |\n|-------------|-------------|\n| RandomLatitude | Generates a random latitude value |\n| RandomLongitude | Generates a random longitude value |\n| RandomUnixTime | Generates a random Unix timestamp |\n| RandomMonthName | Generates the name of a random month |\n| RandomYearString | Generates a random year as a string |\n| RandomDayOfWeek | Generates a random day of the week |\n| RandomDayOfMonth | Generates a random day of the month |\n| RandomCentury | Generates a random century |\n| RandomTimezone | Generates a random timezone |\n| RandomEmail | Generates a random email address |\n| RandomMacAddress | Generates a random MAC address |\n| RandomDomainName | Generates a random domain name |\n| RandomURL | Generates a random URL |\n| RandomUsername | Generates a random username |\n| RandomIPv4 | Generates a random IPv4 address |\n| RandomIPv6 | Generates a random IPv6 address |\n| RandomPassword | Generates a random password |\n| RandomWord | Generates a random word |\n| RandomSentence | Generates a random sentence |\n| RandomParagraph | Generates a random paragraph |\n| RandomCCType | Generates a random credit card type |\n| RandomCCNumber | Generates a random credit card number |\n| RandomCurrency | Generates a random currency code |\n| RandomAmountWithCurrency | Generates a random monetary amount with currency |\n| RandomTitleMale | Generates a random title for males |\n| RandomTitleFemale | Generates a random title for females |\n| RandomFirstName | Generates a random first name |\n| RandomFirstNameMale | Generates a random male first name |\n| RandomFirstNameFemale | Generates a random female first name |\n| RandomLastName | Generates a random last name |\n| RandomName | Generates a full random name |\n| RandomPhoneNumber | Generates a random phone number |\n| RandomTollFreePhoneNumber | Generates a random toll-free phone number |\n| RandomE164PhoneNumber | Generates a random phone number in E.164 format |\n| RealAddress | Generates a real address |\n"},{"location":"release_notes/greenmask_0_1_1/#assets","title":"Assets","text":"
To download the Greenmask binary compatible with your system, see the release's assets list.
The Hash transformer has been completely remastered and now has the function parameter to choose from several hash algorithm options and the max_length parameter to truncate the hash tail.
Split information about transformers between the list-transformers and new show-transformer CLI commands, which allows for more comprehensible and useful outputs for both commands
Added error severity for the Cmd parameter validator
Added restoration filtering by --table, --schema and --exclude-schema parameters
Running validate without parameters now validates only the configuration file.
Added the --schema parameter, which produces a schema diff between the previous dump and the current one. This is useful when you want to check whether the schema has changed after a migration; monitoring it helps prevent data leakage after migrations.
The validate command is now divided into multiple stages that can be controlled using parameters.
Added a salt parameter that can be set via the config or the GREENMASK_GLOBAL_SALT environment variable.
Added support for sha3 functions in several modes (sha3-224, sha3-256, sha3-384, sha3-512).
Refactored Cmd transformer logic
JSON API: column names can now be used instead of column indexes in JSON format.
CSV API: the column order from the config can now be used via column remapping.
The validate command was rewritten almost from scratch.
New option --transformed-only - displays only the transformed columns together with the primary key (if one exists). This reduces the output size and makes it more readable.
Implemented JSON output format.
Added the --table-format parameter, which controls vertical or horizontal table orientation. It only takes effect when --format=text.
Added the --warnings parameter; when specified, not only fatal warnings are displayed but also those with lower severity.
Fixed the --use-list option - it now applies TOC entries in the order given in the list file.
Fixed the --use-list option behavior together with the --list-format option (json or text). It now generates a temporary list file in text format to provide to the pg_restore call.
Updated documentation according to the latest changes
Implemented the --exit-on-error parameter for the pg_restore run. However, it does not yet apply to the \"data\" section restoration: if any error occurs in the data section, greenmask exits with the error whether or not --exit-on-error was provided. This may be fixed later.
Fixed dropping of dependent objects when running the restore command with the --clean parameter. Useful when restoring and overriding only the required tables.
Fixed show-dump command output in text mode
Disabled CGO. This fixes a problem where a binary downloaded from the repository could not run.
Implemented table scoring according to table size and transformation costs. This correctly spreads table dumping across the requested worker pool and reduces execution time. Greenmask now introspects the table size, adds the transformation scoring using the formula score = tableSizeInBytes + (tableSizeInBytes * 0.03 * tableTransformationsCount) (for example, a 100 MB table with two transformers scores 100 + 100 * 0.03 * 2 = 106 MB), and uses the \"Largest First\" strategy. The problem is described here.
Introduced no_verify_ssl parameter for S3 storage
Adjusted Dockerfile
Changed entrypoint to greenmask binary
The greenmask container now runs under the greenmask user and group.
Refactored the storage config structure. It now contains a type attribute that determines the storage type.
Most of the attributes may be overridden with environment variables, where the letters are capitalized and the dots are replaced with underscores. For instance, the setting storage.type can be represented by the environment variable STORAGE_TYPE.
The --config parameter is no longer required, simplifying the greenmask user experience.
Directory storage is now the default.
Set the default temporary directory to /tmp.
Added environment variable section to the configuration docs
This is one of the biggest releases since Greenmask was founded. We've been in close contact with our users, gathering feedback, and working hard to make Greenmask more flexible, reliable, and user-friendly.
This major release introduces exciting new features such as database subsetting, pgzip support, restoration in topological order, and refactored transformers, significantly enhancing Greenmask's flexibility to better meet business needs. It also includes several fixes and improvements.
This release is a major milestone that significantly expands Greenmask's functionality, transforming it into a simple, extensible, and reliable solution for database security, data anonymization, and everyday operations. Our goal is to create a core system that can serve as a foundation for comprehensive dynamic staging environments and robust data security.
PostgreSQL 17 support - revised ported library to support PostgreSQL 17
Database Subset - a new feature that allows you to define a subset of the database, letting you scale down the dump size (#110). It serves many purposes and is especially useful for testing and development environments. It supports:
References with NULL values - generates a LEFT JOIN query for FK references with NULL values so that such rows are included in the subset.
Supports virtual references (virtual foreign keys) - create a logical FK in Greenmask that will be used for subset dependencies graph. The virtual reference can be defined for a column or an expression, allowing you to get the value from JSON and similar.
Supports circular references - Greenmask will automatically resolve circular dependencies in the subset by generating a recursive query. The query is generated with integrity checks of the subset ensuring that the data gathered from circular dependencies is consistent.
Fully covered with documentation including troubleshooting and examples.
Supports FK and PK that have more than one column (or expression).
Multi-cycle resolution in one strongly connected component (SCC) is supported - Greenmask will generate a recursive query for the SCC whether it contains a single cycle or multiple cycles, making the subset system universal for any database schema.
Supports polymorphic relationships - You can define a virtual reference for a table with polymorphic references using the polymorphic_exprs attribute and use Greenmask to generate a subset for such tables.
pgzip support for faster compression and decompression \u2014 setting --pgzip can speed up the dump and restoration processes through parallel compression. In some tests, it shows up to 5x faster dump and restore operations.
Restoration in topological order - the --restore-in-order flag ensures that dependent tables are not restored until the tables they depend on have been restored. This is useful when you want to be notified of errors as early as possible without waiting for the entire table to be restored.
Insert format restoration - For a flexible restoration process, Greenmask now supports data restoration in the INSERT format. It generates the insert statements based on COPY records from the dump. You do not need to re-dump your data to use this feature; it can be defined in the restore command. The list of new features related to the INSERT format:
Generate INSERT statements with the ON CONFLICT DO NOTHING clause if the --on-conflict-do-nothing flag is set.
Error exclusion list in the config to skip certain errors and continue inserting subsequent rows from the dump.
Use cases - incremental dump and restoration for logical data. For example, if you have a database, and you want to insert data periodically from another source, this can be used together with the database subset and transformations to catch up the target database.
Restore data batching (#173) - By default, the COPY protocol returns the error only on transaction commit. To override this behavior, use the --batch-size flag to specify the number of rows to insert in a single batch during the COPY command. This is useful when you want to control the transaction size and commit.
Introduced keep_null parameter for RandomPerson transformer.
Introduced dynamic parameters in the transformers
Most transformers now support dynamic parameters where applicable.
Dynamic parameters are strictly enforced. If you need to cast values to another type, Greenmask provides templates and predefined cast functions accessible via cast_to. These functions cover frequent operations such as UnixTimestampToDate and IntToBool.
The transformation logic has been significantly refactored, making transformers more customizable and flexible than before.
Introduced transformation engines
random - generates transformer values based on pseudo-random algorithms.
hash - generates transformer values using hash functions. Currently, it utilizes sha3 hash functions, which are secure but perform slowly. In the stable release, there will be an option to choose between sha3 and SipHash.
Introduced value templates for static parameters.
Dumps retention management - Introduced retention parameters (#201) for the delete command, along with two new statuses: failed and in progress. A dump is considered failed if it lacks a \"done\" heartbeat or if its last heartbeat timestamp is older than 30 minutes. The delete command now supports the following retention parameters:
--dry-run: Runs the deletion operation in test mode with verbose output, without actually deleting anything.
--before-date 2024-08-27T23:50:54+00:00: Deletes dumps older than the specified date. The date must be provided in RFC3339Nano format, for example: 2021-01-01T00:00:00Z.
--retain-recent 10: Retains the N most recent dumps, where N is specified by the user.
--retain-for 1w2d3h4m5s6ms7us8ns: Retains dumps for the specified duration. The format supports weeks (w), days (d), hours (h), minutes (m), seconds (s), milliseconds (ms), microseconds (us), and nanoseconds (ns).
--prune-failed: Prunes (removes) all dumps that have failed.
--prune-unsafe: Prunes dumps with \"unknown-or-failed\" statuses. This option only works in conjunction with --prune-failed.
Docker image mirroring into the GitHub Container Registry
Introduced the Parametrizer interface, now implemented for both dynamic and static parameters.
Renamed most of the toolkit types for enhanced clarity and comprehensive documentation coverage.
Refactored the Driver initialization logic.
Added validation warnings for overridden types in the Driver.
Migrated existing built-in transformers to utilize the new Parametrizer interface.
Implemented a new abstraction, TransformationContext, as the first step towards enabling new feature transformation conditions (#34).
Optimized most transformers for performance in both dynamic and static modes. While dynamic mode offers flexibility, static mode ensures performance remains high. Using only the necessary transformation features helps keep transformation time predictable.
RandomEmail - Introduces a new transformer that supports both random and deterministic engines. It allows for flexible email value generation; you can use column values in the template and choose to keep the original domain or select any from the domains parameter.
NoiseDate, NoiseFloat, NoiseInt - These transformers support both random and deterministic engines, offering dynamic mode parameters that control the noise thresholds within the min and max range. Unlike previous implementations which used a single ratio parameter, the new release features min_ratio and max_ratio parameters to define noise values more precisely. Utilizing the hash engine in these transformers enhances security by complicating statistical analysis for attackers, especially when the same salt is used consistently over long periods.
NoiseNumeric - A newly implemented transformer, sharing features with NoiseInt and NoiseFloat, but specifically designed for numeric values (large integers or floats). It provides a decimal parameter to handle values with fractions.
RandomChoice - Now supports the hash engine
RandomDate, RandomFloat, RandomInt - Now enhanced with hash engine support. Threshold parameters min and max have been updated to support dynamic mode, allowing for more flexible configurations.
RandomNumeric - A new transformer specifically designed for numeric types (large integers or floats), sharing similar features with RandomInt and RandomFloat, but tailored for handling huge numeric values.
RandomString - Now supports hash engine mode
RandomUnixTimestamp - This new transformer generates Unix timestamps with selectable units (second, millisecond, microsecond, nanosecond). Similar in function to RandomDate, it supports the hash engine and dynamic parameters for min and max thresholds, with the ability to override these units using min_unit and max_unit parameters.
RandomUuid - Added hash engine support
RandomPerson - Implemented a new transformer that replaces RandomName, RandomLastName, RandomFirstName, RandomFirstNameMale, RandomFirstNameFemale, RandomTitleMale, and RandomTitleFemale. This new transformer offers enhanced customizability while providing similar functionalities as the previous versions. It generates personal data such as FirstName, LastName, and Title, based on the provided gender parameter, which now supports dynamic mode. Future minor versions will allow for overriding the default names database.
Added tsModify - a new template function for time.Time objects modification
Introduced a new RandomIp transformer capable of generating a random IP address based on the specified netmask.
Added a new RandomMac transformer for generating random Mac addresses.
Deleted transformers include RandomMacAddress, RandomIPv4, RandomIPv6, RandomUnixTime, RandomTitleMale, RandomTitleFemale, RandomFirstName, RandomFirstNameMale, RandomFirstNameFemale, RandomLastName, and RandomName due to the introduction of more flexible and unified options.
"},{"location":"release_notes/greenmask_0_2_0/#fixes-and-improvements","title":"Fixes and improvements","text":"
Fixed validate command with the --table flag, which had the wrong order of the table name representation {{ table_name }}.{{ schema }} instead of {{ schema }}.{{ table_name }}.
Fixed Row.SetColumn out of range validation.
Fixed restoreWorker panic caused when the worker received an error from pgx.
Fixed error handling in the restore command.
Restore jobs now start a transaction for each table restoration and commit it after the table restoration is done.
Fixed --exit-on-error working incorrectly in the restore command. The --exit-on-error flag now works correctly with the data section.
Fixed transaction rollback in the validate command.
Fixed typo in documentation.
Fixed a CI/CD bug related to retrieving current tags.
Fixed the Docker image tag for latest to exclude specific keywords.
Fixed a case where the hashing value was not set for each column in the RandomPerson transformer.
Fixed original email value parsing conditions.
Subset docs revision.
Fixed a case where data entries were erroneously excluded by exclusion parameters such as --exclude-table, --table, etc.
Fixed zero bytes that were written in the buffer due to the wrong buffer limit in the Email transformer.
Fixed a case where the overridden type of column via columns_type_override did not work.
Fixed a case where an unknown option provided in the config was just ignored instead of throwing an error.
Fixed a case where min and max parameter values were ignored in transformers NoiseDate, NoiseNumeric, NoiseFloat, NoiseInt, RandomNumeric, RandomFloat, and RandomInt.
Fixed the TOC entry COPY restoration statement by adding a missing newline and semicolon. A pg_restore call such as pg_restore 1724504511561 --file 1724504511561.sql is now backward compatible and works as expected.
Fixed a case where dump/restore fails when masking tables with a generated column.
Updated Go version (v1.22) and dependencies.
Revised the installation section of the documentation.
PostgreSQL 17 support - revised ported library to support PostgreSQL 17
Fixed integration tests - reset the go test cache on each iteration
Docker images are now pushed to the ghcr.io registry.
A bunch of refactoring and code cleanup to make the codebase more maintainable and readable.
This major beta release introduces new features and refactored transformers, significantly enhancing Greenmask's flexibility to better meet business needs.
Most transformers now support dynamic parameters where applicable.
Dynamic parameters are strictly enforced. If you need to cast values to another type, Greenmask provides templates and predefined cast functions accessible via cast_to. These functions cover frequent operations such as UnixTimestampToDate and IntToBool.
The transformation logic has been significantly refactored, making transformers more customizable and flexible than before.
Introduced transformation engines
random - generates transformer values based on pseudo-random algorithms.
hash - generates transformer values using hash functions. Currently, it utilizes sha3 hash functions, which are secure but perform slowly. In the stable release, there will be an option to choose between sha3 and SipHash.
Introduced the Parametrizer interface, now implemented for both dynamic and static parameters.
Renamed most of the toolkit types for enhanced clarity and comprehensive documentation coverage.
Refactored the Driver initialization logic.
Added validation warnings for overridden types in the Driver.
Migrated existing built-in transformers to utilize the new Parametrizer interface.
Implemented a new abstraction, TransformationContext, as the first step towards enabling new feature transformation conditions (#34).
Optimized most transformers for performance in both dynamic and static modes. While dynamic mode offers flexibility, static mode ensures performance remains high. Using only the necessary transformation features helps keep transformation time predictable.
RandomEmail - Introduces a new transformer that supports both random and deterministic engines. It allows for flexible email value generation; you can use column values in the template and choose to keep the original domain or select any from the domains parameter.
NoiseDate, NoiseFloat, NoiseInt - These transformers support both random and deterministic engines, offering dynamic mode parameters that control the noise thresholds within the min and max range. Unlike previous implementations which used a single ratio parameter, the new release features min_ratio and max_ratio parameters to define noise values more precisely. Utilizing the hash engine in these transformers enhances security by complicating statistical analysis for attackers, especially when the same salt is used consistently over long periods.
NoiseNumeric - A newly implemented transformer, sharing features with NoiseInt and NoiseFloat, but specifically designed for numeric values (large integers or floats). It provides a decimal parameter to handle values with fractions.
RandomChoice - Now supports the hash engine
RandomDate, RandomFloat, RandomInt - Now enhanced with hash engine support. Threshold parameters min and max have been updated to support dynamic mode, allowing for more flexible configurations.
RandomNumeric - A new transformer specifically designed for numeric types (large integers or floats), sharing similar features with RandomInt and RandomFloat, but tailored for handling huge numeric values.
RandomString - Now supports hash engine mode
RandomUnixTimestamp - This new transformer generates Unix timestamps with selectable units (second, millisecond, microsecond, nanosecond). Similar in function to RandomDate, it supports the hash engine and dynamic parameters for min and max thresholds, with the ability to override these units using min_unit and max_unit parameters.
RandomUuid - Added hash engine support
RandomPerson - Implemented a new transformer that replaces RandomName, RandomLastName, RandomFirstName, RandomFirstNameMale, RandomFirstNameFemale, RandomTitleMale, and RandomTitleFemale. This new transformer offers enhanced customizability while providing similar functionalities as the previous versions. It generates personal data such as FirstName, LastName, and Title, based on the provided gender parameter, which now supports dynamic mode. Future minor versions will allow for overriding the default names database.
Added tsModify - a new template function for time.Time objects modification
Introduced a new RandomIp transformer capable of generating a random IP address based on the specified netmask.
Added a new RandomMac transformer for generating random Mac addresses.
Deleted transformers include RandomMacAddress, RandomIPv4, RandomIPv6, RandomUnixTime, RandomTitleMale, RandomTitleFemale, RandomFirstName, RandomFirstNameMale, RandomFirstNameFemale, RandomLastName, and RandomName due to the introduction of more flexible and unified options.
"},{"location":"release_notes/greenmask_0_2_0_b1/#full-changelog-v0114v020b1","title":"Full Changelog: v0.1.14...v0.2.0b1","text":""},{"location":"release_notes/greenmask_0_2_0_b1/#playground-usage-for-beta-version","title":"Playground usage for beta version","text":"
If you want to run a Greenmask playground for the beta version v0.2.0b1, execute:
git checkout tags/v0.2.0b1 -b v0.2.0b1\ndocker-compose run greenmask-from-source\n
This major beta release introduces new features such as the database subset, pgzip support, restoration in topological order, and many more. It also includes fixes and improvements.
This release is a major milestone that significantly expands Greenmask's functionality, transforming it into a simple, extensible, and reliable solution for database security, data anonymization, and everyday operations. Our goal is to create a core system that can serve as a foundation for comprehensive dynamic staging environments and robust data security.
Database Subset - a new feature that allows you to define a subset of the database, letting you scale down the dump size (#110). It serves many purposes and is especially useful for testing and development environments. It supports:
References with NULL values - generates a LEFT JOIN query for FK references with NULL values so that such rows are included in the subset.
Supports virtual references (virtual foreign keys) - create a logical FK in Greenmask that will be used for subset dependencies graph. The virtual reference can be defined for a column or an expression, allowing you to get the value from JSON and similar.
Supports circular references - Greenmask will automatically resolve circular dependencies in the subset by generating a recursive query. The query is generated with integrity checks of the subset ensuring that the data gathered from circular dependencies is consistent.
Fully covered with documentation including troubleshooting and examples.
Supports FK and PK that have more than one column (or expression).
Multi-cycle resolution in one strongly connected component (SCC) is supported - Greenmask will generate a recursive query for the SCC whether it contains a single cycle or multiple cycles, making the subset system universal for any database schema.
pgzip support for faster compression and decompression \u2014 setting --pgzip can speed up the dump and restoration processes through parallel compression. In some tests, it shows up to 5x faster dump and restore operations.
Restoration in topological order - the --restore-in-order flag ensures that dependent tables are not restored until the tables they depend on have been restored. This is useful when you want to be notified of errors as early as possible without waiting for the entire table to be restored.
Insert format restoration - For a flexible restoration process, Greenmask now supports data restoration in the INSERT format. It generates the insert statements based on COPY records from the dump. You do not need to re-dump your data to use this feature; it can be defined in the restore command. The list of new features related to the INSERT format:
Generate INSERT statements with the ON CONFLICT DO NOTHING clause if the --on-conflict-do-nothing flag is set.
Error exclusion list in the config to skip certain errors and continue inserting subsequent rows from the dump.
Use cases - incremental dump and restoration for logical data. For example, if you have a database, and you want to insert data periodically from another source, this can be used together with the database subset and transformations to catch up the target database.
Restore data batching (#173) - By default, the COPY protocol returns the error only on transaction commit. To override this behavior, use the --batch-size flag to specify the number of rows to insert in a single batch during the COPY command. This is useful when you want to control the transaction size and commit.
Introduced keep_null parameter for RandomPerson transformer.
"},{"location":"release_notes/greenmask_0_2_0_b2/#fixes-and-improvements","title":"Fixes and improvements","text":"
Fixed validate command with the --table flag, which had the wrong order of the table name representation {{ table_name }}.{{ schema }} instead of {{ schema }}.{{ table_name }}.
Fixed Row.SetColumn out of range validation.
Fixed restoreWorker panic caused when the worker received an error from pgx.
Fixed error handling in the restore command.
Restore jobs now start a transaction for each table restoration and commit it after the table restoration is done.
Fixed --exit-on-error working incorrectly in the restore command. The --exit-on-error flag now works correctly with the data section.
Fixed transaction rollback in the validate command.
Fixed typo in documentation.
Fixed a CI/CD bug related to retrieving current tags.
Fixed the Docker image tag for latest to exclude specific keywords.
Fixed a case where the hashing value was not set for each column in the RandomPerson transformer.
Fixed original email value parsing conditions.
Subset docs revision.
Fixed a case where data entries were erroneously excluded by exclusion parameters such as --exclude-table, --table, etc.
Fixed zero bytes that were written in the buffer due to the wrong buffer limit in the Email transformer.
Fixed a case where the overridden type of column via columns_type_override did not work.
Fixed a case where an unknown option provided in the config was just ignored instead of throwing an error.
Fixed a case where min and max parameter values were ignored in transformers NoiseDate, NoiseNumeric, NoiseFloat, NoiseInt, RandomNumeric, RandomFloat, and RandomInt.
Fixed the TOC entry COPY restoration statement by adding a missing newline and semicolon. A pg_restore call such as pg_restore 1724504511561 --file 1724504511561.sql is now backward compatible and works as expected.
Fixed a case where dump/restore fails when masking tables with a generated column.
Updated Go version (v1.22) and dependencies.
Revised the installation section of the documentation.
A bunch of refactoring and code cleanup to make the codebase more maintainable and readable.
"},{"location":"release_notes/greenmask_0_2_0_b2/#full-changelog-v020b1v020b2","title":"Full Changelog: v0.2.0b1...v0.2.0b2","text":""},{"location":"release_notes/greenmask_0_2_0_b2/#playground-usage-for-beta-version","title":"Playground usage for beta version","text":"
If you want to run a Greenmask playground for the beta version v0.2.0b2, execute:
git checkout tags/v0.2.0b2 -b v0.2.0b2\ndocker-compose run greenmask-from-source\n
This release introduces two new features: transformation conditions and transformation inheritance for primary and foreign keys. It also includes several bug fixes and improvements.
Fixed an issue where the partitioned table itself was executed in the restore worker, resulting in a \"file not found\" error in storage. Closes bug: restoring partitioned tables fails #238 #242.
Fixed template function availability #239. Renamed methods according to the documentation: GetColumnRawValue is now GetRawColumnValue, and SetColumnRawValue is now SetRawColumnValue #242
Resolved an issue where Dump.createTocEntries processed partitioned tables as if they were physical entities, despite being logical #241
Corrected merging of the pre-data, data, and post-data sections, which previously caused a panic in the dump command when the post-data section was excluded #241
Fixed an issue where dumps created with --load-via-partition-root did not use the root partition table in --inserts generation during restoration #241
Feel free to reach out to us if you have any questions or need assistance:
Greenmask Roadmap
Email
Twitter
Telegram
Discord
DockerHub
"}]}
\ No newline at end of file
+{"config":{"lang":["en"],"separator":"[\\s\\-]+","pipeline":["stopWordFilter"]},"docs":[{"location":"","title":"About Greenmask","text":""},{"location":"#dump-anonymization-and-synthetic-data-generation-tool","title":"Dump anonymization and synthetic data generation tool","text":"
Greenmask is a powerful open-source utility designed for logical database backup dumping, anonymization, synthetic data generation, and restoration. It uses ported PostgreSQL libraries, making it reliable. It is stateless and does not require any changes to your database schema. It is designed to be highly customizable and backward-compatible with existing PostgreSQL utilities, fast, and reliable.
Deterministic transformers \u2014 a deterministic approach to data transformation based on hash functions. This ensures that the same input data always produces the same output data. Almost every transformer supports either the random or the hash engine, making it universal for any use case.
Dynamic parameters \u2014 almost every transformer supports dynamic parameters, allowing you to parametrize the transformer dynamically from a table column value. This is helpful for resolving functional dependencies between columns and satisfying constraints.
Transformation validation and easy maintenance - during the configuration process, Greenmask provides validation warnings, a data transformation diff, and a schema diff, allowing you to monitor and maintain transformations effectively throughout the software lifecycle. The schema diff helps avoid data leakage when the schema changes.
Partitioned tables transformation inheritance \u2014 Define transformation configurations once and apply them to all partitions within partitioned tables (using apply_for_inherited parameter), simplifying the anonymization process.
Stateless - Greenmask operates as a logical dump and does not impact your existing database schema.
Cross-platform - Can be easily built and executed on any platform, thanks to its Go-based architecture, which eliminates platform dependencies.
Database type safe - Ensures data integrity by validating data and utilizing the database driver for encoding and decoding operations. This approach guarantees the preservation of data formats.
Backward compatible - It fully supports the same features and protocols as existing vanilla PostgreSQL utilities. Dumps created by Greenmask can be successfully restored using the pg_restore utility.
Extensible - Users have the flexibility to implement domain-based transformations in any programming language or use predefined templates.
Integrable - Integrate seamlessly into your CI/CD system for automated database anonymization and restoration.
Parallel execution - Take advantage of parallel dumping and restoration, significantly reducing the time required to deliver results.
Variety of storages - offers a range of storage options for local and remote data storage, including directories and S3-like storage solutions.
Pgzip support for faster compression \u2014 by setting --pgzip, the dump and restoration processes can be sped up through parallel compression.
Greenmask is ideal for various scenarios, including:
Backup and restoration. Use Greenmask for your daily routines involving logical backup dumping and restoration. It seamlessly handles tasks like table restoration after truncation. Its functionality closely mirrors that of pg_dump and pg_restore, making it a straightforward replacement.
Anonymization, transformation, and data masking. Employ Greenmask for anonymizing, transforming, and masking backups, especially when setting up a staging environment or for analytical purposes. It simplifies the deployment of a pre-production environment with consistently anonymized data, facilitating faster time-to-market in the development lifecycle.
It is evident that the most appropriate approach for executing logical backup dumping and restoration is by leveraging the core PostgreSQL utilities, specifically pg_dump and pg_restore. Greenmask has been purposefully designed to align with PostgreSQL's native utilities, ensuring compatibility. Greenmask primarily handles data dumping operations independently and delegates the responsibilities of schema dumping and restoration to pg_dump and pg_restore respectively, maintaining seamless integration with PostgreSQL's standard tools.
The process of backing up PostgreSQL databases is divided into three distinct sections:
Pre-data \u2014 this section encompasses the raw schema of tables, excluding primary keys (PK) and foreign keys (FK).
Data \u2014 the data section contains the actual table data in COPY format, including information about sequence current values and Large Objects data.
Post-data \u2014 in this section, you'll find the definitions of indexes, triggers, rules, and constraints (such as PK and FK).
Greenmask focuses exclusively on the data section during runtime. It delegates the handling of the pre-data and post-data sections to the core PostgreSQL utilities, pg_dump and pg_restore.
Greenmask employs the directory format of pg_dump and pg_restore. This format is particularly suitable for parallel execution and partial restoration, and it includes clear metadata files that aid in determining the backup and restoration steps. Greenmask has been optimized to work seamlessly with remote storage systems and anonymization procedures.
When performing data dumping, Greenmask utilizes the COPY command in TEXT format, maintaining reliability and compatibility with the vanilla PostgreSQL utilities.
Additionally, Greenmask supports parallel execution, significantly reducing the time required for the dumping process.
The core PostgreSQL utilities, pg_dump and pg_restore, traditionally operate with files in a directory format, offering no alternative methods. To meet modern backup requirements and provide flexible approaches, Greenmask introduces the concept of storages.
s3 \u2014 this option supports any S3-like storage system, including AWS S3, which makes it versatile and adaptable to various cloud-based storage solutions.
directory \u2014 this is the standard choice, representing the ordinary filesystem directory for local storage.
In the restoration process, Greenmask combines the capabilities of different tools:
For schema restoration, Greenmask utilizes pg_restore to restore the database schema. This ensures that the schema is accurately reconstructed.
For data restoration Greenmask independently applies the data using the COPY protocol. This allows Greenmask to handle the data efficiently, especially when working with various storage solutions. Greenmask is aware of the restoration metadata, which enables it to download only the necessary data. This feature is particularly useful for partial restoration scenarios, such as restoring a single table from a complete backup.
Greenmask also supports parallel restoration, which can significantly reduce the time required to complete the restoration process. This parallel execution enhances the efficiency of restoring large datasets.
"},{"location":"architecture/#data-anonymization-and-validation","title":"Data anonymization and validation","text":"
Greenmask works with COPY lines, collects schema metadata using the Golang driver, and employs this driver in the encoding and decoding process. The validate command offers a way to assess the impact on both schema (validation warnings) and data (transformation and displaying differences). This command allows you to validate the schema and data transformations, ensuring the desired outcomes during the anonymization process.
If your table schema relies on functional dependencies between columns, you can address this challenge using the TemplateRecord transformer. This transformer enables you to define transformation logic for entire tables, offering type-safe operations when assigning new values.
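A minimal sketch of how such a table-level template might be wired up, assuming a TemplateRecord transformer configured with a template parameter and using the record accessor methods mentioned in the release notes (GetRawColumnValue/SetRawColumnValue); the table and column names are illustrative:

dump:
  transformation:
    - schema: "public"
      name: "orders"  # hypothetical table with functionally dependent columns
      transformers:
        - name: "TemplateRecord"
          params:
            # Keep updated_at consistent with created_at by copying its raw value.
            template: '{{ .SetRawColumnValue "updated_at" (.GetRawColumnValue "created_at") }}'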
Greenmask provides a framework for creating your custom transformers, which can be reused efficiently. These transformers can be seamlessly integrated without requiring recompilation, thanks to the PIPE (stdin/stdout) interaction.
Note
Furthermore, Greenmask's architecture is designed to be highly extensible, making it possible to introduce other interaction protocols, such as HTTP or Socket, for conducting anonymization procedures.
"},{"location":"architecture/#postgresql-version-compatibility","title":"PostgreSQL version compatibility","text":"
Greenmask is compatible with PostgreSQL versions 11 and higher.
common \u2014 settings that can be used for both the dump and restore commands
log \u2014 settings for the logging subsystem
storage \u2014 settings for the storage locations where dumps are stored
dump \u2014 settings for the dump command. This section includes pg_dump options and transformation parameters.
restore \u2014 settings for the restore command. It contains pg_restore options and additional restoration scripts.
custom_transformers \u2014 definitions of the custom transformers that interact through stdin and stdout. Once a custom transformer is configured, it becomes accessible via the greenmask list-transformers command.
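Assembled, a minimal configuration skeleton covering these sections might look like this (values are placeholders, and the log.level key is an assumption):

common:
  pg_bin_path: "/usr/bin"
log:
  level: "info"  # assumed key for the logging subsystem
storage:
  type: "directory"
dump:
  pg_dump_options: {}
  transformation: []
restore:
  pg_restore_options: {}
custom_transformers: []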
In the common section of the configuration, you can specify the following settings:
pg_bin_path \u2014 path to the PostgreSQL binaries. Note that the PostgreSQL server version must match the provided binaries.
tmp_dir \u2014 temporary directory for storing the table of contents files. Default value is /tmp
Note
Greenmask exclusively manages data dumping and data restoration processes, delegating schema dumping to the pg_dump utility and schema restoration to the pg_restore utility. Both pg_dump and pg_restore rely on a toc.dat file located in a specific directory, which contains metadata and object definitions. Therefore, the tmp_dir parameter is essential for storing the toc.dat file during the dumping or restoration procedure. Note that all artifacts in this directory are automatically deleted once the Greenmask command completes.
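For example, a common section using the parameters above might look like this (the binaries path is machine-specific and must match your server version):

common:
  pg_bin_path: "/usr/lib/postgresql/17/bin"
  tmp_dir: "/tmp"  # toc.dat is staged here and removed when the command completes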
In the storage section, you can configure the storage driver for storing the dumped data. Currently, two storage type options are supported: directory and s3.
The directory storage option refers to a filesystem directory where the dump data will be stored.
Parameters include path, which specifies the filesystem directory where the dumps will be stored.
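A directory storage sketch, with the nesting implied by the type attribute described earlier (the path is illustrative):

storage:
  type: "directory"
  directory:
    path: "/var/lib/greenmask/dumps"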
By choosing the s3 storage option, you can store dump data in an S3-like remote storage service, such as Amazon S3 or Azure Blob Storage. Here are the parameters you can configure for S3 storage:
endpoint \u2014 overrides the default AWS endpoint to a custom one for making requests
bucket \u2014 the name of the bucket where the dump data will be stored
prefix \u2014 a prefix for objects in the bucket, specified in path format
region \u2014 the S3 service region
storage_class \u2014 the storage class for performing object requests
access_key_id \u2014 access key for authentication
secret_access_key \u2014 secret access key for authentication
session_token \u2014 session token for authentication
role_arn \u2014 Amazon resource name for role-based authentication
session_name \u2014 role session name to uniquely identify a session
max_retries \u2014 the number of retries on request failures
cert_file \u2014 the path to the SSL certificate for making requests
max_part_size \u2014 the maximum part length for one request
concurrency \u2014 the number of goroutines to use in parallel for each upload call when sending parts
use_list_objects_v1 \u2014 use the old v1 ListObjects request instead of v2 one
force_path_style \u2014 force the request to use path-style addressing (e.g., http://s3.amazonaws.com/BUCKET/KEY) instead of virtual hosted bucket addressing (e.g., http://BUCKET.s3.amazonaws.com/KEY)
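An s3 storage sketch using a few of the parameters above (endpoint and credentials are placeholders; the nesting mirrors the type attribute described earlier):

storage:
  type: "s3"
  s3:
    endpoint: "http://localhost:9000"  # e.g. a local MinIO instance
    bucket: "greenmask"
    prefix: "dumps"
    region: "us-east-1"
    access_key_id: "<access key>"
    secret_access_key: "<secret key>"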
In the dump section of the configuration, you configure the greenmask dump command. It includes the following parameters:
pg_dump_options \u2014 a map of pg_dump options to configure the behavior of the command itself. You can refer to the list of supported pg_dump options in the Greenmask dump command documentation.
transformation \u2014 this section contains configuration for applying transformations to table columns during the dump operation. It includes the following sub-parameters:
schema \u2014 the schema name of the table
name \u2014 the name of the table
subset_conds - a list of conditions used to filter the rows to be dumped. The conditions are combined with the AND operator. For details, read Database subset.
query \u2014 an optional parameter for specifying a custom query to be used in the COPY command. By default, the entire table is dumped, but you can use this parameter to set a custom query.
Warning
Be cautious when using the query parameter, as it may lead to constraint violation errors during restoration, and Greenmask currently cannot handle query validation.
columns_type_override \u2014 allows you to override column types explicitly. You can associate a column with another type that is supported by your transformer. This is useful when the transformer works strictly with specific column types. For example, if a column named post_code is of the TEXT type but the RandomInt transformer works only with INT family types, you can override it as shown in the annotated example after this list.
Change the data type of the post_code column to INT4 (INTEGER)
apply_for_inherited \u2014 an optional parameter to apply the same transformation to all partitions if the table is partitioned. This can save you from defining the transformation for each partition manually.
Warning
It is recommended to use the --load-via-partition-root parameter when dealing with partitioned tables, as the partition key value might change.
transformers \u2014 a list of transformers to apply to the table, along with their parameters. Each transformation item includes the following sub-parameters:
name \u2014 the name of the transformer
params \u2014 a map of the provided transformer parameters
Override the post_code column type to int4 (INTEGER). This is necessary because the post_code column originally has a TEXT type, but it contains values that resemble integers. By explicitly overriding the type to int4, we ensure compatibility with transformers that work with integer types, such as RandomInt.
After the type is overridden, we can apply a compatible transformer.
Database subset condition applied to the aircrafts_data table. The subset condition filters the data based on the model column.
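A sketch tying the annotations above together; the addresses table is hypothetical, and the aircrafts_data filter value is illustrative:

dump:
  transformation:
    - schema: "public"
      name: "addresses"            # hypothetical table with a TEXT post_code column
      columns_type_override:
        post_code: "int4"          # (1) override TEXT -> INT4
      transformers:
        - name: "RandomInt"        # (2) now compatible with the overridden type
          params:
            column: "post_code"
            min: 10000
            max: 99999
    - schema: "bookings"
      name: "aircrafts_data"
      subset_conds:                # (3) subset filter on the model column
        - "bookings.aircrafts_data.model = 'Airbus A321-200'"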
In the validate section of the configuration, you can specify parameters for the greenmask validate command. The individual parameters are described below, followed by an example of the validate section configuration.
A list of tables to validate. If this list is not empty, the validation operation will only be performed for the specified tables. Tables can be written with or without the schema name (e.g., \"public.cart\" or \"orders\").
Specifies whether to perform data transformation for a limited set of rows. If set to true, data transformation will be performed, and the number of rows transformed will be limited to the value specified in the rows_limit parameter (default is 10).
Specifies whether to perform diff operations for the transformed data. If set to true, the validation process will find the differences between the original and transformed data. See more details in the validate command documentation.
Limits the number of rows to be transformed during validation. The default limit is 10 rows, but you can change it by modifying this parameter.
A hash list of resolved warnings. These warnings have been addressed and resolved in a previous validation run.
Specifies the format of the transformation output. Possible values are [horizontal|vertical]. The default format is horizontal. You can choose the format that suits your needs. See more details in the validate command documentation.
The output format (json or text)
Specifies whether to compare the current schema with the previous one and print the differences, if any.
If set to true, the transformation output will contain only the transformed columns and primary keys.
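A validate section sketch matching the descriptions above (parameter spellings are our reading and should be checked against the validate command documentation; the warning hash is a placeholder):

validate:
  tables:                       # validate only these tables
    - "public.cart"
  data: true                    # transform a limited set of rows
  diff: true                    # show original vs. transformed differences
  rows_limit: 10                # number of rows to transform
  resolved_warnings:            # hashes of warnings already addressed
    - "<warning hash>"
  table_format: "horizontal"    # or "vertical"; effective only with format: text
  format: "text"                # or "json"
  schema: true                  # diff the current schema against the previous dump
  transformed_only: true        # print only transformed columns and primary keys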
In the restore section of the configuration, you can specify parameters for the greenmask restore command. It contains pg_restore settings and custom script execution settings. Below you can find the available parameters:
pg_restore_options \u2014 a map of pg_restore options that are used to configure the behavior of the pg_restore utility during the restoration process. You can refer to the list of supported pg_restore options in the Greenmask restore command documentation.
scripts \u2014 a map of custom scripts to be executed during different restoration stages. Each script is associated with a specific restoration stage and includes the following attributes:
[pre-data|data|post-data] \u2014 the name of the restoration stage when the script should be executed; has the following parameters:
name \u2014 the name of the script
when \u2014 specifies when to execute the script, which can be either \"before\" or \"after\" the specified restoration stage
query \u2014 an SQL query string to be executed
query_file \u2014 the path to an SQL query file to be executed
command \u2014 a command with parameters to be executed. It is provided as a list, where the first item is the command name.
insert_error_exclusions \u2014 a list of error codes that should be ignored during the restoration process. This is useful when you want to skip specific errors that are not critical for the restoration process.
As mentioned in the architecture, a backup contains three sections: pre-data, data, and post-data. The custom script execution allows you to customize and control the restoration process by executing scripts or commands at specific stages. The available restoration stages and their corresponding execution conditions are as follows:
pre-data \u2014 scripts or commands can be executed before or after restoring the pre-data section
data \u2014 scripts or commands can be executed before or after restoring the data section
post-data \u2014 scripts or commands can be executed before or after restoring the post-data section
Each stage can have a \"when\" condition with one of the following possible values:
before \u2014 execute the script or SQL command before the mentioned restoration stage
after \u2014 execute the script or SQL command after the mentioned restoration stage
Below you can find one of the possible versions for the scripts part of the restore section:
scripts definition example
scripts:\n pre-data: # (1)\n - name: \"pre-data before script [1] with query\"\n when: \"before\"\n query: \"create table script_test(stage text)\"\n - name: \"pre-data before script [2]\"\n when: \"before\"\n query: \"insert into script_test values('pre-data before')\"\n - name: \"pre-data after test script [1]\"\n when: \"after\"\n query: \"insert into script_test values('pre-data after')\"\n - name: \"pre-data after script with query_file [1]\"\n when: \"after\"\n query_file: \"pre-data-after.sql\"\n data: # (2)\n - name: \"data before script with command [1]\"\n when: \"before\"\n command: # (4)\n - \"data-after.sh\"\n - \"param1\"\n - \"param2\"\n - name: \"data after script [1]\"\n when: \"after\"\n query_file: \"data-after.sql\"\n post-data: # (3)\n - name: \"post-data before script [1]\"\n when: \"before\"\n query: \"insert into script_test values('post-data before')\"\n - name: \"post-data after script with query_file [1]\"\n when: \"after\"\n query_file: \"post-data-after.sql\"\n
List of pre-data stage scripts. This section contains scripts that are executed before or after the restoration of the pre-data section. The scripts include SQL queries and query files.
List of data stage scripts. This section contains scripts that are executed before or after the restoration of the data section. The scripts include shell commands with parameters and SQL query files.
List of post-data stage scripts. This section contains scripts that are executed before or after the restoration of the post-data section. The scripts include SQL queries and query files.
Command name in the first item and its parameters in the rest of the list. When specifying a command to be executed in the scripts section, provide the command name as the first item in a list, followed by any parameters or arguments for that command.
You can configure which errors to ignore during the restoration process by setting the insert_error_exclusions parameter. This parameter can be applied globally or per table; if both global and table-specific settings are defined, the table-specific settings take precedence. You can specify constraint names from your database schema or the error codes returned by PostgreSQL; see the list of error codes in the PostgreSQL documentation. Below is a sketch of how the insert_error_exclusions parameter might be configured.
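A sketch with both global and table-specific exclusions (attribute names follow this description and should be verified against the restore command documentation; the constraint name is hypothetical):

restore:
  insert_error_exclusions:
    global:
      error_codes: ["23505"]      # unique_violation
      constraints: ["cart_pkey"]  # hypothetical constraint name
    tables:
      - schema: "public"
        name: "cart"
        error_codes: ["23503"]    # foreign_key_violation; overrides the global list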
It's also possible to configure Greenmask through environment variables.
Greenmask automatically parses any environment variable that matches a setting in the config file by replacing the dot (.) separator with an underscore (_) and uppercasing the name. As an example, the config snippet below applies the same configuration as defining the LOG_LEVEL=debug environment variable.
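Assuming the logging level lives under log.level, the equivalent config snippet would be:

log:
  level: "debug"  # same effect as LOG_LEVEL=debug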
Additionally, there are some environment variables exposed by the dump and restore commands to facilitate connection configuration with a PostgreSQL database:
PGHOST - host used to connect to the postgres database
PGPORT - port where postgres is exposed
PGDATABASE - name of the database to dump/restore
PGUSER - username used to connect to the postgres database
PGPASSWORD - password used to authenticate to the postgres database
Greenmask allows you to define a subset condition for filtering data during the dump process. This feature is useful when you need to dump only a part of the database, such as a specific table or a set of tables. It automatically ensures data consistency by including all related data from other tables that are required to maintain the integrity of the subset. The subset condition can be defined using the subset_conds attribute, which can be set on a table in the transformation section (see examples).
Info
Greenmask generates queries for subset conditions based on the introspected schema, using joins and recursive queries. It cannot be responsible for query optimization; the subset queries might be slow due to the complexity of the queries and/or a lack of indexes. Circular references are resolved using recursive queries.
The subset is a list of SQL conditions applied to a table. The conditions are combined with the AND operator. To avoid ambiguity, specify the schema, table, and column name when pointing out the column to filter by. The subset condition must be a valid SQL condition.
Subset condition example
subset_conds:\n - 'person.businessentity.businessentityid IN (274, 290, 721, 852)'\n
Database scale-down - create an anonymized dump for a limited but consistent set of tables
Data migration - migrate only some records from one database to another
Data anonymization - dump and anonymize only specific records in the database
Database catchup - logically catch up another database instance by adding new records. In this case, it is recommended to restore tables in topological order using --restore-in-order.
"},{"location":"database_subset/#references-with-null-values","title":"References with NULL values","text":"
For references that do not have NOT NULL constraints, Greenmask will automatically generate LEFT JOIN queries with the appropriate conditions to ensure integrity checks. You can rely on Greenmask to handle such cases correctly\u2014no special configuration is needed, as it performs this automatically based on the introspected schema.
Greenmask supports circular references between tables. You can define a subset condition for any table, and Greenmask will automatically generate the appropriate queries for the table subset using recursive queries. The subset system ensures data consistency by validating all records found through the recursive queries. If a record does not meet the subset condition, it will be excluded along with its parent records, preventing constraint violations.
Warning
Currently (v0.2b2), Greenmask can resolve multi-cycles in one strongly connected component (SCC), but only for one group of vertexes. If an SCC contains two groups of vertexes, Greenmask will not be able to resolve it. For instance, with two cycles over tables A, B, C (first group) and B, C, E (second group), Greenmask cannot resolve the subset. But if you have only one group of vertexes with one or more cycles over the same group of tables (for instance A, B, C), Greenmask works with it. This will be fixed in the future; see the second example below. In practice this is quite a rare situation, and 99% of users will not face this issue.
You can read the Wikipedia article about Circular reference here.
During the development process, there are situations where foreign keys need to be removed. The reasons can vary\u2014from improving performance to simplifying the database structure. Additionally, some foreign keys may exist within loosely structured data, such as JSON, where PostgreSQL cannot create foreign keys at all. These limitations could significantly hinder the capabilities of a subset system. Greenmask offers a flexible solution to this problem by allowing the declaration of virtual references in the configuration, enabling the preservation and management of logical relationships between tables, even in the absence of explicit foreign keys. A virtual reference may also be called a virtual foreign key.
Virtual references are defined in the dump section under the virtual_references attribute, which contains the list of virtual references. First, set the table on which you want to define a virtual reference. In the references attribute, define the list of tables referenced by that table. In the columns attribute, define the list of columns used in the foreign key reference. The not_null attribute is optional and defines whether the FK has a NOT NULL constraint; if true, Greenmask generates an INNER JOIN instead of a LEFT JOIN (by default it is false). The expression attribute is used when you want an expression to compute the value of the column in the referencing table. For instance, if you have a JSONB column in the audit_logs table that contains an order_id field, you can use this field as an FK reference. A sketch follows the note below.
Info
You do not need to define the primary key of the referenced table. Greenmask will automatically resolve it and use it in the join condition.
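A virtual reference sketch, including an expression-based FK into a JSONB column as described above (table and column names are illustrative, and the exact attribute layout should be checked against the docs):

dump:
  virtual_references:
    - schema: "public"
      name: "audit_logs"      # referencing table
      references:
        - schema: "public"
          name: "orders"      # referenced table; its PK is resolved automatically
          not_null: false     # LEFT JOIN (default); true produces an INNER JOIN
          columns:
            - expression: "(public.audit_logs.payload ->> 'order_id')::int"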
Greenmask supports polymorphic references. You can define a virtual reference for a table with polymorphic references using the polymorphic_exprs attribute, which is a list of expressions used to build the polymorphic reference. For instance, we might have a comments table with a polymorphic reference to posts and videos. The comments table might have commentable_id and commentable_type columns, where commentable_type contains the type of the table referenced by the commentable_id column. A sketch of the config:
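A sketch of the comments configuration described above (the attribute layout is illustrative):

dump:
  virtual_references:
    - schema: "public"
      name: "comments"
      references:
        - schema: "public"
          name: "posts"
          polymorphic_exprs:
            - "public.comments.commentable_type = 'post'"
          columns:
            - name: "commentable_id"
        - schema: "public"
          name: "videos"
          polymorphic_exprs:
            - "public.comments.commentable_type = 'video'"
          columns:
            - name: "commentable_id"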
Polymorphic references cannot be not_null, because the commentable_id column can be NULL if commentable_type is not set or differs from the values defined in the polymorphic_exprs attribute.
"},{"location":"database_subset/#troubleshooting","title":"Troubleshooting","text":""},{"location":"database_subset/#exclude-the-records-that-has-null-values-in-the-referenced-column","title":"Exclude the records that has NULL values in the referenced column","text":"
If you want to exclude records that have NULL values in the referenced column, you can manually add this condition to the subset condition for the table. Greenmask does not automatically exclude records with NULL values because it applies a LEFT OUTER JOIN on nullable foreign keys.
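For example, to drop rows whose FK column is NULL, add the condition yourself (table and column are hypothetical):

subset_conds:
  - "public.orders.customer_id IS NOT NULL"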
"},{"location":"database_subset/#some-table-is-not-filtered-by-the-subset-condition","title":"Some table is not filtered by the subset condition","text":"
Greenmask builds a table dependency graph based on the introspected schema and existing foreign keys. If a table is not filtered by the subset condition, it means that the table either does not reference another table that is filtered by the subset condition or the table itself does not have a subset condition applied.
If you have a table with a removed foreign key and want to filter it by the subset condition, you need to define a virtual reference. For more information on virtual references, refer to the Virtual References section.
Info
If you find any issues related to the code, or Greenmask is not working as expected, do not hesitate to contact us directly or create an issue in the repository.
"},{"location":"database_subset/#error-column-reference-id-is-ambiguous","title":"ERROR: column reference \"id\" is ambiguous","text":"
If you see the error message ERROR: column reference \"{column name}\" is ambiguous, you have specified the column name without the table and/or schema name. To avoid ambiguity, always specify the schema and table name when pointing out the column to filter by. For instance, if you want to filter employees by the employee_id column, use public.employees.employee_id instead of employee_id.
Valid subset condition
public.employees.employee_id IN (1, 2, 3)\n
"},{"location":"database_subset/#the-subset-condition-is-not-working-correctly-how-can-i-verify-it","title":"The subset condition is not working correctly. How can I verify it?","text":"
Run greenmask with --log-level=debug to see the generated SQL queries. You will find the generated SQL queries in the log output. Validate this query in your database client to ensure that the subset condition is working as expected.
For example:
$ greenmask dump --config config.yaml --log-level=debug\n\n2024-08-29T19:06:18+03:00 DBG internal/db/postgres/context/context.go:202 > Debug query Schema=person Table=businessentitycontact pid=1638339\n2024-08-29T19:06:18+03:00 DBG internal/db/postgres/context/context.go:203 > SELECT \"person\".\"businessentitycontact\".* FROM \"person\".\"businessentitycontact\" INNER JOIN \"person\".\"businessentity\" ON \"person\".\"businessentitycontact\".\"businessentityid\" = \"person\".\"businessentity\".\"businessentityid\" AND ( person.businessentity.businessentityid between 400 and 800 OR person.businessentity.businessentityid between 800 and 900 ) INNER JOIN \"person\".\"person\" ON \"person\".\"businessentitycontact\".\"personid\" = \"person\".\"person\".\"businessentityid\" WHERE TRUE AND ((\"person\".\"person\".\"businessentityid\") IN (SELECT \"person\".\"businessentity\".\"businessentityid\" FROM \"person\".\"businessentity\" WHERE ( ( person.businessentity.businessentityid between 400 and 800 OR person.businessentity.businessentityid between 800 and 900 ) )))\n pid=1638339\n
"},{"location":"database_subset/#dump-is-too-slow","title":"Dump is too slow","text":"
If the dump process is too slow, the generated query might be too complex. In this case, you can:
Check if the database has indexes on the columns used in the subset condition. Create them if possible.
Move the database dumping to a replica to avoid the performance impact on the primary.
"},{"location":"database_subset/#example-dump-a-subset-of-the-database","title":"Example: Dump a subset of the database","text":"
Info
All examples are based on the playground database. Read more about it in the Playground section.
The following example demonstrates how to dump a subset of the person schema. The subset condition is applied to the businessentity and password tables. The subset condition filters the data based on the businessentityid and passwordsalt columns, respectively.
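A sketch of such a configuration (the subset_conds attribute follows the troubleshooting examples above; the exact condition values are illustrative):
transformation:\n  - schema: person\n    name: businessentity\n    subset_conds:\n      # keep only a range of business entities\n      - person.businessentity.businessentityid between 400 and 800\n  - schema: person\n    name: password\n    subset_conds:\n      - person.password.passwordsalt = '329eacbe-c883-4f48-b8b6-17aa4627efff'\n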
"},{"location":"database_subset/#example-dump-a-subset-with-circular-reference","title":"Example: Dump a subset with circular reference","text":"Create tables with multi cyles
-- Step 1: Create tables without foreign keys\nDROP TABLE IF EXISTS employees CASCADE;\nCREATE TABLE employees\n(\n employee_id SERIAL PRIMARY KEY,\n name VARCHAR(100) NOT NULL,\n department_id INT -- Will reference departments(department_id)\n);\n\nDROP TABLE IF EXISTS departments CASCADE;\nCREATE TABLE departments\n(\n department_id SERIAL PRIMARY KEY,\n name VARCHAR(100) NOT NULL,\n project_id INT -- Will reference projects(project_id)\n);\n\nDROP TABLE IF EXISTS projects CASCADE;\nCREATE TABLE projects\n(\n project_id SERIAL PRIMARY KEY,\n name VARCHAR(100) NOT NULL,\n lead_employee_id INT, -- Will reference employees(employee_id)\n head_employee_id INT -- Will reference employees(employee_id)\n);\n\n-- Step 2: Alter tables to add foreign key constraints\nALTER TABLE employees\n ADD CONSTRAINT fk_department\n FOREIGN KEY (department_id) REFERENCES departments (department_id);\n\nALTER TABLE departments\n ADD CONSTRAINT fk_project\n FOREIGN KEY (project_id) REFERENCES projects (project_id);\n\nALTER TABLE projects\n ADD CONSTRAINT fk_lead_employee\n FOREIGN KEY (lead_employee_id) REFERENCES employees (employee_id);\n\nALTER TABLE projects\n ADD CONSTRAINT fk_lead_employee2\n FOREIGN KEY (head_employee_id) REFERENCES employees (employee_id);\n\n-- Insert projects\nINSERT INTO projects (name, lead_employee_id)\nSELECT 'Project ' || i, NULL\nFROM generate_series(1, 10) AS s(i);\n\n-- Insert departments\nINSERT INTO departments (name, project_id)\nSELECT 'Department ' || i, i\nFROM generate_series(1, 10) AS s(i);\n\n-- Insert employees and assign 10 of them as project leads\nINSERT INTO employees (name, department_id)\nSELECT 'Employee ' || i, (i / 10) + 1\nFROM generate_series(1, 99) AS s(i);\n\n-- Assign 10 employees as project leads\nUPDATE projects\nSET lead_employee_id = (SELECT employee_id\n FROM employees\n WHERE employees.department_id = projects.project_id\n LIMIT 1),\n head_employee_id = 3\nWHERE project_id <= 10;\n
However, this will return an empty result, because the subset condition is not satisfied across all related tables: the project with project_id=1 references the employee with employee_id=3, which does not match the subset condition.
"},{"location":"database_subset/#example-dump-a-subset-with-virtual-references","title":"Example: Dump a subset with virtual references","text":"
In this example, we will create a subset of the tables with virtual references. The subset will include the orders table and its related tables customers, payments, order_items, and audit_logs. The orders table has a virtual reference to the customers table, and the audit_logs table has a virtual reference to the orders table.
Create tables with virtual references
-- Create customers table\nCREATE TABLE customers\n(\n customer_id SERIAL PRIMARY KEY,\n customer_name VARCHAR(100)\n);\n\n-- Create orders table\nCREATE TABLE orders\n(\n order_id SERIAL PRIMARY KEY,\n customer_id INT, -- This should reference customers.customer_id, but no FK constraint is defined\n order_date DATE\n);\n\n-- Create payments table\nCREATE TABLE payments\n(\n payment_id SERIAL PRIMARY KEY,\n order_id INT, -- This should reference orders.order_id, but no FK constraint is defined\n payment_amount DECIMAL(10, 2),\n payment_date DATE\n);\n\n-- Insert test data into customers table\nINSERT INTO customers (customer_name)\nVALUES ('John Doe'),\n ('Jane Smith'),\n ('Alice Johnson');\n\n-- Insert test data into orders table\nINSERT INTO orders (customer_id, order_date)\nVALUES (1, '2023-08-01'), -- Related to customer John Doe\n (2, '2023-08-05'), -- Related to customer Jane Smith\n (3, '2023-08-07');\n-- Related to customer Alice Johnson\n\n-- Insert test data into payments table\nINSERT INTO payments (order_id, payment_amount, payment_date)\nVALUES (1, 100.00, '2023-08-02'), -- Related to order 1 (John Doe's order)\n (2, 200.50, '2023-08-06'), -- Related to order 2 (Jane Smith's order)\n (3, 300.75, '2023-08-08');\n-- Related to order 3 (Alice Johnson's order)\n\n\n-- Create a table with a multi-key reference (composite key reference)\nCREATE TABLE order_items\n(\n order_id INT, -- Should logically reference orders.order_id\n item_id INT, -- Composite part of the key\n product_name VARCHAR(100),\n quantity INT,\n PRIMARY KEY (order_id, item_id) -- Composite primary key\n);\n\n-- Create a table with a JSONB column that contains a reference value\nCREATE TABLE audit_logs\n(\n log_id SERIAL PRIMARY KEY,\n log_data JSONB -- This JSONB field will contain references to other tables\n);\n\n-- Insert data into order_items table with multi-key reference\nINSERT INTO order_items (order_id, item_id, product_name, quantity)\nVALUES (1, 1, 'Product A', 3), -- Related to order_id = 1 from orders table\n (1, 2, 'Product B', 5), -- Related to order_id = 1 from orders table\n (2, 1, 'Product C', 2), -- Related to order_id = 2 from orders table\n (3, 1, 'Product D', 1);\n-- Related to order_id = 3 from orders table\n\n-- Insert data into audit_logs table with JSONB reference value\nINSERT INTO audit_logs (log_data)\nVALUES ('{\n \"event\": \"order_created\",\n \"order_id\": 1,\n \"details\": {\n \"customer_name\": \"John Doe\",\n \"total\": 100.00\n }\n}'),\n ('{\n \"event\": \"payment_received\",\n \"order_id\": 2,\n \"details\": {\n \"payment_amount\": 200.50,\n \"payment_date\": \"2023-08-06\"\n }\n }'),\n ('{\n \"event\": \"item_added\",\n \"order_id\": 1,\n \"item\": {\n \"item_id\": 2,\n \"product_name\": \"Product B\",\n \"quantity\": 5\n }\n }');\n
The following example demonstrates how to make a subset for keys that do not have FK constraints but where a data relationship exists.
The orders table has a virtual reference to the customers table, and the audit_logs table has a virtual reference to the orders table.
The payments table has a virtual reference to the orders table.
The order_items table has a composite key (order_id, item_id) whose order_id part references the orders table.
The audit_logs table has a JSONB column that contains two references to the orders and order_items tables.
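A sketch of the corresponding configuration (the JSONB extraction expression and the subset condition value are assumptions):
dump:\n  transformation:\n    - schema: public\n      name: customers\n      subset_conds:\n        - public.customers.customer_id in (1)\n  virtual_references:\n    - schema: public\n      name: orders\n      references:\n        - schema: public\n          name: customers\n          not_null: true          # generates an INNER JOIN\n          columns:\n            - name: customer_id\n    - schema: public\n      name: payments\n      references:\n        - schema: public\n          name: orders\n          not_null: true\n          columns:\n            - name: order_id\n    - schema: public\n      name: order_items\n      references:\n        - schema: public\n          name: orders\n          not_null: true\n          columns:\n            - name: order_id\n    - schema: public\n      name: audit_logs\n      references:\n        - schema: public\n          name: orders\n          not_null: false\n          columns:\n            # expression extracts the FK value from the JSONB document\n            - expression: (public.audit_logs.log_data ->> 'order_id')::INT\n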
As a result, the customers table will be dumped with the orders table and its related tables payments, order_items, and audit_logs. The subset condition will be applied to the customers table, and the data will be filtered based on the customer_id column.
"},{"location":"database_subset/#example-dump-a-subset-with-polymorphic-references","title":"Example: Dump a subset with polymorphic references","text":"
In this example, we will create a subset of the tables with polymorphic references. This example includes the comments table and its related tables posts and videos.
Create tables with polymorphic references and insert data
-- Create the Posts table\nCREATE TABLE posts\n(\n id SERIAL PRIMARY KEY,\n title VARCHAR(255) NOT NULL,\n content TEXT NOT NULL\n);\n\n-- Create the Videos table\nCREATE TABLE videos\n(\n id SERIAL PRIMARY KEY,\n title VARCHAR(255) NOT NULL,\n url VARCHAR(255) NOT NULL\n);\n\n-- Create the Comments table with a polymorphic reference\nCREATE TABLE comments\n(\n id SERIAL PRIMARY KEY,\n commentable_id INT NOT NULL, -- Will refer to either posts.id or videos.id\n commentable_type VARCHAR(50) NOT NULL, -- Will store the type of the associated record\n body TEXT NOT NULL,\n created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP\n);\n\n\n-- Insert data into the Posts table\nINSERT INTO posts (title, content)\nVALUES ('First Post', 'This is the content of the first post.'),\n ('Second Post', 'This is the content of the second post.');\n\n-- Insert data into the Videos table\nINSERT INTO videos (title, url)\nVALUES ('First Video', 'https://example.com/video1'),\n ('Second Video', 'https://example.com/video2');\n\n-- Insert data into the Comments table, associating some comments with posts and others with videos\n-- For posts:\nINSERT INTO comments (commentable_id, commentable_type, body)\nVALUES (1, 'post', 'This is a comment on the first post.'),\n (2, 'post', 'This is a comment on the second post.');\n\n-- For videos:\nINSERT INTO comments (commentable_id, commentable_type, body)\nVALUES (1, 'video', 'This is a comment on the first video.'),\n (2, 'video', 'This is a comment on the second video.');\n
The comments table has a polymorphic reference to the posts and videos tables. Depending on the value of the commentable_type column, the commentable_id column will reference either the posts.id or videos.id column.
The following example demonstrates how to make a subset for tables with polymorphic references.
This example selects only the first post from the posts table and its related comments from the comments table. The comments that are associated with videos are included without filtering, because the subset condition is applied only to the posts table and its related comments.
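A sketch of the subset configuration (assuming the polymorphic virtual reference for comments is defined as shown earlier; the condition value is illustrative):
transformation:\n  - schema: public\n    name: posts\n    subset_conds:\n      # keep only the first post; its comments follow through the virtual reference\n      - public.posts.id in (1)\n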
The resulting records will be:
transformed=# select * from comments;\n id | commentable_id | commentable_type | body | created_at \n----+----------------+------------------+---------------------------------------+----------------------------\n 1 | 1 | post | This is a comment on the first post. | 2024-09-18 05:27:54.217405\n 2 | 2 | post | This is a comment on the second post. | 2024-09-18 05:27:54.217405\n 3 | 1 | video | This is a comment on the first video. | 2024-09-18 05:27:54.229794\n(3 rows)\n
Once the repository is cloned, execute the following command to build Greenmask:
make build\n
After completing the build process, you will find the binary named greenmask in the root directory of the repository. Execute the binary to start using Greenmask.
Greenmask Playground is a sandbox environment for your experiments in Docker with sample databases included to help you try Greenmask without any additional actions. Read the Playground guide to learn more.
Greenmask Playground is a sandbox environment in Docker with sample databases included to help you try Greenmask without any additional actions. It includes the following components:
Original database \u2014 the source database you'll be working with.
Empty database for restoration \u2014 an empty database where the restored data will be placed.
MinIO storage \u2014 S3-compatible object storage used for storing dumps.
Greenmask Utility \u2014 Greenmask itself, ready for use.
Warning
To complete this guide, you must have Docker and docker-compose installed.
"},{"location":"playground/#setting-up-greenmask-playground","title":"Setting up Greenmask Playground","text":"
Clone the greenmask repository and navigate to its directory by running the following commands:
git clone git@github.com:GreenmaskIO/greenmask.git && cd greenmask\n
Once you have cloned the repository, start the environment by running Docker Compose:
docker-compose run greenmask\n
Tip
If you're experiencing problems with pulling images from Docker Hub, you can build the Greenmask image from source by running the following command:
docker-compose run greenmask-from-source\n
Now you have Greenmask Playground up and running with a shell prompt inside the container. All further operations will be carried out within this container's shell.
A configuration file is mandatory for Greenmask to function. A pre-defined configuration file is stored in the repository root directory (./playground/config.yml). It also serves to define transformers, which you can update to your liking in order to use Greenmask Playground more effectively and to gain a better understanding of the tool itself. To learn how to customize a configuration file, see Configuration.
The pre-defined configuration file uses the NoiseDate transformer as an example. To learn more about other transformers and how to use them, see Transformers.
Most transformers in Greenmask have dynamic parameters. This functionality is possible because Greenmask utilizes a database driver that can encode and decode raw values into their actual type representations.
This allows you to retrieve parameter values directly from the records. This capability is particularly beneficial when you need to resolve functional dependencies between fields or satisfy constraints. Greenmask processes transformations sequentially. Therefore, when you reference a field that was transformed in a previous step, you will access the transformed value.
column - Specifies the column name. The value from each record in this column will be passed to the transformer as a parameter.
cast_to - Indicates the function used to cast the column value to the desired type. Before being passed to the transformer, the value is cast to this type. For more details, see Cast functions.
template - Defines the template used for casting the column value to the desired type. You can create your own template and incorporate predefined functions and operators to implement the casting logic or other logic required for passing the value to the transformer. For more details, see Template functions.
default_value - Determines the default value used if the column's value is NULL. This value is represented in raw format appropriate to the type specified in the column parameter.
"},{"location":"built_in_transformers/dynamic_parameters/#cast-functions","title":"Cast functions","text":"name description input type output type UnixNanoToDate Cast int value as Unix Timestamp in Nano Seconds to date type int2, int4, int8, numeric, float4, float8 date UnixMicroToDate Cast int value as Unix Timestamp in Micro Seconds to date type int2, int4, int8, numeric, float4, float8 date UnixMilliToDate Cast int value as Unix Timestamp in Milli Seconds to date type int2, int4, int8, numeric, float4, float8 date UnixSecToDate Cast int value as Unix Timestamp in Seconds to date type int2, int4, int8, numeric, float4, float8 date UnixNanoToTimestamp Cast int value as Unix Timestamp in Nano Seconds to timestamp type int2, int4, int8, numeric, float4, float8 timestamp UnixMicroToTimestamp Cast int value as Unix Timestamp in Micro Seconds to timestamp type int2, int4, int8, numeric, float4, float8 timestamp UnixMilliToTimestamp Cast int value as Unix Timestamp in Milli Seconds to timestamp type int2, int4, int8, numeric, float4, float8 timestamp UnixSecToTimestamp Cast int value as Unix Timestamp in Seconds to timestamp type int2, int4, int8, numeric, float4, float8 timestamp UnixNanoToTimestampTz Cast int value as Unix Timestamp in Nano Seconds to timestamptz type int2, int4, int8, numeric, float4, float8 timestamptz UnixMicroToTimestampTz Cast int value as Unix Timestamp in Micro Seconds to timestamptz type int2, int4, int8, numeric, float4, float8 timestamptz UnixMilliToTimestampTz Cast int value as Unix Timestamp in Milli Seconds to timestamptz type int2, int4, int8, numeric, float4, float8 timestamptz UnixSecToTimestampTz Cast int value as Unix Timestamp in Seconds to timestamptz type int2, int4, int8, numeric, float4, float8 timestamptz DateToUnixNano Cast date value to int value as a Unix Timestamp in Nano Seconds date int2, int4, int8, numeric, float4, float8 DateToUnixMicro Cast date value to int value as a Unix Timestamp in Micro Seconds date int2, int4, int8, numeric, float4, float8 DateToUnixMilli Cast date value to int value as a Unix Timestamp in Milli Seconds date int2, int4, int8, numeric, float4, float8 DateToUnixSec Cast date value to int value as a Unix Timestamp in Seconds date int2, int4, int8, numeric, float4, float8 TimestampToUnixNano Cast timestamp value to int value as a Unix Timestamp in Nano Seconds timestamp int2, int4, int8, numeric, float4, float8 TimestampToUnixMicro Cast timestamp value to int value as a Unix Timestamp in Micro Seconds timestamp int2, int4, int8, numeric, float4, float8 TimestampToUnixMilli Cast timestamp value to int value as a Unix Timestamp in Milli Seconds timestamp int2, int4, int8, numeric, float4, float8 TimestampToUnixSec Cast timestamp value to int value as a Unix Timestamp in Seconds timestamp int2, int4, int8, numeric, float4, float8 TimestampTzToUnixNano Cast timestamptz value to int value as a Unix Timestamp in Nano Seconds timestamptz int2, int4, int8, numeric, float4, float8 TimestampTzToUnixMicro Cast timestamptz value to int value as a Unix Timestamp in Micro Seconds timestamptz int2, int4, int8, numeric, float4, float8 TimestampTzToUnixMilli Cast timestamptz value to int value as a Unix Timestamp in Milli Seconds timestamptz int2, int4, int8, numeric, float4, float8 TimestampTzToUnixSec Cast timestamptz value to int value as a Unix Timestamp in Seconds timestamptz int2, int4, int8, numeric, float4, float8 FloatToInt Cast float value to one of integer type. 
The fractional part will be discarded numeric, float4, float8 int2, int4, int8, numeric IntToFloat Cast int value to one of float types int2, int4, int8, numeric numeric, float4, float8 IntToBool Cast int value to boolean. The value with 0 is false, 1 is true int2, int4, int8, numeric, float4, float8 bool BoolToInt Cast boolean value to int. The value false is 0, true is 1 bool int2, int4, int8, numeric, float4, float8"},{"location":"built_in_transformers/dynamic_parameters/#example-functional-dependency-resolution-between-columns","title":"Example: Functional dependency resolution between columns","text":"
Here is a simplified schema of the humanresources.employee table from the playground:
Column | Type \n------------------+-----------------------------\n businessentityid | integer \n jobtitle | character varying(50) \n birthdate | date \n hiredate | date \nCheck constraints:\n CHECK (birthdate >= '1930-01-01'::date AND birthdate <= (now() - '18 years'::interval))\n
As you can see, there is a functional dependency between the birthdate and hiredate columns. Logically, the hiredate should be later than the birthdate. Additionally, the birthdate should range from 1930-01-01 to 18 years prior to the current date.
Imagine that you need to generate random birthdate and hiredate columns. To ensure these dates satisfy the constraints, you can use dynamic parameters in the RandomDate transformer, as sketched after the following steps:
First, we generate a RandomDate for the birthdate column. The result of the transformation will be used as the minimum value for the next transformation, for the hiredate column.
Apply the template for the static parameter. It takes the current date and subtracts 30 years from it; assuming the current year is 2024, the result is 1994. The tsModify function returns a time.Time object rather than raw data. To get a raw value suitable for the birthdate type, we need to pass this value to the .EncodeValue function. This value is used as the minimum value for the birthdate column.
The same as the previous step, but we subtract 18 years from the current date; the result is 2006.
Generate a RandomDate for the hiredate column based on the value of the birthdate.
Set the maximum value for the hiredate column. The value is the current date.
The min parameter is set to the value of the birthdate column from the previous step.
The template takes the randomly generated birthdate value and adds 18 years to it.
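A minimal sketch of such a configuration (the tsModify interval syntax and the exact parameter layout are assumptions based on the steps above):
transformation:\n  - schema: humanresources\n    name: employee\n    transformers:\n      - name: RandomDate\n        params:\n          column: birthdate\n          # static boundaries computed from templates (steps 2 and 3)\n          min: '{{ now | tsModify \"-30 years\" | .EncodeValue }}'\n          max: '{{ now | tsModify \"-18 years\" | .EncodeValue }}'\n      - name: RandomDate\n        params:\n          column: hiredate\n          max: '{{ now | .EncodeValue }}'\n        dynamic_params:\n          # min comes from the birthdate generated in the previous step\n          min:\n            column: birthdate\n            template: '{{ .GetValue | tsModify \"18 years\" | .EncodeValue }}'\n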
Below is the result of the transformation:
From the result, you can see that all functional dependencies and constraints are satisfied.
Parameter values can be generated from templates. This is useful when you don't want to write values manually, but instead want to generate and initialize them dynamically.
The list of template functions that can be used in the template can be found in Custom functions.
You can encode and decode objects using the driver functions below.
"},{"location":"built_in_transformers/parameters_templating/#template-functions","title":"Template functions","text":"Function Description Signature .GetColumnType Returns a string with the column type. .GetColumnType(name string) (typeName string, err error).EncodeValueByColumn Encodes a value of any type into its raw string representation using the specified column name. Encoding is performed through the PostgreSQL driver. Throws an error if types are incompatible. .EncodeValueByColumn(name string, value any) (res any, err error).DecodeValueByColumn Decodes a value from its raw string representation to a Golang type using the specified column name. Decoding is performed through the PostgreSQL driver. Throws an error if types are incompatible. .DecodeValueByColumn(name string, value any) (res any, err error).EncodeValueByType Encodes a value of any type into its string representation using the specified type name. Encoding is performed through the PostgreSQL driver. Throws an error if types are incompatible. .EncodeValueByType(name string, value any) (res any, err error).DecodeValueByType Decodes a value from its raw string representation to a Golang type using the specified type name. Decoding is performed through the PostgreSQL driver. Throws an error if types are incompatible. .DecodeValueByType(name string, value any) (res any, err error).DecodeValue Decodes a value from its raw string representation to a Golang type using the data type assigned to the table column specified in the column parameter. Decoding is performed through the PostgreSQL driver. Throws an error if types are incompatible. .DecodeValueByColumn(value any) (res any, err error).EncodeValue Encodes a value of any type into its string representation using the type assigned to the table column specified in the column parameter. Encoding is performed through the PostgreSQL driver. Throws an error if types are incompatible. .EncodeValue(value any) (res any, err error)
Warning
If the parameter is not linked to a column parameter, then the functions .DecodeValue and .EncodeValue will return an error. You can use .DecodeValueByType and .EncodeValueByType, or .DecodeValueByColumn and .EncodeValueByColumn, instead.
In the example below, the min and max values for the birth_date column are generated dynamically using the now template function, which returns the current date and time. The tsModify function is then used to subtract 30 (and 18) years. Because the parameter type is mapped to the column parameter type, the .EncodeValue function is used to encode the value into the column type.
For example, if we have the now date as 2024-01-01, the dynamically calculated min value will be 1994-01-01 and the max value will be 2006-01-01.
CREATE TABLE account\n(\n id SERIAL PRIMARY KEY,\n gender VARCHAR(1) NOT NULL,\n email TEXT NOT NULL NOT NULL UNIQUE,\n first_name TEXT NOT NULL,\n last_name TEXT NOT NULL,\n birth_date DATE,\n created_at TIMESTAMP NOT NULL DEFAULT NOW()\n);\n\nINSERT INTO account (first_name, gender, last_name, birth_date, email)\nVALUES ('John', 'M', 'Smith', '1980-01-01', 'john.smith@gmail.com');\n
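A sketch of the corresponding transformer configuration (the layout is assumed, following the dynamic parameters example):
transformation:\n  - schema: public\n    name: account\n    transformers:\n      - name: RandomDate\n        params:\n          column: birth_date\n          # both boundaries are rendered from templates before the dump starts\n          min: '{{ now | tsModify \"-30 years\" | .EncodeValue }}'\n          max: '{{ now | tsModify \"-18 years\" | .EncodeValue }}'\n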
The transformation condition feature allows you to execute a defined transformation only if a specified condition is met. The condition must be defined as a boolean expression that evaluates to true or false. Greenmask uses expr-lang/expr under the hood. You can use all functions and syntax provided by the expr library.
You can use the same functions that are described in the built-in transformers section.
The transformers are executed one by one, which helps you create complex transformation pipelines. For instance, depending on the value chosen by a previous transformer, you can decide whether to execute the next transformer.
To improve the user experience, Greenmask offers special namespaces for accessing values in different formats: either the driver-encoded value in its real type or as a raw string.
record: This namespace provides the record value in its actual type.
raw_record: This namespace provides the record value as a string.
You can access a specific column\u2019s value using record.column_name for the real type or raw_record.column_name for the raw string value.
Warning
A record may have already been modified by previous transformers before the condition is evaluated. Greenmask does not retain the original record value and instead provides the current, possibly modified, value for condition evaluation.
The expression scope can be a table or a specific transformer. If you define the condition on the table scope, it is evaluated before any transformer is executed. If you define the condition on the transformer scope, it is evaluated before the specified transformer is executed.
"},{"location":"built_in_transformers/transformation_condition/#int-and-float-value-definition","title":"Int and float value definition","text":"
It is important to create the integer or float value in the correct format. To define an integer value, write a number without a dot (1, 2, etc.). To define a float value, write a number with a dot (1.0, 2.0, etc.).
Warning
You may see a wrong comparison result if you compare int and float, for example 1 == 1.0 will return false.
Greenmask encodes the value only when evaluating the condition. This lazy evaluation helps optimize the performance of the transformation if you have many conditions that use the or (||) or and (&&) operators.
"},{"location":"built_in_transformers/transformation_condition/#example-chose-random-value-and-execute-one-of","title":"Example: Chose random value and execute one of","text":"
In the following example, the RandomChoice transformer is used to choose a random value from the list of values. Depending on the chosen value, the Replace transformer is executed to set the activeflag column to true or false.
In this case, the condition scope is at the transformer level.
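A hedged sketch of such a pipeline (the when attribute placement and the status and activeflag column names are assumptions; only the true branch is shown):
transformation:\n  - schema: public\n    name: account\n    transformers:\n      - name: RandomChoice\n        params:\n          column: status\n          values: [\"active\", \"inactive\"]\n      - name: Replace\n        # executed only when the previously chosen value is active\n        when: 'record.status == \"active\"'\n        params:\n          column: activeflag\n          value: \"true\"\n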
"},{"location":"built_in_transformers/transformation_condition/#example-do-not-transform-specific-columns","title":"Example: Do not transform specific columns","text":"
In the following example, the RandomString transformer is executed only if the businessentityid column value is not equal to 1492 or 1.
Greenmask provides two engines: random and hash. Most transformers have an engine parameter, which is set to random by default. Use the hash engine when you need to generate deterministic data: the same input will always produce the same output.
Info
Greenmask employs the SHA-3 algorithm to hash input values. While this function is cryptographically secure, it does exhibit lower performance. We plan to introduce additional hash functions in the future to offer a balance between security and performance. For example, SipHash, which provides a good trade-off between security and performance, is currently in development and is expected to be included in the stable v0.2 release of Greenmask.
Warning
The hash engine does not guarantee the uniqueness of generated values, although transformers such as Hash, RandomEmail, and RandomUuid typically have a low probability of producing duplicate values. A feature to ensure uniqueness is currently under development in Greenmask and is expected to be released in future updates. For the latest status, please visit the Greenmask roadmap.
The random engine serves as the default engine for Greenmask. It operates using a pseudo-random number generator initialized with a random seed sourced from a cryptographically secure random number generator. Employ the random engine when you need to generate random data and do not require reproducibility of the same transformation results for the same input.
The following example demonstrates how to configure the RandomDate transformer to generate random dates.
Keep in mind that the random engine always generates different values for the same input. For instance, if we run the previous example multiple times, we will get different results.
The hash engine is designed to generate deterministic data. It uses the SHA-3 algorithm to hash the input value. The hash engine is particularly useful when you need to generate the same output for the same input. For example, when you want to transform values that are used as primary or foreign keys in a database.
For security reasons, it is suggested to set a global Greenmask salt via the GREENMASK_GLOBAL_SALT environment variable. The salt is added to the hash input to prevent reverse engineering the original value from the hashed output. The value is hex-encoded and of variable length, for example, GREENMASK_GLOBAL_SALT=a5eddc84e762e810. Generate a strong random salt and keep it secret.
The following example demonstrates how to configure the RandomInt transformer to generate deterministic data using the hash engine. The public.account.id and public.orders.account_id columns will have the same values.
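A sketch of such a configuration (the min and max boundaries are assumptions; the key point is the identical hash engine and value range on both columns):
transformation:\n  - schema: public\n    name: account\n    transformers:\n      - name: RandomInt\n        params:\n          column: id\n          min: 1\n          max: 100000\n          engine: hash   # deterministic: the same input always yields the same output\n  - schema: public\n    name: orders\n    transformers:\n      - name: RandomInt\n        params:\n          column: account_id\n          min: 1\n          max: 100000\n          engine: hash\n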
If you have partitioned tables or want to apply a transformation to a primary key and propagate it to all tables referencing that column, you can do so with Greenmask.
"},{"location":"built_in_transformers/transformation_inheritance/#apply-for-inherited","title":"Apply for inherited","text":"
Using apply_for_inherited, you can apply transformations to all partitions of a partitioned table, including any subpartitions.
When a partition has a transformation defined manually via config, and apply_for_inherited is set on the parent table, Greenmask will merge both the inherited and manually defined configurations. The manually defined transformation will execute last, giving it higher priority.
If this situation occurs, you will see the following information in the log:
{\n \"level\": \"info\",\n \"ParentTableSchema\": \"public\",\n \"ParentTableName\": \"sales\",\n \"ChildTableSchema\": \"public\",\n \"ChildTableName\": \"sales_2022_feb\",\n \"ChildTableConfig\": [\n {\n \"name\": \"RandomDate\",\n \"params\": {\n \"column\": \"sale_date\",\n \"engine\": \"random\",\n \"max\": \"2005-01-01\",\n \"min\": \"2001-01-01\"\n }\n }\n ],\n \"time\": \"2024-11-03T22:14:01+02:00\",\n \"message\": \"config will be merged: found manually defined transformers on the partitioned table\"\n}\n
"},{"location":"built_in_transformers/transformation_inheritance/#apply-for-references","title":"Apply for references","text":"
Using apply_for_references, you can apply transformations to columns involved in a primary key or in tables with a foreign key that references that column. This simplifies the transformation process by requiring you to define the transformation only on the primary key column, which will then be applied to all tables referencing that column.
The transformer must be deterministic or support the hash engine, and the hash engine must be set in the configuration file.
List of transformers that support apply_for_references:
End-to-end identifiers in databases are unique identifiers that are consistently used across multiple tables in a relational database schema, allowing for a seamless chain of references from one table to another. These identifiers typically serve as primary keys in one table and are propagated as foreign keys in other tables, creating a direct, traceable link from one end of a data relationship to the other.
Greenmask can detect end-to-end identifiers and apply transformations across the entire sequence of tables. These identifiers are detected when the following condition is met: the foreign key serves as both a primary key and a foreign key in the referencing table.
When a transformation is manually defined on the referenced column via config and apply_for_references is set on the parent table, the manually defined transformation is chosen and the inherited transformation is ignored. You will receive an INFO message in the logs.
The transformation condition will not be applied to the referenced column.
Not all transformers support apply_for_references
Warning
We do not recommend using apply_for_references with transformation conditions, as these conditions are not inherited by transformers on the referenced columns. This may lead to inconsistencies in the data.
In this example, we have a partitioned table sales that is partitioned by year and then by month. Each partition contains a subset of data based on the year and month of the sale. The sales table has a primary key sale_id and is partitioned by sale_date. The sale_date column is transformed using the RandomDate transformer.
CREATE TABLE sales\n(\n sale_id SERIAL NOT NULL,\n sale_date DATE NOT NULL,\n amount NUMERIC(10, 2) NOT NULL\n) PARTITION BY RANGE (EXTRACT(YEAR FROM sale_date));\n\n-- Step 2: Create first-level partitions by year\nCREATE TABLE sales_2022 PARTITION OF sales\n FOR VALUES FROM (2022) TO (2023)\n PARTITION BY LIST (EXTRACT(MONTH FROM sale_date));\n\nCREATE TABLE sales_2023 PARTITION OF sales\n FOR VALUES FROM (2023) TO (2024)\n PARTITION BY LIST (EXTRACT(MONTH FROM sale_date));\n\n-- Step 3: Create second-level partitions by month for each year, adding PRIMARY KEY on each partition\n\n-- Monthly partitions for 2022\nCREATE TABLE sales_2022_jan PARTITION OF sales_2022 FOR VALUES IN (1)\n WITH (fillfactor = 70);\nCREATE TABLE sales_2022_feb PARTITION OF sales_2022 FOR VALUES IN (2);\nCREATE TABLE sales_2022_mar PARTITION OF sales_2022 FOR VALUES IN (3);\n-- Continue adding monthly partitions for 2022...\n\n-- Monthly partitions for 2023\nCREATE TABLE sales_2023_jan PARTITION OF sales_2023 FOR VALUES IN (1);\nCREATE TABLE sales_2023_feb PARTITION OF sales_2023 FOR VALUES IN (2);\nCREATE TABLE sales_2023_mar PARTITION OF sales_2023 FOR VALUES IN (3);\n-- Continue adding monthly partitions for 2023...\n\n-- Step 4: Insert sample data\nINSERT INTO sales (sale_date, amount)\nVALUES ('2022-01-15', 100.00);\nINSERT INTO sales (sale_date, amount)\nVALUES ('2022-02-20', 150.00);\nINSERT INTO sales (sale_date, amount)\nVALUES ('2023-03-10', 200.00);\n
To transform the sale_date column in the sales table and all its partitions, you can use the following configuration:
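A sketch of this configuration (parameter values are taken from the merge log shown earlier; the exact layout is assumed):
transformation:\n  - schema: public\n    name: sales\n    apply_for_inherited: true   # propagate to all partitions and subpartitions\n    transformers:\n      - name: RandomDate\n        params:\n          column: sale_date\n          min: \"2001-01-01\"\n          max: \"2005-01-01\"\n          engine: random\n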
This is an ordinary table reference, where the primary key of the users table is referenced in the orders table.
-- Enable the extension for UUID generation (if not enabled)\nCREATE EXTENSION IF NOT EXISTS \"uuid-ossp\";\n\nCREATE TABLE users\n(\n user_id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),\n username VARCHAR(50) NOT NULL\n);\n\nCREATE TABLE orders\n(\n order_id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),\n user_id UUID REFERENCES users (user_id),\n order_date DATE NOT NULL\n);\n\nINSERT INTO users (username)\nVALUES ('john_doe');\nINSERT INTO users (username)\nVALUES ('jane_smith');\n\nINSERT INTO orders (user_id, order_date)\nVALUES ((SELECT user_id FROM users WHERE username = 'john_doe'), '2024-10-31'),\n ((SELECT user_id FROM users WHERE username = 'jane_smith'), '2024-10-30');\n
To transform the user_id column in the users table and propagate the change, you can use the following configuration:
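A sketch (placing apply_for_references at the table level follows the description above and is an assumption):
transformation:\n  - schema: public\n    name: users\n    apply_for_references: true\n    transformers:\n      - name: RandomUuid\n        params:\n          column: user_id\n          engine: hash   # deterministic, so users and orders stay consistent\n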
This will apply the RandomUuid transformation to the user_id column in the orders table automatically.
"},{"location":"built_in_transformers/transformation_inheritance/#example-3-references-on-tables-with-end-to-end-identifiers","title":"Example 3. References on tables with end-to-end identifiers","text":"
In this example, we have three tables: tablea, tableb, and tablec. All tables have a composite primary key. In the tables tableb and tablec, the primary key is also a foreign key that references the primary key of tablea. This means that all PKs are end-to-end identifiers.
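An illustrative schema for this setup (column names are assumptions; the point is that each composite primary key is also a foreign key to tablea):
CREATE TABLE tablea\n(\n    id1 INT,\n    id2 INT,\n    PRIMARY KEY (id1, id2)\n);\n\n-- The composite PK of tableb is also a FK to tablea: an end-to-end identifier\nCREATE TABLE tableb\n(\n    id1 INT,\n    id2 INT,\n    PRIMARY KEY (id1, id2),\n    FOREIGN KEY (id1, id2) REFERENCES tablea (id1, id2)\n);\n\nCREATE TABLE tablec\n(\n    id1 INT,\n    id2 INT,\n    PRIMARY KEY (id1, id2),\n    FOREIGN KEY (id1, id2) REFERENCES tablea (id1, id2)\n);\n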
Change a JSON document using delete and set operations. NULL values are kept.
"},{"location":"built_in_transformers/advanced_transformers/json/#parameters","title":"Parameters","text":"Name Properties Description Default Required Supported DB types column The name of the column to be affected Yes json, jsonb operations A list of operations that contains editing delete and set Yes - \u221f operation Specifies the operation type: set or delete Yes - \u221f path The path to an object to be modified. See path syntax below. Yes - \u221f value A value to be assigned to the provided path No - \u221f value_template A Golang template to be assigned to the provided path. See the list of template functions below. No - \u221f error_not_exist Throws an error if the key does not exist by the provided path. Disabled by default. false No -"},{"location":"built_in_transformers/advanced_transformers/json/#description","title":"Description","text":"
The Json transformer applies a sequence of changing operations (set and/or delete) to a JSON document. The value can be static or dynamic. For the set operation type, a static value is provided in the value parameter, while a dynamic value is provided in the value_template parameter; the result of the template execution is used as the value. Either the value or the value_template parameter is required for the set operation.
The Json transformer is based on tidwall/sjson and supports the same path syntax. See their documentation for syntax rules.
"},{"location":"built_in_transformers/advanced_transformers/json/#template-functions","title":"Template functions","text":"Function Description Signature .GetPath Returns the current path to which the operation is being applied .GetPath() (path string).GetOriginalValue Returns the original value to which the current operation path is pointing. If the value at the specified path does not exist, it returns nil. .GetOriginalValue() (value any).OriginalValueExists Returns a boolean value indicating whether the specified path exists or not. .OriginalValueExists() (exists bool).GetColumnValue Returns an encoded into Golang type value for a specified column or throws an error. A value can be any of int, float, time, string, bool, or slice or map. .GetColumnValue(name string) (value any, err error).GetRawColumnValue Returns a raw value for a specified column as a string or throws an error .GetRawColumnValue(name string) (value string, err error).EncodeValueByColumn Encodes a value of any type into its raw string representation using the specified column name. Encoding is performed through the PostgreSQL driver. Throws an error if types are incompatible. .EncodeValueByColumn(name string, value any) (res any, err error).DecodeValueByColumn Decodes a value from its raw string representation to a Golang type using the specified column name. Decoding is performed through the PostgreSQL driver. Throws an error if types are incompatible. .DecodeValueByColumn(name string, value any) (res any, err error).EncodeValueByType Encodes a value of any type into its string representation using the specified type name. Encoding is performed through the PostgreSQL driver. Throws an error if types are incompatible. .EncodeValueByType(name string, value any) (res any, err error).DecodeValueByType Decodes a value from its raw string representation to a Golang type using the specified type name. Decoding is performed through the PostgreSQL driver. Throws an error if types are incompatible. .DecodeValueByType(name string, value any) (res any, err error)"},{"location":"built_in_transformers/advanced_transformers/json/#example-changing-json-document","title":"Example: Changing JSON document","text":"Json transformer example
Execute a Go template and automatically apply the result to a specified column.
"},{"location":"built_in_transformers/advanced_transformers/template/#parameters","title":"Parameters","text":"Name Description Default Required Supported DB types column The name of the column to be affected Yes any template A Go template string Yes - validate Validates the template result using the PostgreSQL driver decoding procedure. Throws an error if a custom type does not have an encode-decoder implementation. false No -"},{"location":"built_in_transformers/advanced_transformers/template/#description","title":"Description","text":"
The Template transformer executes Go templates and automatically applies the template result to a specified column. Go template system is designed to be extensible, enabling developers to access data objects and incorporate custom functions programmatically. For more information, you can refer to the official Go Template documentation.
With the Template transformer, you can implement complicated transformation logic using basic or custom template functions. Below you can get familiar with the basic template functions for the Template transformer. For more information about available custom template functions, see Custom functions.
Warning
Pay attention to the whitespaces in templates. Use dash-wrapped brackets {{- -}} to trim spaces. For example, the value \"2023-12-19\" is not the same as \" 2023-12-19 \", and the latter may throw an error when restoring.
"},{"location":"built_in_transformers/advanced_transformers/template/#template-functions","title":"Template functions","text":"Function Description Signature .GetColumnType Returns a string with the column type. .GetColumnType(name string) (typeName string, err error).GetValue Returns the column value for column assigned in the column parameter, encoded by the PostgreSQL driver into any type along with any associated error. Supported types include int, float, time, string, bool, as well as slice or map of any type. .GetValue() (value any, err error).GetRawValue Returns a raw value as a string for column assigned in the column parameter. .GetRawColumnValue(name string) (value string, err error).GetColumnValue Returns an encoded value for a specified column or throws an error. A value can be any of int, float, time, string, bool, or slice or map. .GetColumnValue(name string) (value any, err error).GetRawColumnValue Returns a raw value for a specified column as a string or throws an error .GetRawColumnValue(name string) (value string, err error).EncodeValue Encodes a value of any type into its string representation using the type assigned to the table column specified in the column parameter. Encoding is performed through the PostgreSQL driver. Throws an error if types are incompatible. .EncodeValue(value any) (res any, err error).DecodeValue Decodes a value from its raw string representation to a Golang type using the data type assigned to the table column specified in the column parameter. Decoding is performed through the PostgreSQL driver. Throws an error if types are incompatible. .DecodeValueByColumn(value any) (res any, err error).EncodeValueByColumn Encodes a value of any type into its raw string representation using the specified column name. Encoding is performed through the PostgreSQL driver. Throws an error if types are incompatible. .EncodeValueByColumn(name string, value any) (res any, err error).DecodeValueByColumn Decodes a value from its raw string representation to a Golang type using the specified column name. Decoding is performed through the PostgreSQL driver. Throws an error if types are incompatible. .DecodeValueByColumn(name string, value any) (res any, err error).EncodeValueByType Encodes a value of any type into its string representation using the specified type name. Encoding is performed through the PostgreSQL driver. Throws an error if types are incompatible. .EncodeValueByType(name string, value any) (res any, err error).DecodeValueByType Decodes a value from its raw string representation to a Golang type using the specified type name. Decoding is performed through the PostgreSQL driver. Throws an error if types are incompatible. .DecodeValueByType(name string, value any) (res any, err error)"},{"location":"built_in_transformers/advanced_transformers/template/#example-update-the-firstname-column","title":"Example: Update the firstname column","text":"
Value = Terri: column firstname, original value Terri, transformed value Mary. Value != Terri: column firstname, original value Ken Jr, transformed value Mike."},{"location":"built_in_transformers/advanced_transformers/template_record/","title":"TemplateRecord","text":"
Modify records using a Go template and apply changes by using the PostgreSQL driver functions. This transformer provides a way to implement custom transformation logic.
"},{"location":"built_in_transformers/advanced_transformers/template_record/#parameters","title":"Parameters","text":"Name Description Default Required Supported DB types columns A list of columns to be affected by the template. The list of columns will be checked for constraint violations. No any template A Go template string Yes - validate Validate the template result via PostgreSQL driver decoding procedure. Throws an error if a custom type does not have an encode-decoder implementation. false No -"},{"location":"built_in_transformers/advanced_transformers/template_record/#description","title":"Description","text":"
TemplateRecord uses Go templates to change data. However, while the Template transformer operates on a single column and automatically applies the result, the TemplateRecord transformer can make changes to a set of columns in the record, and using the driver functions .SetColumnValue or .SetRawColumnValue is mandatory to do that.
With the TemplateRecord transformer, you can implement complicated transformation logic using basic or custom template functions. Below you can get familiar with the basic template functions for the TemplateRecord transformer. For more information about available custom template functions, see Custom functions.
"},{"location":"built_in_transformers/advanced_transformers/template_record/#template-functions","title":"Template functions","text":"Function Description Signature .GetColumnType Returns a string with the column type. .GetColumnType(name string) (typeName string, err error).GetColumnValue Returns an encoded value for a specified column or throws an error. A value can be any of int, float, time, string, bool, or slice or map. .GetColumnValue(name string) (value any, err error).GetRawColumnValue Returns a raw value for a specified column as a string or throws an error .GetRawColumnValue(name string) (value string, err error).SetColumnValue Sets a new value of a specific data type to the column. The value assigned must be compatible with the PostgreSQL data type of the column. For example, it is allowed to assign an int value to an INTEGER column, but you cannot assign a float value to a timestamptz column. SetColumnValue(name string, v any) (bool, error).SetRawColumnValue Sets a new raw value for a column, inheriting the column's existing data type, without performing data type validation. This can lead to errors when restoring the dump if the assigned value is not compatible with the column type. To ensure compatibility, consider using the .DecodeValueByColumn function followed by .SetColumnValue, for example, {{ \"13\" \\| .DecodeValueByColumn \"items_amount\" \\| .SetColumnValue \"items_amount\" }}. .SetRawColumnValue(name string, value any) (err error).EncodeValueByColumn Encodes a value of any type into its raw string representation using the specified column name. Encoding is performed through the PostgreSQL driver. Throws an error if types are incompatible. .EncodeValueByColumn(name string, value any) (res any, err error).DecodeValueByColumn Decodes a value from its raw string representation to a Golang type using the specified column name. Decoding is performed through the PostgreSQL driver. Throws an error if types are incompatible. .DecodeValueByColumn(name string, value any) (res any, err error).EncodeValueByType Encodes a value of any type into its string representation using the specified type name. Encoding is performed through the PostgreSQL driver. Throws an error if types are incompatible. .EncodeValueByType(name string, value any) (res any, err error).DecodeValueByType Decodes a value from its raw string representation to a Golang type using the specified type name. Decoding is performed through the PostgreSQL driver. Throws an error if types are incompatible. .DecodeValueByType(name string, value any) (res any, err error)"},{"location":"built_in_transformers/advanced_transformers/template_record/#example-generate-a-random-created_at-and-updated_at-dates","title":"Example: Generate a random created_at and updated_at dates","text":"
Below you can see the table structure:
The goal is to modify the \"created_at\" and \"updated_at\" columns based on the following rules:
Do not change the value if created_at is NULL.
If created_at is not NULL, generate the current time and use it as the minimum threshold for randomly generating the updated_at value.
Assign all generated values using the .SetColumnValue function.
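A simplified sketch of such a TemplateRecord configuration (the table name, the fixed updated_at offset, and the tsModify interval syntax are assumptions; a real setup would randomize updated_at):
transformation:\n  - schema: public\n    name: orders\n    transformers:\n      - name: TemplateRecord\n        params:\n          columns: [\"created_at\", \"updated_at\"]\n          validate: true\n          template: >\n            {{- $created := .GetColumnValue \"created_at\" -}}\n            {{- if isNotNull $created -}}\n            {{- $now := now -}}\n            {{- .SetColumnValue \"created_at\" $now -}}\n            {{- .SetColumnValue \"updated_at\" ($now | tsModify \"14 hours\") -}}\n            {{- end -}}\n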
column name original value transformed created_at 2021-01-20 07:01:00.513325+00 2023-12-17 19:37:29.910054Z updated_at 2021-08-09 21:27:00.513325+00 2023-12-18 10:05:25.828498Z"},{"location":"built_in_transformers/advanced_transformers/custom_functions/","title":"Template custom functions","text":"
Within Greenmask, custom functions play a crucial role, providing a wide array of options for implementing diverse logic. Under the hood, the custom functions are based on the sprig Go template function library. Greenmask enhances this capability by introducing additional functions and transformation functions. These extensions mirror the logic found in the standard transformers but offer you the flexibility to implement intricate and comprehensive logic tailored to your specific needs.
Currently, you can use template custom functions for the advanced transformers:
Json
Template
TemplateRecord
and for the Transformation condition feature as well.
Custom functions are divided into two groups:
Core functions \u2014 custom functions that vary in purpose and include PostgreSQL driver, JSON output, testing, and transformation functions.
Faker functions \u2014 custom functions of the faker type, which generate synthetic data.
Below you can find custom core functions which are divided into categories based on the transformation purpose.
"},{"location":"built_in_transformers/advanced_transformers/custom_functions/core_functions/#postgresql-driver-functions","title":"PostgreSQL driver functions","text":"Function Description null Returns the NULL value that can be used for the driver encoding-decoding operations isNull Returns true if the checked value is NULLisNotNull Returns true if the checked value is not NULLsqlCoalesce Works as a standard SQL coalesce function. It allows you to choose the first non-NULL argument from the list."},{"location":"built_in_transformers/advanced_transformers/custom_functions/core_functions/#json-output-function","title":"JSON output function","text":"Function Description jsonExists Checks if the path value exists in JSON. Returns true if the path exists. mustJsonGet Gets the JSON attribute value by path and throws an error if the path does not exist mustJsonGetRaw Gets the JSON attribute raw value by path and throws an error if the path does not exist jsonGet Gets the JSON attribute value by path and returns nil if the path does not exist jsonGetRaw Gets the JSON attribute raw value by path and returns nil if the path does not exist jsonSet Sets the value for the JSON document by path jsonSetRaw Sets the raw value for the JSON document by path jsonDelete Deletes an attribute from the JSON document by path jsonValidate Validates the JSON document syntax and throws an error if there are any issues jsonIsValid Checks the JSON document for validity and returns true if it is valid toJsonRawValue Casts any type of value to the raw JSON value"},{"location":"built_in_transformers/advanced_transformers/custom_functions/core_functions/#testing-functions","title":"Testing functions","text":"Function Description isInt Checks if the value of an integer type isFloat Checks if the value of a float type isNil Checks if the value is nil isString Checks if the value of a string type isMap Checks if the value of a map type isSlice Checks if the value of a slice type isBool Checks if the value of a boolean type"},{"location":"built_in_transformers/advanced_transformers/custom_functions/core_functions/#transformation-and-generators","title":"Transformation and generators","text":""},{"location":"built_in_transformers/advanced_transformers/custom_functions/core_functions/#masking","title":"masking","text":"
Replaces characters with asterisk * symbols depending on the provided masking rule. If the value is NULL, it is kept unchanged. This function is based on ggwhite/go-masker.
Masking rulesSignatureParametersReturn values Rule Description Example input Example output default Returns the sequence of * symbols of the same length test1234********name Masks the second and the third letters ABCDA**Dpassword Always returns a sequence of *address Keeps first 6 letters, masks the rest Larnaca, makarios stLarnac*************email Keeps a domain and the first 3 letters, masks the rest ggw.chang@gmail.comggw****@gmail.commobile Masks 3 digits starting from the 4th digit 09876543210987***321telephone Removes (, ), , - symbols, masks last 4 digits of a telephone number, and formats it to (??)????-????0227993078(02)2799-****id Masks last 4 digits of an ID A123456789A12345****credit_card Masks 6 digits starting from the 7th digit 1234567890123456123456******3456url Masks the password part of the URL (if applicable) http://admin:mysecretpassword@localhost:1234/urihttp://admin:xxxxx@localhost:1234/uri
masking(dataType string, value string) (res string, err error)
dataType \u2014 one of the masking rules (see previous tab)
Adds or subtracts a random fraction to or from the original float value. Multiplies the original float value by a provided random value that is not higher than the ratio parameter and adds it to the original value with the option to specify the decimal via the decimal parameter.
SignatureParametersReturn values
noiseFloat(ratio float, decimal int, value float) (res float64, err error)
ratio \u2014 the maximum multiplier value in the interval (0:1). The value will be randomly generated up to ratio, multiplied by the original value, and the result will be added to the original value.
Adds or subtracts a random fraction to or from the original integer value. Multiplies the original integer value by a provided random value that is not higher than the ratio parameter and adds it to the original value.
SignatureParametersReturn values
noiseInt(ratio float, value float) (res int, err error)
ratio \u2014 the max multiplier value in the interval (0:1). The value will be generated randomly up to ratio, multiplied by the original value, and the result will be added to the original value.
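For illustration, these functions can be combined with the driver functions inside a template; the items_amount and quantity column names are assumptions:
{{ .GetColumnValue \"items_amount\" | noiseFloat 0.1 2 | .SetColumnValue \"items_amount\" }}\n{{ .GetColumnValue \"quantity\" | noiseInt 0.2 | .SetColumnValue \"quantity\" }}\n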
Greenmask uses go-faker/faker under the hood for generating synthetic data.
"},{"location":"built_in_transformers/advanced_transformers/custom_functions/faker_function/#faker-functions-address","title":"Faker functions: Address","text":"Function Description Signature fakerRealAddress Generates a random real-world address that includes: city, state, postal code, latitude, and longitude fakerRealAddress() (res ReadAddress)fakerLatitude Generates random fake latitude fakerLatitude() (res float64)fakerLongitude Generates random fake longitude fakerLongitude() (res float64)"},{"location":"built_in_transformers/advanced_transformers/custom_functions/faker_function/#faker-functions-datetime","title":"Faker functions: Datetime","text":"Function Description Signature fakerUnixTime Generates random Unix time in seconds fakerLongitude() (res int64)fakerDate Generates random date with the pattern of YYYY-MM-DDfakerDate() (res string)fakerTimeString Generates random time fakerTimeString() (res string)fakerMonthName Generates a random month fakerMonthName() (res string)fakerYearString Generates a random year fakerYearString() (res string)fakerDayOfWeek Generates a random day of a week fakerDayOfWeek() (res string)fakerDayOfMonth Generates a random day of a month fakerDayOfMonth() (res string)fakerTimestamp Generates a random timestamp with the pattern of YYYY-MM-DD HH:MM:SSfakerTimestamp() (res string)fakerCentury Generates a random century fakerCentury() (res string)fakerTimezone Generates a random timezone name fakerTimezone() (res string)fakerTimeperiod Generates a random time period with the patter of either AM or PMfakerTimeperiod() (res string)"},{"location":"built_in_transformers/advanced_transformers/custom_functions/faker_function/#faker-functions-internet","title":"Faker functions: Internet","text":"Function Description Signature fakerEmail Generates a random email fakerEmail() (res string)fakerMacAddress Generates a random MAC address fakerMacAddress() (res string)fakerDomainName Generates a random domain name fakerDomainName() (res string)fakerURL Generates a random URL with the pattern of https://www.domainname.some/somepathfakerURL() (res string)fakerUsername Generates a random username fakerUsername() (res string)fakerIPv4 Generates a random IPv4 address fakerIPv4() (res string)fakerIPv6 Generates a random IPv6 address fakerIPv6() (res string)fakerPassword Generates a random password fakerPassword() (res string)"},{"location":"built_in_transformers/advanced_transformers/custom_functions/faker_function/#faker-functions-words-and-sentences","title":"Faker functions: words and sentences","text":"Function Description Signature fakerWord Generates a random word fakerWord() (res string)fakerSentence Generates a random sentence fakerSentence() (res string)fakerParagraph Generates a random sequence of sentences as a paragraph fakerParagraph() (res string)"},{"location":"built_in_transformers/advanced_transformers/custom_functions/faker_function/#faker-functions-payment","title":"Faker functions: Payment","text":"Function Description Signature fakerCCType Generates a random credit card type, e.g. VISA, MasterCard, etc. 
fakerCCType() (res string)fakerCCNumber Generates a random credit card number fakerCCNumber() (res string)fakerCurrency Generates a random currency name fakerCurrency() (res string)fakerAmountWithCurrency Generates a random amount preceded with a random currency fakerAmountWithCurrency() (res string)"},{"location":"built_in_transformers/advanced_transformers/custom_functions/faker_function/#faker-functions-person","title":"Faker functions: Person","text":"Function Description Signature fakerTitleMale Generates a random male title from the predefined list fakerTitleMale() (res string)fakerTitleFemale Generates a random female title from the predefined list fakerTitleFemale() (res string)fakerFirstName Generates a random first name fakerFirstName() (res string)fakerFirstNameMale Generates a random male first name fakerFirstNameMale() (res string)fakerFirstNameFemale Generates a random female first name fakerFirstNameFemale() (res string)fakerFirstLastName Generates a random last name fakerFirstLastName() (res string)fakerName Generates a random full name preceded with a title fakerName() (res string)"},{"location":"built_in_transformers/advanced_transformers/custom_functions/faker_function/#faker-functions-phone","title":"Faker functions: Phone","text":"Function Description Signature fakerPhoneNumber Generates a random phone number fakerPhoneNumber() (res string)fakerTollFreePhoneNumber Generates a random phone number with the pattern of (123) 456-7890 fakerTollFreePhoneNumber() (res string)fakerE164PhoneNumber Generates a random phone number with the pattern of +12345678900 fakerE164PhoneNumber() (res string)"},{"location":"built_in_transformers/advanced_transformers/custom_functions/faker_function/#faker-functions-uuid","title":"Faker functions: UUID","text":"Function Description Signature fakerUUIDHyphenated Generates a random unique user ID separated by hyphens fakerUUIDHyphenated() (res string)fakerUUIDDigit Generates a random unique user ID in the HEX format fakerUUIDDigit() (res string)"},{"location":"built_in_transformers/standard_transformers/","title":"Standard transformers","text":"
Standard transformers are ready-to-use methods that require no customization and work with minimal parameter input. Below you can find an index of all standard transformers currently available in Greenmask.
Cmd \u2014 transforms data via external program using stdin and stdout interaction.
Dict \u2014 replaces values matched by dictionary keys.
Hash \u2014 generates a hash of the text value.
Masking \u2014 masks a value using one of the masking behaviors depending on your domain.
NoiseDate \u2014 randomly adds or subtracts a duration within the provided ratio interval to the original date value.
NoiseFloat \u2014 adds or subtracts a random fraction to or from the original float value.
NoiseNumeric \u2014 adds or subtracts a random fraction to or from the original numeric value.
NoiseInt \u2014 adds or subtracts a random fraction to or from the original integer value.
RandomBool \u2014 generates random boolean values.
RandomChoice \u2014 replaces values randomly chosen from a provided list.
RandomDate \u2014 generates a random date in a specified interval.
RandomFloat \u2014 generates a random float within the provided interval.
RandomInt \u2014 generates a random integer within the provided interval.
RandomString \u2014 generates a random string using the provided characters within the specified length range.
RandomUuid \u2014 generates a random unique user ID.
RandomLatitude \u2014 generates a random latitude value.
RandomLongitude \u2014 generates a random longitude value.
RandomUnixTimestamp \u2014 generates a random Unix timestamp.
RandomDayOfWeek \u2014 generates a random day of the week.
RandomDayOfMonth \u2014 generates a random day of the month.
RandomMonthName \u2014 generates the name of a random month.
RandomYearString \u2014 generates a random year as a string.
RandomCentury \u2014 generates a random century.
RandomTimezone \u2014 generates a random timezone.
RandomEmail \u2014 generates a random email address.
RandomUsername \u2014 generates a random username.
RandomPassword \u2014 generates a random password.
RandomDomainName \u2014 generates a random domain name.
RandomURL \u2014 generates a random URL.
RandomMac \u2014 generates a random MAC address.
RandomIP \u2014 generates a random IPv4 or IPv6 address.
RandomWord \u2014 generates a random word.
RandomSentence \u2014 generates a random sentence.
RandomParagraph \u2014 generates a random paragraph.
RandomCCType \u2014 generates a random credit card type.
RandomCCNumber \u2014 generates a random credit card number.
RandomCurrency \u2014 generates a random currency code.
RandomAmountWithCurrency \u2014 generates a random monetary amount with currency.
RandomPerson \u2014 generates random personal data (first name, last name, etc.).
RandomPhoneNumber \u2014 generates a random phone number.
RandomTollFreePhoneNumber \u2014 generates a random toll-free phone number.
RandomE164PhoneNumber \u2014 generates a random phone number in E.164 format.
RealAddress \u2014 generates a real address.
RegexpReplace \u2014 replaces a string using a regular expression.
Replace \u2014 replaces an original value by the provided one.
Transform data via external program using stdin and stdout interaction.
"},{"location":"built_in_transformers/standard_transformers/cmd/#parameters","title":"Parameters","text":"Name Description Default Required Supported DB types columns A list of column names to be affected. If empty, the entire tuple is used. Read about the structure further. Yes Any executable The path to the executable parameter file Yes - args A list of parameters for the executable No - driver The row driver with parameters that is used for interacting with cmd. See details below. {\"name\": \"csv\"} No - validate Performs a decoding operation using the PostgreSQL driver for data received from the command to ensure the data format is correct false No - timeout Timeout for sending and receiving data from the external command 2s No - expected_exit_code The expected exit code on SIGTERM signal. If the exit code is unexpected, the transformation exits with an error. 0 No - skip_on_behaviour Skips transformation call if one of the provided columns has a null value (any) or each of the provided columns has null values (all). This option works together with the skip_on_null_input parameter on columns. Possible values: all, any. all No -
Warning
The parameter validate=true may cause an error if the type does not have a PostgreSQL driver decoder implementation. Most of the types, such as int, float, text, varchar, date, timestamp, etc., have encoders and decoders, as well as inherited types like domain types based on them.
The Cmd transformer allows you to send original data to an external program via stdin and receive transformed data from stdout. It supports various interaction formats such as json, csv, or plain text for one-column transformations. The interaction is performed line by line, so each chunk of sent data must end with a new line symbol \\n.
"},{"location":"built_in_transformers/standard_transformers/cmd/#types-of-interaction-modes","title":"Types of interaction modes","text":""},{"location":"built_in_transformers/standard_transformers/cmd/#text","title":"text","text":"
Textual driver that is used only for single-column transformations, so you cannot provide more than one column here. The value is encoded as a string literal. For example, 2023-01-03 01:00:00.0+03.
JSON line driver. It has two formats that can be passed through driver.json_data_format: [text|bytes]. Use the bytes format for binary datatypes. Use the text format for non-binary datatypes and for those that can be represented as string literals. The default json_data_format is text.
Each line is a JSON line with a map of attribute numbers to their values
d \u2014 the raw data represented as base64 encoding for the bytes format or Unicode text for the text format. The base64 encoding is needed because data can be binary.
CSV driver (comma-separated). The number of attributes is the same as the number of table columns, but the columns that were not mentioned in the columns list are empty. The NULL value is represented as \\N. Each attribute is escaped by a quote (\"). For example, if the transformed table has attributes id, title, and created_at, and only id and created_at require transformation, then the CSV line will look as follows:
name \u2014 the name of the column. This value is required. Depending on the attributes that follow, this column may be used only as a value and not affected in any way.
not_affected \u2014 indicates whether the column is affected in the transformation. This attribute is required for the validation procedure when Greenmask is called with greenmask dump --validate. Setting not_affected=true can be helpful when the command transformer transforms data depending on the value of another column. For example, if you want to generate an updated_at column value depending on the created_at column value, you can set created_at to not_affected=true. The default value is false.
skip_original_data \u2014 indicates whether the original data is required for the transformer. This attribute can be helpful for decreasing the interaction time. One use case is when the command works as a generator and returns the value without relying on the original data. The default value is false.
skip_on_null_input \u2014 specifies whether to skip transformation when the original value is null. This attribute works in conjunction with the skip_on_behaviour parameter. For example, if you have two affected columns with skip_on_null_input=true and one column is null, then, if skip_on_behaviour=any, the transformation will be skipped, or, if skip_on_behaviour=all, the transformation will be performed. The default is false.
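Putting these column attributes together, below is a minimal Cmd configuration sketch; the table, executable path, and column names are assumptions for illustration, and the remaining parameters use the values documented above:
- schema: \"humanresources\"\n name: \"employee\"\n transformers:\n - name: \"Cmd\"\n params:\n executable: \"/opt/transformers/my_transformer.py\"\n driver:\n name: \"json\"\n json_data_format: \"text\"\n timeout: \"2s\"\n expected_exit_code: 0\n skip_on_behaviour: \"any\"\n columns:\n - name: \"loginid\"\n skip_on_null_input: true\n - name: \"jobtitle\"\n skip_on_null_input: true\n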
"},{"location":"built_in_transformers/standard_transformers/cmd/#example-apply-transformation-performed-by-external-command-in-text-format","title":"Example: Apply transformation performed by external command in TEXT format","text":"
In the following example, the jobtitle column is transformed via the external command transformer.
External transformer in Python example
#!/usr/bin/env python3\nimport signal\nimport sys\n\nsignal.signal(signal.SIGTERM, lambda sig, frame: exit(0))\n\n\n# To implement a simple generator, we need to read a line from stdin and write the result to stdout\nfor _ in sys.stdin:\n # Writing the result to stdout with a new line and flushing the buffer\n sys.stdout.write(\"New Job Title\")\n sys.stdout.write(\"\\n\")\n sys.stdout.flush()\n
"},{"location":"built_in_transformers/standard_transformers/cmd/#example-apply-transformation-performed-by-external-command-in-json-format","title":"Example: Apply transformation performed by external command in JSON format","text":"
In the following example, jobtitle and loginid columns are transformed via external command transformer.
External transformer in Python example
#!/usr/bin/env python3\nimport json\nimport signal\nimport sys\n\nsignal.signal(signal.SIGTERM, lambda sig, frame: exit(0))\n\nfor line in sys.stdin:\n res = json.loads(line)\n # Setting dummy values\n res[\"jobtitle\"] = {\"d\": \"New Job Title\", \"n\": False}\n res[\"loginid\"][\"d\"] = \"123\"\n\n # Writing the result to stdout with a new line and flushing the buffer\n sys.stdout.write(json.dumps(res))\n sys.stdout.write(\"\\n\")\n sys.stdout.flush()\n
Validate the received data via decode procedure using the PostgreSQL driver. Note that this may cause an error if the type is not supported in the PostgreSQL driver.
Skip transformation (keep the values) if one of the affected columns (not_affected=false) has a null value.
If a column has a null value, then skip it. This works in conjunction with skip_on_behaviour. Since it has the value any, if one of the columns (jobtitle or loginid) has a null value, then skip the transformation call.
The format of JSON can be either text or bytes. The default value is text.
The skip_original_data attribute is set to true, so the data will not be transferred to the command. This column will contain empty original data.
"},{"location":"built_in_transformers/standard_transformers/dict/#parameters","title":"Parameters","text":"Name Description Default Required Supported DB types column The name of the column to be affected Yes any values Value replace mapping as in: {\"string\": \"string\"}. The string with value \"\\N\" is considered NULL. No - default Shown if no value has been matched with dict. The string with value \"\\N\" is considered NULL. By default is empty. No - fail_not_matched When no value is matched with the dict, fails the replacement process if set to true, or keeps the current value, if set to false. true No - validate Performs the encode-decode procedure using column type to ensure that values have correct type true No -"},{"location":"built_in_transformers/standard_transformers/dict/#description","title":"Description","text":"
The Dict transformer uses a user-provided key-value dictionary to replace values based on matches specified in the values parameter mapping. These provided values must align with the PostgreSQL type format. To validate the values format before application, you can utilize the validate parameter, triggering a decoding procedure via the PostgreSQL driver.
If there are no matches by key, an error will be raised according to the default fail_not_matched: true parameter. You can change this behaviour by providing the default parameter, whose value will be used when no match is found.
In certain cases where the driver type does not support the validation operation, an error may occur. For setting or matching a NULL value, use a string with the \\N sequence.
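A minimal configuration sketch for the Dict transformer; the table, column, and mapping values are illustrative assumptions:
- schema: \"humanresources\"\n name: \"employee\"\n transformers:\n - name: \"Dict\"\n params:\n column: \"gender\"\n values:\n \"M\": \"F\"\n \"F\": \"M\"\n default: \"U\"\n fail_not_matched: false\n validate: true\n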
Generate a hash of the text value using the Scrypt hash function under the hood. NULL values are kept.
"},{"location":"built_in_transformers/standard_transformers/hash/#parameters","title":"Parameters","text":"Name Description Default Required Supported DB types column The name of the column to be affected Yes text, varchar salt Hex encoded salt string. This value may be provided via environment variable GREENMASK_GLOBAL_SALT Yes text, varchar function Hash algorithm to anonymize data. Can be any of md5, sha1, sha256, sha512, sha3-224, sha3-254, sha3-384, sha3-512. sha1 No - max_length Indicates whether to truncate the hash tail and specifies at what length. Can be any integer number, where 0 means \"no truncation\". 0 No -"},{"location":"built_in_transformers/standard_transformers/hash/#example-generate-hash-from-job-title","title":"Example: Generate hash from job title","text":"
The following example generates a sha1 hash of the jobtitle value and truncates the result after the 10th character.
We can set the salt via the environment variable GREENMASK_GLOBAL_SALT:
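For instance (a sketch; the salt value is an illustrative hex string and the table is assumed):
export GREENMASK_GLOBAL_SALT=\"0102030405060708\"\n
- schema: \"humanresources\"\n name: \"employee\"\n transformers:\n - name: \"Hash\"\n params:\n column: \"jobtitle\"\n function: \"sha1\"\n max_length: 10\n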
| column name | original value | transformed |\n|-------------|----------------------------------|-------------|\n| jobtitle | Research and Development Manager | 3a456da5c5 |\n
Mask a value using one of the masking rules depending on your domain. NULL values are kept.
"},{"location":"built_in_transformers/standard_transformers/masking/#parameters","title":"Parameters","text":"Name Description Default Required Supported DB types column The name of the column to be affected Yes text, varchar type Data type of attribute (default, password, name, addr, email, mobile, tel, id, credit, url) default No -"},{"location":"built_in_transformers/standard_transformers/masking/#description","title":"Description","text":"
The Masking transformer replaces characters with asterisk * symbols depending on the provided data type. If the value is NULL, it is kept unchanged. It is based on ggwhite/go-masker and supports the following masking rules:
Type Description default Returns * symbols with the same length, e.g. input: test1234 output: ******** name Masks the second and the third letters in a word, e. g. input: ABCD output: A**D password Always returns ************ address Keeps first 6 letters, masks the rest, e. g. input: Larnaca, makarios st output: Larnac************* email Keeps a domain and the first 3 letters, masks the rest, e. g. input: ggw.chang@gmail.com output: ggw****@gmail.com mobile Masks 3 digits starting from the 4th digit, e. g. input: 0987654321 output: 0987***321 telephone Removes (, ), , - characters, and masks last 4 digits of telephone number, then formats it to (??)????-????, e. g. input: 0227993078 output: (02)2799-**** id Masks last 4 digits of ID number, e. g. input: A123456789 output: A12345**** credit_card Masks 6 digits starting from the 7th digit, e. g. input: 1234567890123456 output: 123456******3456 url Masks the password part of the URL, if applicable, e. g. http://admin:mysecretpassword@localhost:1234/uri output: http://admin:xxxxx@localhost:1234/uri"},{"location":"built_in_transformers/standard_transformers/masking/#example-masking-employee-national-id-number","title":"Example: Masking employee national ID number","text":"
In the following example, the national ID number of an employee is masked.
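A configuration sketch for this example; the schema, table, and column names are assumptions based on the context:
- schema: \"humanresources\"\n name: \"employee\"\n transformers:\n - name: \"Masking\"\n params:\n column: \"nationalidnumber\"\n type: \"id\"\n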
Randomly add or subtract a duration within the provided ratio interval to the original date value.
"},{"location":"built_in_transformers/standard_transformers/noise_date/#parameters","title":"Parameters","text":"Name Description Default Required Supported DB types column The name of the column to be affected Yes date, timestamp, timestamptz min_ratio The minimum random value for noise. The value must be in PostgreSQL interval format, e. g. 1 year 2 mons 3 day 04:05:06.07 5% from max_ration parameter No - max_ratio The maximum random value for noise. The value must be in PostgreSQL interval format, e. g. 1 year 2 mons 3 day 04:05:06.07 Yes - min Min threshold date (and/or time) of value. The value has the same format as column parameter No - max Max threshold date (and/or time) of value. The value has the same format as column parameter No - truncate Truncate the date to the specified part (nanosecond, microsecond, millisecond, second, minute, hour, day, month, year). The truncate operation is not applied by default. No - engine The engine used for generating the values [random, hash]. Use hash for deterministic generation random No -"},{"location":"built_in_transformers/standard_transformers/noise_date/#dynamic-parameters","title":"Dynamic parameters","text":"Parameter Supported types min date, timestamp, timestamptz max date, timestamp, timestamptz"},{"location":"built_in_transformers/standard_transformers/noise_date/#description","title":"Description","text":"
The NoiseDate transformer randomly generates a duration between the min_ratio and max_ratio parameters and adds it to or subtracts it from the original date value. The min_ratio and max_ratio parameters must be written in the PostgreSQL interval format. You can also truncate the resulting date up to a specified part by setting the truncate parameter.
In case you have constraints on the date range, you can set the min and max parameters to specify the threshold values. The values for min and max must have the same format as the column parameter. Parameters min and max support dynamic mode.
Info
If the noised value exceeds the max threshold, the transformer will set the value to max. If the noised value is lower than the min threshold, the transformer will set the value to min.
The engine parameter allows you to choose between random and hash engines for generating values. Read more about the engines in the Transformation engines section.
"},{"location":"built_in_transformers/standard_transformers/noise_date/#example-adding-noise-to-the-modified-date","title":"Example: Adding noise to the modified date","text":"
In the following example, the original timestamp value of modifieddate will be noised up to 1 year 2 months 3 days 4 hours 5 minutes 6 seconds and 7 milliseconds with truncation up to the month part.
NoiseDate transformer example
- schema: \"humanresources\"\n name: \"jobcandidate\"\n transformers:\n - name: \"NoiseDate\"\n params:\n column: \"hiredate\"\n max_ratio: \"1 year 2 mons 3 day 04:05:06.07\"\n truncate: \"month\"\n max: \"2020-01-01 00:00:00\"\n
"},{"location":"built_in_transformers/standard_transformers/noise_date/#example-adding-noise-to-the-modified-date-with-dynamic-min-parameter-with-hash-engine","title":"Example: Adding noise to the modified date with dynamic min parameter with hash engine","text":"
In the following example, the original timestamp value of hiredate will be noised up to 1 year 2 months 3 days 4 hours 5 minutes 6 seconds and 7 milliseconds with truncation up to the month part. The max threshold is set to 2020-01-01 00:00:00, and the min threshold is set to the birthdate column. If the birthdate column is NULL, the default value 1990-01-01 will be used. The hash engine is used for deterministic generation - the same input will always produce the same output.
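A configuration sketch for this case, assuming dynamic parameters are declared in a dynamic_params block with column and default_value keys (see the Dynamic parameters documentation for the authoritative syntax):
- schema: \"humanresources\"\n name: \"employee\"\n transformers:\n - name: \"NoiseDate\"\n params:\n column: \"hiredate\"\n max_ratio: \"1 year 2 mons 3 day 04:05:06.07\"\n truncate: \"month\"\n max: \"2020-01-01 00:00:00\"\n engine: \"hash\"\n dynamic_params:\n min:\n column: \"birthdate\"\n default_value: \"1990-01-01\"\n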
Add or subtract a random fraction to the original float value.
"},{"location":"built_in_transformers/standard_transformers/noise_float/#parameters","title":"Parameters","text":"Name Description Default Required Supported DB types column The name of the column to be affected Yes float4, float8 decimal The decimal of the noised float value (number of digits after the decimal point) 4 No - min_ratio The minimum random percentage for noise, from 0 to 1, e. g. 0.1 means \"add noise up to 10%\" 0.05 No - max_ratio The maximum random percentage for noise, from 0 to 1, e. g. 0.1 means \"add noise up to 10%\" Yes - min Min threshold of noised value No - max Max threshold of noised value No - engine The engine used for generating the values [random, hash]. Use hash for deterministic generation random No -"},{"location":"built_in_transformers/standard_transformers/noise_float/#dynamic-parameters","title":"Dynamic parameters","text":"Parameter Supported types min float4, float8, int2, int4, int8 max float4, float8, int2, int4, int8"},{"location":"built_in_transformers/standard_transformers/noise_float/#description","title":"Description","text":"
The NoiseFloat transformer multiplies the original float value by a randomly generated value that is not higher than the max_ratio parameter and not lower than the min_ratio parameter and adds it to or subtracts it from the original value. Additionally, you can specify the number of decimal digits by using the decimal parameter.
In case you have constraints on the float range, you can set the min and max parameters to specify the threshold values. The values for min and max must have the same format as the column parameter. Parameters min and max support dynamic mode.
Info
If the noised value exceeds the max threshold, the transformer will set the value to max. If the noised value is lower than the min threshold, the transformer will set the value to min.
The engine parameter allows you to choose between random and hash engines for generating values. Read more about the engines in the Transformation engines section.
"},{"location":"built_in_transformers/standard_transformers/noise_float/#example-adding-noise-to-the-purchase-price","title":"Example: Adding noise to the purchase price","text":"
In this example, the original value of standardprice will be noised up to 50% and rounded up to 2 decimals.
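A configuration sketch for this example; the schema and table are assumptions:
- schema: \"purchasing\"\n name: \"productvendor\"\n transformers:\n - name: \"NoiseFloat\"\n params:\n column: \"standardprice\"\n max_ratio: 0.5\n decimal: 2\n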
Add or subtract a random fraction to the original integer value.
"},{"location":"built_in_transformers/standard_transformers/noise_int/#parameters","title":"Parameters","text":"Name Description Default Required Supported DB types column The name of the column to be affected Yes int2, int4, int8 min_ratio The minimum random percentage for noise, from 0 to 1, e. g. 0.1 means \"add noise up to 10%\" 0.05 No - max_ratio The maximum random percentage for noise, from 0 to 1, e. g. 0.1 means \"add noise up to 10%\" Yes - min Min threshold of noised value No - max Min threshold of noised value No - engine The engine used for generating the values [random, hash]. Use hash for deterministic generation random No -"},{"location":"built_in_transformers/standard_transformers/noise_int/#dynamic-parameters","title":"Dynamic parameters","text":"Parameter Supported types min int2, int4, int8 max int2, int4, int8"},{"location":"built_in_transformers/standard_transformers/noise_int/#description","title":"Description","text":"
The NoiseInt transformer multiplies the original integer value by a randomly generated value that is not higher than the max_ratio parameter and not lower than the min_ratio parameter and adds it to or subtracts it from the original value.
In case you have constraints on the integer range, you can set the min and max parameters to specify the threshold values. The values for min and max must have the same format as the column parameter. Parameters min and max support dynamic mode.
Info
If the noised value exceeds the max threshold, the transformer will set the value to max. If the noised value is lower than the min threshold, the transformer will set the value to min.
The engine parameter allows you to choose between random and hash engines for generating values. Read more about the engines in the Transformation engines section.
"},{"location":"built_in_transformers/standard_transformers/noise_int/#example-noise-vacation-hours-of-an-employee","title":"Example: Noise vacation hours of an employee","text":"
In the following example, the original value of vacationhours will be noised up to 40%. The transformer will set the value to 10 if the noised value is lower than 10 and to 1000 if the noised value exceeds 1000.
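A configuration sketch matching this description; the schema and table are assumptions:
- schema: \"humanresources\"\n name: \"employee\"\n transformers:\n - name: \"NoiseInt\"\n params:\n column: \"vacationhours\"\n max_ratio: 0.4\n min: 10\n max: 1000\n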
Add or subtract a random fraction to the original numeric value.
"},{"location":"built_in_transformers/standard_transformers/noise_numeric/#parameters","title":"Parameters","text":"Name Description Default Required Supported DB types column The name of the column to be affected Yes numeric, decimal decimal The decimal of the noised float value (number of digits after the decimal point) 4 No - min_ratio The minimum random percentage for noise, from 0 to 1, e. g. 0.1 means \"add noise up to 10%\" 0.05 No - max_ratio The maximum random percentage for noise, from 0 to 1, e. g. 0.1 means \"add noise up to 10%\" Yes - min Min threshold of noised value No - max Max threshold of noised value No - engine The engine used for generating the values [random, hash]. Use hash for deterministic generation random No -"},{"location":"built_in_transformers/standard_transformers/noise_numeric/#dynamic-parameters","title":"Dynamic parameters","text":"Parameter Supported types min numeric, decimal, float4, float8, int2, int4, int8 max numeric, decimal, float4, float8, int2, int4, int8"},{"location":"built_in_transformers/standard_transformers/noise_numeric/#description","title":"Description","text":"
The NoiseNumeric transformer multiplies the original numeric (or decimal) value by a randomly generated value that is not higher than the max_ratio parameter and not lower than the min_ratio parameter and adds it to or subtracts it from the original value. Additionally, you can specify the number of decimal digits by using the decimal parameter.
In case you have constraints on the numeric range, you can set the min and max parameters to specify the threshold values. The values for min and max must have the same format as the column parameter. Parameters min and max support dynamic mode.
Info
If the noised value exceeds the max threshold, the transformer will set the value to max. If the noised value is lower than the min threshold, the transformer will set the value to min.
The engine parameter allows you to choose between random and hash engines for generating values. Read more about the engines in the Transformation engines section.
Warning
Greenmask cannot parse the numeric type settings, for instance, NUMERIC(10, 2). You should set the min and max thresholds manually, as well as the allowed decimal. This behaviour will be changed in later versions: Greenmask will be able to determine the decimal and scale of the column and set the min and max thresholds automatically if they were not set.
"},{"location":"built_in_transformers/standard_transformers/noise_numeric/#example-adding-noise-to-the-purchase-price","title":"Example: Adding noise to the purchase price","text":"
In this example, the original value of standardprice will be noised up to 50% and rounded up to 2 decimals.
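A configuration sketch for this example; per the warning above, min, max, and decimal are set explicitly, and the threshold values here are illustrative assumptions:
- schema: \"purchasing\"\n name: \"productvendor\"\n transformers:\n - name: \"NoiseNumeric\"\n params:\n column: \"standardprice\"\n max_ratio: 0.5\n decimal: 2\n min: 0\n max: 100000\n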
The RandomAmountWithCurrency transformer is specifically designed to populate specified database columns with random financial amounts accompanied by currency codes. Ideal for applications requiring the simulation of financial transactions, this utility enhances the realism of financial datasets by introducing variability in amounts and currencies.
"},{"location":"built_in_transformers/standard_transformers/random_amount_with_currency/#parameters","title":"Parameters","text":"Name Description Default Required Supported DB types column The name of the column to be affected Yes text, varchar keep_null Indicates whether NULL values should be preserved false No -"},{"location":"built_in_transformers/standard_transformers/random_amount_with_currency/#description","title":"Description","text":"
This transformer automatically generates random financial amounts along with corresponding global currency codes (e. g., 250.00 USD, 300.00 EUR), injecting them into the designated database column. It provides a straightforward solution for populating financial records with varied and realistic data, suitable for testing payment systems, data anonymization, and simulation of economic models.
"},{"location":"built_in_transformers/standard_transformers/random_amount_with_currency/#example-populate-the-payments-table-with-random-amounts-and-currencies","title":"Example: Populate the payments table with random amounts and currencies","text":"
This example shows how to configure the RandomAmountWithCurrency transformer to populate the payment_details column in the payments table with random amounts and currencies. It is an effective approach to simulating a diverse range of payment transactions.
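A configuration sketch for this setup; the schema is an assumption:
- schema: \"public\"\n name: \"payments\"\n transformers:\n - name: \"RandomAmountWithCurrency\"\n params:\n column: \"payment_details\"\n keep_null: true\n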
In this setup, the payment_details column will be updated with random financial amounts and currency codes for each entry, replacing any existing non-NULL values. The keep_null parameter, when set to true, ensures that existing NULL values in the column remain unchanged, preserving the integrity of records without specified payment details.
"},{"location":"built_in_transformers/standard_transformers/random_bool/#parameters","title":"Parameters","text":"Name Description Default Required Supported DB types column The name of the column to be affected Yes bool keep_null Indicates whether NULL values should be replaced with transformed values or not true No - engine The engine used for generating the values [random, hash]. Use hash for deterministic generation random No -"},{"location":"built_in_transformers/standard_transformers/random_bool/#description","title":"Description","text":"
The RandomBool transformer generates a random boolean value. The behaviour for NULL values can be configured using the keep_null parameter. The engine parameter allows you to choose between random and hash engines for generating values. Read more about the engines in the Transformation engines section.
"},{"location":"built_in_transformers/standard_transformers/random_bool/#example-generate-a-random-boolean-for-a-column","title":"Example: Generate a random boolean for a column","text":"
In the following example, the RandomBool transformer generates a random boolean value for the salariedflag column.
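A configuration sketch for this example; the schema and table are assumptions:
- schema: \"humanresources\"\n name: \"employee\"\n transformers:\n - name: \"RandomBool\"\n params:\n column: \"salariedflag\"\n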
The RandomCCNumber transformer is specifically designed to populate specified database columns with random credit card numbers. This utility is crucial for applications that involve simulating financial data, testing payment systems, or anonymizing real credit card numbers in datasets.
"},{"location":"built_in_transformers/standard_transformers/random_cc_number/#parameters","title":"Parameters","text":"Name Description Default Required Supported DB types column The name of the column to be affected Yes text, varchar keep_null Indicates whether NULL values should be preserved false No -"},{"location":"built_in_transformers/standard_transformers/random_cc_number/#description","title":"Description","text":"
By leveraging algorithms capable of generating plausible credit card numbers that adhere to standard credit card validation rules (such as the Luhn algorithm), the RandomCCNumber transformer injects random credit card numbers into the designated database column. This approach ensures the generation of credit card numbers that are realistic for testing and development purposes, without compromising real-world applicability and security.
"},{"location":"built_in_transformers/standard_transformers/random_cc_number/#example-populate-random-credit-card-numbers-for-the-payment_information-table","title":"Example: Populate random credit card numbers for the payment_information table","text":"
This example demonstrates configuring the RandomCCNumber transformer to populate the cc_number column in the payment_information table with random credit card numbers. It is an effective strategy for creating a realistic set of payment data for application testing or data anonymization.
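A configuration sketch for this strategy; the schema is an assumption:
- schema: \"public\"\n name: \"payment_information\"\n transformers:\n - name: \"RandomCCNumber\"\n params:\n column: \"cc_number\"\n keep_null: false\n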
With this setup, the cc_number column will be updated with random credit card numbers for each entry, replacing any existing non-NULL values. If the keep_null parameter is set to true, it will ensure that existing NULL values in the column are preserved, maintaining the integrity of records where credit card information is not applicable or available.
The RandomCCType transformer is designed to populate specified database columns with random credit card types. This tool is essential for applications that require the simulation of financial transaction data, testing payment processing systems, or anonymizing credit card type information in datasets.
"},{"location":"built_in_transformers/standard_transformers/random_cc_type/#parameters","title":"Parameters","text":"Name Description Default Required Supported DB types column The name of the column to be affected Yes text, varchar keep_null Indicates whether NULL values should be preserved false No -"},{"location":"built_in_transformers/standard_transformers/random_cc_type/#description","title":"Description","text":"
Utilizing a predefined list of credit card types (e.g., VISA, MasterCard, American Express, Discover), the RandomCCType transformer injects random credit card type names into the designated database column. This feature allows for the creation of realistic and varied financial transaction datasets by simulating a range of credit card types without using real card data.
"},{"location":"built_in_transformers/standard_transformers/random_cc_type/#example-populate-random-credit-card-types-for-the-transactions-table","title":"Example: Populate random credit card types for the transactions table","text":"
This example shows how to configure the RandomCCType transformer to populate the card_type column in the transactions table with random credit card types. It is a straightforward method for simulating diverse payment methods across transactions.
In this configuration, the card_type column will be updated with random credit card types for each entry, replacing any existing non-NULL values. If the keep_null parameter is set to true, existing NULL values in the column will be preserved, maintaining the integrity of records where card type information is not applicable.
The RandomCentury transformer is crafted to populate specified database columns with random century values. It is ideal for applications that require historical data simulation, such as generating random years within specific centuries for historical databases, testing datasets with temporal dimensions, or anonymizing dates in historical research data.
"},{"location":"built_in_transformers/standard_transformers/random_century/#parameters","title":"Parameters","text":"Name Description Default Required Supported DB types column The name of the column to be affected Yes text, varchar keep_null Indicates whether NULL values should be preserved false No -"},{"location":"built_in_transformers/standard_transformers/random_century/#description","title":"Description","text":"
The RandomCentury transformer utilizes an algorithm or a library function (hypothetical in this context) to generate random century values. Each value represents a century (e.g., 19th, 20th, 21st), providing a broad temporal range that can be used to enhance datasets requiring a distribution across different historical periods without the need for precise date information.
"},{"location":"built_in_transformers/standard_transformers/random_century/#example-populate-random-centuries-for-the-historical_artifacts-table","title":"Example: Populate random centuries for the historical_artifacts table","text":"
This example shows how to configure the RandomCentury transformer to populate the century column in a historical_artifacts table with random century values, adding an element of variability and historical context to the dataset.
In this setup, the century column will be filled with random century values, replacing any existing non-NULL values. If the keep_null parameter is set to true, then existing NULL values in the column will remain untouched, preserving the original dataset's integrity where no temporal data is available.
Replace values randomly chosen from a provided list.
"},{"location":"built_in_transformers/standard_transformers/random_choice/#parameters","title":"Parameters","text":"Name Description Default Required Supported DB types column The name of the column to be affected Yes any values A list of values in any format. The string with value \\N is considered NULL. Yes - validate Performs a decoding procedure via the PostgreSQL driver using the column type to ensure that values have correct type true No keep_null Indicates whether NULL values should be replaced with transformed values or not true No engine The engine used for generating the values [random, hash]. Use hash for deterministic generation random No -"},{"location":"built_in_transformers/standard_transformers/random_choice/#description","title":"Description","text":"
The RandomChoice transformer replaces the original value with one randomly chosen from the list provided in the values parameter. You can use the validate parameter to ensure that values are correct before applying the transformation. The behaviour for NULL values can be configured using the keep_null parameter.
The engine parameter allows you to choose between random and hash engines for generating values. Read more about the engines in the Transformation engines section.
"},{"location":"built_in_transformers/standard_transformers/random_choice/#example-choosing-randomly-from-provided-dates","title":"Example: Choosing randomly from provided dates","text":"
In this example, the provided values undergo validation through PostgreSQL driver decoding, and one value is randomly chosen from the list.
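A configuration sketch for this example; the table, column, and candidate values are illustrative assumptions:
- schema: \"humanresources\"\n name: \"jobcandidate\"\n transformers:\n - name: \"RandomChoice\"\n params:\n column: \"modifieddate\"\n validate: true\n values:\n - \"2023-06-10 00:00:00\"\n - \"2023-06-11 00:00:00\"\n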
The RandomCurrency transformer is tailored to populate specified database columns with random currency codes. This tool is highly beneficial for applications involving the simulation of international financial data, testing currency conversion features, or anonymizing currency information in datasets.
"},{"location":"built_in_transformers/standard_transformers/random_currency/#parameters","title":"Parameters","text":"Name Description Default Required Supported DB types column The name of the column to be affected Yes text, varchar keep_null Indicates whether NULL values should be preserved false No -"},{"location":"built_in_transformers/standard_transformers/random_currency/#description","title":"Description","text":"
Utilizing a comprehensive list of global currency codes (e.g., USD, EUR, JPY), the RandomCurrency transformer injects random currency codes into the designated database column. This feature allows for the creation of diverse and realistic financial transaction datasets by simulating a variety of currencies without relying on actual financial data.
"},{"location":"built_in_transformers/standard_transformers/random_currency/#example-populate-random-currency-codes-for-the-transactions-table","title":"Example: Populate random currency codes for the transactions table","text":"
This example outlines configuring the RandomCurrency transformer to populate the currency_code column in a transactions table with random currency codes. It is an effective way to simulate international transactions across multiple currencies.
In this configuration, the currency_code column will be updated with random currency codes for each entry, replacing any existing non-NULL values. If the keep_null parameter is set to true, existing NULL values in the column will be preserved, ensuring the integrity of records where currency data may not be applicable.
"},{"location":"built_in_transformers/standard_transformers/random_date/#parameters","title":"Parameters","text":"Name Description Default Required Supported DB types column Name of the column to be affected Yes date, timestamp, timestamptz min The minimum threshold date for the random value. The format depends on the column type. Yes - max The maximum threshold date for the random value. The format depends on the column type. Yes - truncate Truncate the date to the specified part (nanosecond, microsecond, millisecond, second, minute, hour, day, month, year). The truncate operation is not applied by default. No - keep_null Indicates whether NULL values should be replaced with transformed values or not true No - engine The engine used for generating the values [random, hash]. Use hash for deterministic generation random No -"},{"location":"built_in_transformers/standard_transformers/random_date/#dynamic-parameters","title":"Dynamic parameters","text":"Parameter Supported types min date, timestamp, timestamptz max date, timestamp, timestamptz"},{"location":"built_in_transformers/standard_transformers/random_date/#description","title":"Description","text":"
The RandomDate transformer generates a random date within the provided interval, starting from min to max. It can also perform date truncation up to the specified part of the date. The format of dates in the min and max parameters must adhere to PostgreSQL types, including DATE, TIMESTAMP WITHOUT TIMEZONE, or TIMESTAMP WITH TIMEZONE.
Note
The value of min and max parameters depends on the column type. For example, for the date column, the value should be in the format YYYY-MM-DD, while for the timestamp column, the value should be in the format YYYY-MM-DD HH:MM:SS or YYYY-MM-DD HH:MM:SS.SSSSSS. The timestamptz column requires the value to be in the format YYYY-MM-DD HH:MM:SS.SSSSSS+HH:MM. Read more about date/time formats in the PostgreSQL documentation.
The behaviour for NULL values can be configured using the keep_null parameter. The engine parameter allows you to choose between random and hash engines for generating values. Read more about the engines in the Transformation engines section.
In the following example, a random timestamp without timezone is generated for the modifieddate column within the range from 2011-05-31 00:00:00 to 2013-05-31 00:00:00, and the part of the random value after day is truncated.
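A configuration sketch for this example; the schema and table are assumptions, while the range and truncation come from the description above:
- schema: \"humanresources\"\n name: \"employee\"\n transformers:\n - name: \"RandomDate\"\n params:\n column: \"modifieddate\"\n min: \"2011-05-31 00:00:00\"\n max: \"2013-05-31 00:00:00\"\n truncate: \"day\"\n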
Column OriginalValue TransformedValue modifieddate 2014-06-30 00:00:00 2012-07-27 00:00:00"},{"location":"built_in_transformers/standard_transformers/random_date/#example-generate-hiredate-based-on-birthdate-using-two-transformations","title":"Example: Generate hiredate based on birthdate using two transformations","text":"
In this example, the RandomDate transformer generates a random date for the birthdate column within the range now - 50 years to now - 18 years. The hire date is generated based on the birthdate, ensuring that the employee is at least 18 years old when hired.
The RandomDayOfMonth transformer is designed to populate specified database columns with random day-of-the-month values. It is particularly useful for scenarios requiring the simulation of dates, such as generating random event dates, user sign-up dates, or any situation where the specific day of the month is needed without reference to the actual month or year.
"},{"location":"built_in_transformers/standard_transformers/random_day_of_month/#parameters","title":"Parameters","text":"Name Description Default Required Supported DB types column The name of the column to be affected Yes text, varchar, int2, int4, int8, numeric keep_null Indicates whether NULL values should be preserved false No -"},{"location":"built_in_transformers/standard_transformers/random_day_of_month/#description","title":"Description","text":"
Utilizing the faker library, the RandomDayOfMonth transformer generates random numerical values representing days of the month, ranging from 1 to 31. This allows for the easy insertion of random but plausible day-of-the-month data into a database, enhancing realism or anonymizing actual dates.
"},{"location":"built_in_transformers/standard_transformers/random_day_of_month/#example-populate-random-days-of-the-month-for-the-events-table","title":"Example: Populate random days of the month for the events table","text":"
This example illustrates how to configure the RandomDayOfMonth transformer to fill the event_day column in the events table with random day-of-the-month values, facilitating the simulation of varied event scheduling.
With this setup, the event_day column will be updated with random day-of-the-month values, replacing any existing non-NULL values. Setting keep_null to true ensures that NULL values in the column are left unchanged, maintaining any existing gaps in the data.
The RandomDayOfWeek transformer is specifically designed to fill specified database columns with random day-of-the-week names. It is particularly useful for applications that require simulated weekly schedules, random event planning, or any scenario where the day of the week is relevant but the specific date is not.
"},{"location":"built_in_transformers/standard_transformers/random_day_of_week/#parameters","title":"Parameters","text":"Name Description Default Required Supported DB types column The name of the column to be affected Yes text, varchar keep_null Indicates whether NULL values should be preserved false No -"},{"location":"built_in_transformers/standard_transformers/random_day_of_week/#description","title":"Description","text":"
Utilizing the faker library, the RandomDayOfWeek transformer generates names of days (e. g., Monday, Tuesday) at random. This transformer can be applied to any text or varchar column in a database, introducing variability and realism into data sets that need to represent days of the week in a non-specific manner.
"},{"location":"built_in_transformers/standard_transformers/random_day_of_week/#example-populate-random-days-of-the-week-for-the-work_schedule-table","title":"Example: Populate random days of the week for the work_schedule table","text":"
This example demonstrates configuring the RandomDayOfWeek transformer to populate the work_day column in the work_schedule table with random days of the week. This setup can help simulate a diverse range of work schedules without tying them to specific dates.
In this configuration, every entry in the work_day column will be updated with a random day of the week, replacing any existing non-NULL values. If the keep_null parameter is set to true, then existing NULL values within the column will remain unchanged.
The RandomDomainName transformer is designed to populate specified database columns with random domain names. This tool is invaluable for simulating web data, testing applications that interact with domain names, or anonymizing real domain information in datasets.
"},{"location":"built_in_transformers/standard_transformers/random_domain_name/#parameters","title":"Parameters","text":"Name Description Default Required Supported DB types column The name of the column to be affected Yes text, varchar keep_null Indicates whether NULL values should be preserved false No -"},{"location":"built_in_transformers/standard_transformers/random_domain_name/#description","title":"Description","text":"
By leveraging an algorithm or library capable of generating believable domain names, the RandomDomainName transformer introduces random domain names into the specified database column. Each generated domain name includes a second-level domain (SLD) and a top-level domain (TLD), such as \"example.com\" or \"website.org,\" providing a wide range of plausible web addresses for database enrichment.
"},{"location":"built_in_transformers/standard_transformers/random_domain_name/#example-populate-random-domain-names-for-the-websites-table","title":"Example: Populate random domain names for the websites table","text":"
This example demonstrates configuring the RandomDomainName transformer to populate the domain column in the websites table with random domain names. This approach facilitates the creation of a diverse and realistic set of web addresses for testing, simulation, or data anonymization purposes.
In this setup, the domain column will be updated with random domain names for each entry, replacing any existing non-NULL values. If keep_null is set to true, the transformer will preserve existing NULL values in the column, maintaining the integrity of data where domain information is not applicable.
The RandomE164PhoneNumber transformer is developed to populate specified database columns with random E.164 phone numbers. This tool is essential for applications requiring the simulation of contact information, testing phone number validation systems, or anonymizing phone number data in datasets while focusing on E.164 numbers.
"},{"location":"built_in_transformers/standard_transformers/random_e164_phone_number/#parameters","title":"Parameters","text":"Name Description Default Required Supported DB types column The name of the column to be affected Yes text, varchar keep_null Indicates whether NULL values should be preserved false No -"},{"location":"built_in_transformers/standard_transformers/random_e164_phone_number/#description","title":"Description","text":"
The RandomE164PhoneNumber transformer utilizes algorithms capable of generating random E.164 phone numbers with the standard international format and injects them into the designated database column. This feature allows for the creation of diverse and realistic contact information in datasets for development, testing, or data anonymization purposes.
"},{"location":"built_in_transformers/standard_transformers/random_e164_phone_number/#example-populate-random-e164-phone-numbers-for-the-contact_information-table","title":"Example: Populate random E.164 phone numbers for the contact_information table","text":"
This example demonstrates configuring the RandomE164PhoneNumber transformer to populate the phone_number column in the contact_information table with random E.164 phone numbers. It is an effective method for simulating a variety of contact information entries with E.164 numbers.
In this configuration, the phone_number column will be updated with random E.164 phone numbers for each contact information entry, replacing any existing non-NULL values. If the keep_null parameter is set to true, existing NULL values in the column will be preserved, ensuring the integrity of records where E.164 phone number information is not applicable or provided.
"},{"location":"built_in_transformers/standard_transformers/random_email/#parameters","title":"Parameters","text":"Name Description Default Required Supported DB types column The name of the column to be affected Yes text, varchar keep_original_domain Keep original of the original address false No - local_part_template The template for local part of email No - domain_part_template The template for domain part of email No - domains List of domains for new email [\"gmail.com\", \"yahoo.com\", \"outlook.com\", \"hotmail.com\", \"aol.com\", \"icloud.com\", \"mail.com\", \"zoho.com\", \"yandex.com\", \"protonmail.com\", \"gmx.com\", \"fastmail.com\"] No - validate Validate generated email if using template false No - max_random_length Max length of randomly generated part of the email 32 No - keep_null Indicates whether NULL values should be preserved false No - engine The engine used for generating the values [random, hash]. Use hash for deterministic generation random No -"},{"location":"built_in_transformers/standard_transformers/random_email/#description","title":"Description","text":"
The RandomEmail transformer generates random email addresses for the specified database column. By default, the transformer generates random email addresses with a maximum length of 32 characters. The keep_original_domain parameter allows you to preserve the original domain part of the email address. The local_part_template and domain_part_template parameters enable you to specify templates for the local and domain parts of the email address, respectively. If the validate parameter is set to true, the transformer will validate the generated email addresses against the specified templates. The keep_null parameter allows you to preserve existing NULL values in the column.
The engine parameter allows you to choose between random and hash engines for generating values. Read more about the engines in the Transformation engines section.
In each template you have access to the columns of the table by using the {{ .column_name }} syntax. Note that all values are strings. For example, you can assemble the email address from the first_name and last_name columns: {{ .first_name | lower }}.{{ .last_name | lower }}.
The transformer always generates a random sequence for the email, which you can use via the {{ .random_string }} variable. For example, you can append a random string to the end of the local part: {{ .first_name | lower }}.{{ .last_name | lower }}.{{ .random_string }}.
Read more about template functions in the Template functions section.
"},{"location":"built_in_transformers/standard_transformers/random_email/#random-email-generation-using-first-name-and-last-name","title":"Random email generation using first name and last name","text":"
In this example, the RandomEmail transformer generates random email addresses for the email column in the account table. The local part of the email address is assembled from the first_name and last_name columns, with a 10-character random string appended to the end. The original domain part of the email address is preserved.
CREATE TABLE account\n(\n    id         SERIAL PRIMARY KEY,\n    gender     VARCHAR(1) NOT NULL,\n    email      TEXT       NOT NULL UNIQUE,\n    first_name TEXT       NOT NULL,\n    last_name  TEXT       NOT NULL,\n    birth_date DATE,\n    created_at TIMESTAMP  NOT NULL DEFAULT NOW()\n);\n\nINSERT INTO account (first_name, gender, last_name, birth_date, email)\nVALUES ('John', 'M', 'Smith', '1980-01-01', 'john.smith@gmail.com');\n
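A possible configuration for this example is sketched below; it assumes that max_random_length also bounds the {{ .random_string }} value used in the template, which is an inference from the parameters above:
```yaml
- schema: "public"
  name: "account"
  transformers:
    - name: "RandomEmail"
      params:
        column: "email"
        keep_original_domain: true    # keep "@gmail.com" from the original value
        max_random_length: 10         # assumed to bound the {{ .random_string }} length
        local_part_template: "{{ .first_name | lower }}.{{ .last_name | lower }}.{{ .random_string }}"
```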
ColumnOriginalValueTransformedValue emailjohn.smith@gmail.comjohn.smith.a075d99e2d@gmail.com"},{"location":"built_in_transformers/standard_transformers/random_email/#simple-random-email-generation","title":"Simple random email generation","text":"
In this example, the RandomEmail transformer generates random email addresses for the email column in the account table. The randomly generated part of each email address is limited to 10 characters.
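A minimal sketch of such a configuration (table and schema names are illustrative):
```yaml
- schema: "public"
  name: "account"
  transformers:
    - name: "RandomEmail"
      params:
        column: "email"
        max_random_length: 10   # cap the randomly generated part at 10 characters
```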
Generate a random float within the provided interval.
"},{"location":"built_in_transformers/standard_transformers/random_float/#parameters","title":"Parameters","text":"Name Description Default Required Supported DB types column The name of the column to be affected Yes float4, float8 min The minimum threshold for the random value. The value range depends on the column type. Yes - max The maximum threshold for the random value. The value range depends on the column type. Yes - decimal The decimal of the random float value (number of digits after the decimal point) 4 No - keep_null Indicates whether NULL values should be replaced with transformed values or not true No - engine The engine used for generating the values [random, hash]. Use hash for deterministic generation random No -"},{"location":"built_in_transformers/standard_transformers/random_float/#dynamic-parameters","title":"Dynamic parameters","text":"Parameter Supported types min float4, float8 max float4, float8"},{"location":"built_in_transformers/standard_transformers/random_float/#description","title":"Description","text":"
The RandomFloat transformer generates a random float value within the provided interval, starting from min to max, with the option to specify the number of decimal digits by using the decimal parameter. The behaviour for NULL values can be configured using the keep_null parameter.
The engine parameter allows you to choose between random and hash engines for generating values. Read more about the engines in the Transformation engines section.
"},{"location":"built_in_transformers/standard_transformers/random_float/#example-generate-random-price","title":"Example: Generate random price","text":"
In this example, the RandomFloat transformer generates random prices in the range from 0.1 to 7000, keeping up to 2 digits after the decimal point.
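A config sketch for this example; the schema, table, and column names are illustrative:
```yaml
- schema: "sales"            # illustrative
  name: "products"           # illustrative
  transformers:
    - name: "RandomFloat"
      params:
        column: "price"      # illustrative column name
        min: 0.1
        max: 7000
        decimal: 2           # keep up to 2 digits after the decimal point
```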
Generate a random integer within the provided interval.
"},{"location":"built_in_transformers/standard_transformers/random_int/#parameters","title":"Parameters","text":"Name Description Default Required Supported DB types column The name of the column to be affected Yes int2, int4, int8 min The minimum threshold for the random value Yes - max The maximum threshold for the random value Yes - keep_null Indicates whether NULL values should be replaced with transformed values or not true No - engine The engine used for generating the values [random, hash]. Use hash for deterministic generation random No -"},{"location":"built_in_transformers/standard_transformers/random_int/#dynamic-parameters","title":"Dynamic parameters","text":"Parameter Supported types min int2, int4, int8 max int2, int4, int8"},{"location":"built_in_transformers/standard_transformers/random_int/#description","title":"Description","text":"
The RandomInt transformer generates a random integer within the specified min and max thresholds. The behaviour for NULL values can be configured using the keep_null parameter.
The engine parameter allows you to choose between random and hash engines for generating values. Read more about the engines in the Transformation engines section.
"},{"location":"built_in_transformers/standard_transformers/random_int/#example-generate-random-item-quantity","title":"Example: Generate random item quantity","text":"
In the following example, the RandomInt transformer generates a random value in the range from 1 to 30 and assigns it to the orderqty column.
Generate random orderqty in the range from 1 to 30
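A sketch of the corresponding configuration (the schema and table names are illustrative; orderqty is the only column taken from the example):
```yaml
- schema: "sales"
  name: "salesorderdetail"
  transformers:
    - name: "RandomInt"
      params:
        column: "orderqty"
        min: 1
        max: 30
```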
ColumnOriginalValueTransformedValue orderqty129"},{"location":"built_in_transformers/standard_transformers/random_int/#example-generate-random-sick-leave-hours-based-on-vacation-hours","title":"Example: Generate random sick leave hours based on vacation hours","text":"
In the following example, the RandomInt transformer generates a random value in the range from 1 to the value of the vacationhours column and assigns it to the sickleavehours column. This configuration allows for the simulation of sick leave hours based on the number of vacation hours.
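A sketch using a dynamic max parameter, assuming the dynamic_params syntax referenced in the Dynamic parameters sections of these docs (schema and table names are illustrative):
```yaml
- schema: "humanresources"
  name: "employee"
  transformers:
    - name: "RandomInt"
      params:
        column: "sickleavehours"
        min: 1
      dynamic_params:
        max:
          column: "vacationhours"   # the upper bound is read per row from this column
```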
The RandomIp transformer is designed to populate specified database columns with random IPv4 or IPv6 addresses. This utility is essential for applications requiring the simulation of network data, testing systems that utilize IP addresses, or anonymizing real IP addresses in datasets.
"},{"location":"built_in_transformers/standard_transformers/random_ip/#parameters","title":"Parameters","text":"Name Description Default Required Supported DB types column The name of the column to be affected Yes text, varchar, inet subnet Subnet for generating random ip in V4 or V6 format Yes - engine The engine used for generating the values [random, hash]. Use hash for deterministic generation random No -"},{"location":"built_in_transformers/standard_transformers/random_ip/#dynamic-parameters","title":"Dynamic parameters","text":"Name Supported types subnet cidr, text, varchar"},{"location":"built_in_transformers/standard_transformers/random_ip/#description","title":"Description","text":"
Utilizing a robust algorithm or library for generating IP addresses, the RandomIp transformer injects random IPv4 or IPv6 addresses into the designated database column, depending on the provided subnet. The transformer automatically detects whether to generate an IPv4 or IPv6 address based on the subnet version specified.
"},{"location":"built_in_transformers/standard_transformers/random_ip/#example-generate-a-random-ipv4-address-for-a-1921681024-subnet","title":"Example: Generate a Random IPv4 Address for a 192.168.1.0/24 Subnet","text":"
This example demonstrates how to configure the RandomIp transformer to inject a random IPv4 address into the ip_address column for entries in the 192.168.1.0/24 subnet:
Create table ip_networks and insert data
CREATE TABLE ip_networks\n(\n id SERIAL PRIMARY KEY,\n ip_address INET,\n network CIDR\n);\n\nINSERT INTO ip_networks (ip_address, network)\nVALUES ('192.168.1.10', '192.168.1.0/24'),\n ('10.0.0.5', '10.0.0.0/16'),\n ('172.16.254.3', '172.16.0.0/12'),\n ('192.168.100.14', '192.168.100.0/24'),\n ('2001:0db8:85a3:0000:0000:8a2e:0370:7334', '2001:0db8:85a3::/64'); -- An IPv6 address and network\n
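A minimal config sketch for the static-subnet case:
```yaml
- schema: "public"
  name: "ip_networks"
  transformers:
    - name: "RandomIp"
      params:
        column: "ip_address"
        subnet: "192.168.1.0/24"   # IPv4 subnet; an IPv6 subnet would produce IPv6 addresses
```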
ColumnOriginalValueTransformedValue ip_address192.168.1.10192.168.1.28"},{"location":"built_in_transformers/standard_transformers/random_ip/#example-generate-a-random-ip-based-on-the-dynamic-subnet-parameter","title":"Example: Generate a Random IP Based on the Dynamic Subnet Parameter","text":"
This configuration illustrates how to use the RandomIp transformer dynamically, where it reads the subnet information from the network column of the database and generates a corresponding random IP address:
RandomIp transformer example with a dynamic subnet parameter
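A sketch of this dynamic configuration, assuming the dynamic_params syntax used for dynamic parameters:
```yaml
- schema: "public"
  name: "ip_networks"
  transformers:
    - name: "RandomIp"
      params:
        column: "ip_address"
      dynamic_params:
        subnet:
          column: "network"   # the subnet is read per row from the network column
```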
The RandomLatitude transformer generates random latitude values for specified database columns. It is designed to support geographical data enhancements, particularly useful for applications requiring randomized but plausible geographical coordinates.
"},{"location":"built_in_transformers/standard_transformers/random_latitude/#parameters","title":"Parameters","text":"Name Description Default Required Supported DB types column The name of the column to be affected Yes float4, float8, numeric keep_null Indicates whether NULL values should be preserved false No -"},{"location":"built_in_transformers/standard_transformers/random_latitude/#description","title":"Description","text":"
The RandomLatitude transformer utilizes the faker library to produce random latitude values within the range of -90 to +90 degrees. This transformer can be applied to columns designated to store geographical latitude information, enhancing data sets with randomized latitude coordinates.
"},{"location":"built_in_transformers/standard_transformers/random_latitude/#example-populate-random-latitude-for-the-locations-table","title":"Example: Populate random latitude for the locations table","text":"
This example demonstrates configuring the RandomLatitude transformer to populate the latitude column in the locations table with random latitude values.
With this configuration, the latitude column will be filled with random latitude values, replacing any existing non-NULL values. If keep_null is set to true, existing NULL values will be preserved.
The RandomLongitude transformer is designed to generate random longitude values for specified database columns, enhancing datasets with realistic geographic coordinates suitable for a wide range of applications, from testing location-based services to anonymizing real geographic data.
"},{"location":"built_in_transformers/standard_transformers/random_longitude/#parameters","title":"Parameters","text":"Name Description Default Required Supported DB types column The name of the column to be affected Yes float4, float8, numeric keep_null Indicates whether NULL values should be preserved false No -"},{"location":"built_in_transformers/standard_transformers/random_longitude/#description","title":"Description","text":"
The RandomLongitude transformer leverages the faker library to produce random longitude values within the globally accepted range of -180 to +180 degrees. This flexibility allows the transformer to be applied to any column intended for storing longitude data, providing a simple yet powerful tool for introducing randomized longitude coordinates into a database.
"},{"location":"built_in_transformers/standard_transformers/random_longitude/#example-populate-random-longitude-for-the-locations-table","title":"Example: Populate random longitude for the locations table","text":"
This example shows how to use the RandomLongitude transformer to fill the longitude column in the locations table with random longitude values.
This setup ensures that all entries in the longitude column receive a random longitude value, replacing any existing non-NULL values. If keep_null is set to true, then existing NULL values in the column will remain unchanged.
The RandomMac transformer is designed to populate specified database columns with random MAC addresses.
"},{"location":"built_in_transformers/standard_transformers/random_mac/#parameters","title":"Parameters","text":"Name Description Default Required Supported DB types column The name of the column to be affected Yes text, varchar, macaddr keep_original_vendor Should the Individual/Group (I/G) and Universal/Local (U/L) bits be preserved from the original MAC address. false No - cast_type Param which allow to set Individual/Group (I/G) bit in MAC Address. Allowed values [any, individual, group]. If this value is individual, the address is meant for a single device (unicast). If it is group, the address is for a group of devices, which can include multicast and broadcast addresses. any No management_type Param which allow to set Universal/Local (U/L) bit in MAC Address. Allowed values [any, universal, local]. If this bit is universal, the address is universally administered (globally unique). If it is local, the address is locally administered (such as when set manually or programmatically on a network device). any No engine The engine used for generating the values [random, hash]. Use hash for deterministic generation random No -"},{"location":"built_in_transformers/standard_transformers/random_mac/#description","title":"Description","text":"
The RandomMac transformer generates a random MAC address and injects it into the specified database column. The transformer can be configured to preserve the Individual/Group (I/G) and Universal/Local (U/L) bits from the original MAC address. You can also keep the original vendor bits in the generated MAC address by setting the keep_original_vendor parameter to true.
The engine parameter allows you to choose between random and hash engines for generating values. Read more about the engines in the Transformation engines section.
"},{"location":"built_in_transformers/standard_transformers/random_mac/#example-generate-a-random-mac-address","title":"Example: Generate a Random MAC Address","text":"
This example demonstrates how to configure the RandomMac transformer to inject a random MAC address into the mac_address column:
Create table mac_addresses and insert data
CREATE TABLE mac_addresses\n(\n id SERIAL PRIMARY KEY,\n device_name VARCHAR(50),\n mac_address MACADDR,\n description TEXT\n);\n\nINSERT INTO mac_addresses (device_name, mac_address, description)\nVALUES ('Device A', '00:1A:2B:3C:4D:5E', 'Description for Device A'),\n ('Device B', '01:2B:3C:4D:5E:6F', 'Description for Device B'),\n ('Device C', '02:3C:4D:5E:6F:70', 'Description for Device C'),\n ('Device D', '03:4D:5E:6F:70:71', 'Description for Device D'),\n ('Device E', '04:5E:6F:70:71:72', 'Description for Device E');\n
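A minimal config sketch for this example:
```yaml
- schema: "public"
  name: "mac_addresses"
  transformers:
    - name: "RandomMac"
      params:
        column: "mac_address"
        keep_original_vendor: false
        cast_type: "any"         # I/G bit: any, individual, or group
        management_type: "any"   # U/L bit: any, universal, or local
```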
The RandomMonthName transformer is crafted to populate specified database columns with random month names. This transformer is especially useful for scenarios requiring the simulation of time-related data, such as user birth months or event months, without relying on specific date values.
"},{"location":"built_in_transformers/standard_transformers/random_month_name/#parameters","title":"Parameters","text":"Name Description Default Required Supported DB types column The name of the column to be affected Yes text, varchar keep_null Indicates whether NULL values should be preserved false No -"},{"location":"built_in_transformers/standard_transformers/random_month_name/#description","title":"Description","text":"
The RandomMonthName transformer utilizes the faker library to generate the names of months at random. It can be applied to any textual column in a database to introduce variety and realism into data sets that require representations of months without the need for specific calendar dates.
"},{"location":"built_in_transformers/standard_transformers/random_month_name/#example-populate-random-month-names-for-the-user_profiles-table","title":"Example: Populate random month names for the user_profiles table","text":"
This example demonstrates how to configure the RandomMonthName transformer to fill the birth_month column in the user_profiles table with random month names, adding a layer of diversity to user data without using actual birthdates.
With this setup, the birth_month column will be updated with random month names, replacing any existing non-NULL values. If the keep_null parameter is set to true, then existing NULL values within the column will remain untouched.
Generate a random numeric within the provided interval.
"},{"location":"built_in_transformers/standard_transformers/random_numeric/#parameters","title":"Parameters","text":"Name Description Default Required Supported DB types column The name of the column to be affected Yes numeric, decimal min The minimum threshold for the random value. The value range depends on the column type. Yes - max The maximum threshold for the random value. The value range depends on the column type. Yes - decimal The decimal of the random numeric value (number of digits after the decimal point) 4 No - keep_null Indicates whether NULL values should be replaced with transformed values or not true No - engine The engine used for generating the values [random, hash]. Use hash for deterministic generation random No -"},{"location":"built_in_transformers/standard_transformers/random_numeric/#dynamic-parameters","title":"Dynamic parameters","text":"Parameter Supported types min int2, int4, int8, float4, float8, numeric, decimal max int2, int4, int8, float4, float8, numeric, decimal"},{"location":"built_in_transformers/standard_transformers/random_numeric/#description","title":"Description","text":"
The RandomNumeric transformer generates a random numeric value within the provided interval, starting from min to max, with the option to specify the number of decimal digits by using the decimal parameter. The behaviour for NULL values can be configured using the keep_null parameter.
The engine parameter allows you to choose between random and hash engines for generating values. Read more about the engines in the Transformation engines section.
"},{"location":"built_in_transformers/standard_transformers/random_numeric/#example-generate-random-price","title":"Example: Generate random price","text":"
In this example, the RandomNumeric transformer generates random prices in the range from 0.1 to 7000, keeping up to 2 digits after the decimal point.
The RandomParagraph transformer is crafted to populate specified database columns with random paragraphs. This utility is indispensable for applications that require the generation of extensive textual content, such as simulating articles, enhancing textual datasets for NLP systems, or anonymizing textual content in databases.
"},{"location":"built_in_transformers/standard_transformers/random_paragraph/#parameters","title":"Parameters","text":"Name Description Default Required Supported DB types column The name of the column to be affected Yes text, varchar keep_null Indicates whether NULL values should be preserved false No -"},{"location":"built_in_transformers/standard_transformers/random_paragraph/#description","title":"Description","text":"
Employing sophisticated text generation algorithms or libraries, the RandomParagraph transformer generates random paragraphs, injecting them into the designated database column. This transformer is designed to create varied and plausible paragraphs that simulate real-world textual content, providing a valuable tool for database enrichment, testing, and anonymization.
"},{"location":"built_in_transformers/standard_transformers/random_paragraph/#example-populate-random-paragraphs-for-the-articles-table","title":"Example: Populate random paragraphs for the articles table","text":"
This example illustrates configuring the RandomParagraph transformer to populate the body column in an articles table with random paragraphs. It is an effective way to simulate diverse article content for development, testing, or demonstration purposes.
With this setup, the body column will receive random paragraphs for each entry, replacing any existing non-NULL values. Setting the keep_null parameter to true allows for the preservation of existing NULL values within the column, maintaining the integrity of records where article content is not applicable or provided.
The RandomPassword transformer is designed to populate specified database columns with random passwords. This utility is vital for applications that require the simulation of secure user data, testing systems with authentication mechanisms, or anonymizing real passwords in datasets.
"},{"location":"built_in_transformers/standard_transformers/random_password/#parameters","title":"Parameters","text":"Name Description Default Required Supported DB types column The name of the column to be affected Yes text, varchar keep_null Indicates whether NULL values should be preserved false No -"},{"location":"built_in_transformers/standard_transformers/random_password/#description","title":"Description","text":"
Employing sophisticated password generation algorithms or libraries, the RandomPassword transformer injects random passwords into the designated database column. This feature is particularly useful for creating realistic and secure user password datasets for development, testing, or demonstration purposes.
"},{"location":"built_in_transformers/standard_transformers/random_password/#example-populate-random-passwords-for-the-user_accounts-table","title":"Example: Populate random passwords for the user_accounts table","text":"
This example demonstrates how to configure the RandomPassword transformer to populate the password column in the user_accounts table with random passwords.
In this configuration, every entry in the password column will be updated with a random password. Setting the keep_null parameter to true will preserve existing NULL values in the column, accommodating scenarios where password data may not be applicable.
The RandomPerson transformer is designed to populate specified database columns with personal attributes such as first name, last name, title and gender.
"},{"location":"built_in_transformers/standard_transformers/random_person/#parameters","title":"Parameters","text":"Name Description Default Required Supported DB types columns The name of the column to be affected Yes text, varchar gender set specific gender (possible values: Male, Female, Any) Any No - gender_mapping Specify gender name to possible values when using dynamic mode in \"gender\" parameter Any No - fallback_gender Specify fallback gender if not mapped when using dynamic mode in \"gender\" parameter Any No - engine The engine used for generating the values [random, hash]. Use hash for deterministic generation random No -"},{"location":"built_in_transformers/standard_transformers/random_person/#description","title":"Description","text":"
The RandomPerson transformer utilizes a comprehensive list of names to inject random personal attributes (first name, last name, title) into the designated database columns. This feature allows for the creation of diverse and realistic user profiles without using real user data.
name \u2014 the name of the column where the personal attributes will be stored. This value is required.
template - the template for the column value. You can use the following attributes: .FirstName, .LastName or .Title. For example, to generate a full name, you can use the following template: \"{{ .FirstName }} {{ .LastName }}\"
hashing - the bool value. Indicates whether the column value must be passed through the hashing function. The default value is false. If every column has hashing set to false (the default), then all columns will be hashed.
keep_null - the bool value. Indicates whether NULL values should be preserved. The default value is true.
The gender that will be used if no gender_mapping match is found. This parameter is optional and applies only when the gender parameter is in dynamic mode. The default value is Any.
"},{"location":"built_in_transformers/standard_transformers/random_person/#example-populate-random-first-name-and-last-name-for-table-user_profiles-in-static-mode","title":"Example: Populate random first name and last name for table user_profiles in static mode","text":"
This example demonstrates how to use the RandomPerson transformer to populate the name and surname columns in the user_profiles table with random first names and last names, respectively.
Create table user_profiles and insert data
CREATE TABLE personal_data\n(\n id SERIAL PRIMARY KEY,\n name VARCHAR(100),\n surname VARCHAR(100),\n sex CHAR(1) CHECK (sex IN ('M', 'F'))\n);\n\n-- Insert sample data into the table\nINSERT INTO personal_data (name, surname, sex)\nVALUES ('John', 'Doe', 'M'),\n ('Jane', 'Smith', 'F'),\n ('Alice', 'Johnson', 'F'),\n ('Bob', 'Lee', 'M');\n
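A config sketch for the static-mode example, using the columns properties described above:
```yaml
- schema: "public"
  name: "personal_data"
  transformers:
    - name: "RandomPerson"
      params:
        gender: "Any"
        columns:
          - name: "name"
            template: "{{ .FirstName }}"
          - name: "surname"
            template: "{{ .LastName }}"
        engine: "random"
```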
ColumnOriginalValueTransformedValue nameJohnZane surnameDoeMcCullough"},{"location":"built_in_transformers/standard_transformers/random_person/#example-populate-random-first-name-and-last-name-for-table-user_profiles-in-dynamic-mode","title":"Example: Populate random first name and last name for table user_profiles in dynamic mode","text":"
This example demonstrates how to use the RandomPerson transformer to populate the name and surname columns using a dynamically resolved gender taken from the sex column.
RandomPerson transformer example with dynamic mode
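A sketch of the dynamic-mode configuration; the direction of gender_mapping (gender names to column values) is an assumption based on the parameter description above:
```yaml
- schema: "public"
  name: "personal_data"
  transformers:
    - name: "RandomPerson"
      params:
        columns:
          - name: "name"
            template: "{{ .FirstName }}"
          - name: "surname"
            template: "{{ .LastName }}"
        gender_mapping:          # assumed mapping of gender names to column values
          Male: ["M"]
          Female: ["F"]
        fallback_gender: "Any"   # used when the sex value is not mapped
      dynamic_params:
        gender:
          column: "sex"
```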
The RandomPhoneNumber transformer is developed to populate specified database columns with random phone numbers. This tool is essential for applications requiring the simulation of contact information, testing phone number validation systems, or anonymizing phone number data in datasets.
"},{"location":"built_in_transformers/standard_transformers/random_phone_number/#parameters","title":"Parameters","text":"Name Description Default Required Supported DB types column The name of the column to be affected Yes text, varchar keep_null Indicates whether NULL values should be preserved false No -"},{"location":"built_in_transformers/standard_transformers/random_phone_number/#description","title":"Description","text":"
The RandomPhoneNumber transformer utilizes algorithms capable of generating random phone numbers with various formats and injects them into the designated database column. This feature allows for the creation of diverse and realistic contact information in datasets for development, testing, or data anonymization purposes.
"},{"location":"built_in_transformers/standard_transformers/random_phone_number/#example-populate-random-phone-numbers-for-the-contact_information-table","title":"Example: Populate random phone numbers for the contact_information table","text":"
This example demonstrates configuring the RandomPhoneNumber transformer to populate the phone_number column in the contact_information table with random phone numbers. It is an effective method for simulating a variety of contact information entries.
In this configuration, the phone_number column will be updated with random phone numbers for each contact information entry, replacing any existing non-NULL values. If the keep_null parameter is set to true, existing NULL values in the column will be preserved, ensuring the integrity of records where phone number information is not applicable or provided.
The RandomSentence transformer is designed to populate specified database columns with random sentences. Ideal for simulating natural language text for user comments, testing NLP systems, or anonymizing textual data in databases.
"},{"location":"built_in_transformers/standard_transformers/random_sentence/#parameters","title":"Parameters","text":"Name Description Default Required Supported DB types column The name of the column to be affected Yes text, varchar keep_null Indicates whether NULL values should be preserved false No -"},{"location":"built_in_transformers/standard_transformers/random_sentence/#description","title":"Description","text":"
The RandomSentence transformer employs complex text generation algorithms or libraries to generate random sentences, injecting them into a designated database column without the need for specifying sentence length. This flexibility ensures the creation of varied and plausible text for a wide range of applications.
"},{"location":"built_in_transformers/standard_transformers/random_sentence/#example-populate-random-sentences-for-the-comments-table","title":"Example: Populate random sentences for the comments table","text":"
This example shows how to configure the RandomSentence transformer to populate the comment column in the comments table with random sentences. It is a straightforward method for simulating diverse user-generated content.
In this configuration, the comment column will be updated with random sentences for each entry, replacing any existing non-NULL values. If keep_null is set to true, existing NULL values in the column will be preserved, maintaining the integrity of records where comments are not applicable.
Generate a random string using the provided characters within the specified length range.
"},{"location":"built_in_transformers/standard_transformers/random_string/#parameters","title":"Parameters","text":"Name Description Default Required Supported DB types column The name of the column to be affected Yes text, varchar min_length The minimum length of the generated string Yes - max_length The maximum length of the generated string Yes - symbols The range of characters that can be used in the random string abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ No - keep_null Indicates whether NULL values should be replaced with transformed values or not true No - engine The engine used for generating the values [random, hash]. Use hash for deterministic generation random No -"},{"location":"built_in_transformers/standard_transformers/random_string/#description","title":"Description","text":"
The RandomString transformer generates a random string with a length between min_length and max_length using the characters specified in the symbols string as the possible set of characters. The behaviour for NULL values can be configured using the keep_null parameter.
The engine parameter allows you to choose between random and hash engines for generating values. Read more about the engines in the Transformation engines section.
"},{"location":"built_in_transformers/standard_transformers/random_string/#example-generate-a-random-string-for-accountnumber","title":"Example: Generate a random string for accountnumber","text":"
In the following example, a random string is generated for the accountnumber column with a length range from 9 to 12. The character set used for generation includes 1234567890ABCDEFGHIJKLMNOPQRSTUVWXYZ.
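A config sketch for this example (the schema and table names are illustrative):
```yaml
- schema: "sales"                # illustrative
  name: "customer"               # illustrative
  transformers:
    - name: "RandomString"
      params:
        column: "accountnumber"
        min_length: 9
        max_length: 12
        symbols: "1234567890ABCDEFGHIJKLMNOPQRSTUVWXYZ"
```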
The RandomTimezone transformer is designed to populate specified database columns with random timezone strings. This transformer is particularly useful for applications that require the simulation of global user data, testing of timezone-related functionalities, or anonymizing real user timezone information in datasets.
"},{"location":"built_in_transformers/standard_transformers/random_timezone/#parameters","title":"Parameters","text":"Name Description Default Required Supported DB types column The name of the column to be affected Yes text, varchar keep_null Indicates whether NULL values should be preserved false No -"},{"location":"built_in_transformers/standard_transformers/random_timezone/#description","title":"Description","text":"
Utilizing a comprehensive library or algorithm for generating timezone data, the RandomTimezone transformer provides random timezone strings (e.g., \"America/New_York\", \"Europe/London\") for database columns. This feature enables the creation of diverse and realistic datasets by simulating timezone information for user profiles, event timings, or any other data requiring timezone context.
"},{"location":"built_in_transformers/standard_transformers/random_timezone/#example-populate-random-timezone-strings-for-the-user_accounts-table","title":"Example: Populate random timezone strings for the user_accounts table","text":"
This example demonstrates how to configure the RandomTimezone transformer to populate the timezone column in the user_accounts table with random timezone strings, enhancing the dataset with varied global user representations.
With this configuration, every entry in the timezone column will be updated with a random timezone string, replacing any existing non-NULL values. If the keep_null parameter is set to true, existing NULL values within the column will remain unchanged, preserving the integrity of rows without specified timezone data.
The RandomTollFreePhoneNumber transformer is designed to populate specified database columns with random toll-free phone numbers. This tool is essential for applications requiring the simulation of contact information, testing phone number validation systems, or anonymizing phone number data in datasets while focusing on toll-free numbers.
"},{"location":"built_in_transformers/standard_transformers/random_toll_free_phone_number/#parameters","title":"Parameters","text":"Name Description Default Required Supported DB types column The name of the column to be affected Yes text, varchar keep_null Indicates whether NULL values should be preserved false No -"},{"location":"built_in_transformers/standard_transformers/random_toll_free_phone_number/#description","title":"Description","text":"
The RandomTollFreePhoneNumber transformer utilizes algorithms capable of generating random toll-free phone numbers with various formats and injects them into the designated database column. This feature allows for the creation of diverse and realistic toll-free contact information in datasets for development, testing, or data anonymization purposes.
"},{"location":"built_in_transformers/standard_transformers/random_toll_free_phone_number/#example-populate-random-toll-free-phone-numbers-for-the-contact_information-table","title":"Example: Populate random toll-free phone numbers for the contact_information table","text":"
This example demonstrates configuring the RandomTollFreePhoneNumber transformer to populate the phone_number column in the contact_information table with random toll-free phone numbers. It is an effective method for simulating a variety of contact information entries with toll-free numbers.
In this configuration, the phone_number column will be updated with random toll-free phone numbers for each contact information entry, replacing any existing non-NULL values. If the keep_null parameter is set to true, existing NULL values in the column will be preserved, ensuring the integrity of records where toll-free phone number information is not applicable or provided.
The RandomUnixTimestamp transformer generates random Unix time values (timestamps) for specified database columns. It is particularly useful for populating columns with timestamp data, simulating time-related data, or anonymizing actual timestamps in a dataset.
"},{"location":"built_in_transformers/standard_transformers/random_unix_timestamp/#parameters","title":"Parameters","text":"Name Description Default Required Supported DB types column The name of the column to be affected Yes int2, int4, int8 min The minimum threshold date for the random value in unix timestamp format (integer) with sec unit by default Yes - max The maximum threshold date for the random value in unix timestamp format (integer) with sec unit by default Yes - unit Generated unix timestamp value unit. Possible values [second, millisecond, microsecond, nanosecond] second Yes - min_unit Min unix timestamp threshold date unit. Possible values [second, millisecond, microsecond, nanosecond] second Yes - max_unit Min unix timestamp threshold date unit. Possible values [second, millisecond, microsecond, nanosecond] second Yes - keep_null Indicates whether NULL values should be preserved false No - truncate Truncate the date to the specified part (nanosecond, microsecond, millisecond, second, minute, hour, day, month, year). The truncate operation is not applied by default. No - engine The engine used for generating the values [random, hash]. Use hash for deterministic generation random No -"},{"location":"built_in_transformers/standard_transformers/random_unix_timestamp/#description","title":"Description","text":"
The RandomUnixTimestamp transformer generates random Unix timestamps within the provided interval, starting from min to max. The min and max parameters are expected to be in Unix timestamp format. The min_unit and max_unit parameters specify the unit of the Unix timestamp threshold date. The truncate parameter allows you to truncate the date to the specified part of the date. The keep_null parameter allows you to specify whether NULL values should be preserved or replaced with transformed values.
The engine parameter allows you to choose between random and hash engines for generating values. Read more about the engines in the Transformation engines section.
"},{"location":"built_in_transformers/standard_transformers/random_unix_timestamp/#example-generate-random-unix-timestamps-with-dynamic-parameters","title":"Example: Generate random Unix timestamps with dynamic parameters","text":"
In this example, the RandomUnixTimestamp transformer generates random Unix timestamps using dynamic parameters. The min parameter is set to the created_at column, which is converted to Unix seconds using the TimestampToUnixSec. The max parameter is set to a fixed value. The paid_at column is populated with random Unix timestamps in the range from created_at to 1715934239 (Unix timestamp for 2024-05-17 12:03:59). The unit parameter is set to millisecond because the paid_at column stores timestamps in milliseconds.
CREATE TABLE transactions\n(\n id SERIAL PRIMARY KEY,\n kind VARCHAR(255),\n total DECIMAL(10, 2),\n created_at TIMESTAMP,\n paid_at BIGINT -- stores milliseconds since the epoch\n);\n\n-- Inserting data with milliseconds timestamp\nINSERT INTO transactions (kind, total, created_at, paid_at)\nVALUES ('Sale', 199.99, '2023-05-17 12:00:00', (EXTRACT(EPOCH FROM TIMESTAMP '2023-05-17 12:05:00') * 1000)),\n ('Refund', 50.00, '2023-05-18 15:00:00', (EXTRACT(EPOCH FROM TIMESTAMP '2023-05-18 15:10:00') * 1000)),\n ('Sale', 129.99, '2023-05-19 10:30:00', (EXTRACT(EPOCH FROM TIMESTAMP '2023-05-19 10:35:00') * 1000));\n
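A sketch of this configuration, assuming the dynamic_params syntax with a cast_to conversion (TimestampToUnixSec is the cast named in the example description):
```yaml
- schema: "public"
  name: "transactions"
  transformers:
    - name: "RandomUnixTimestamp"
      params:
        column: "paid_at"
        max: 1715934239          # 2024-05-17 12:03:59, in Unix seconds
        unit: "millisecond"      # paid_at stores milliseconds
        max_unit: "second"       # the max threshold above is expressed in seconds
      dynamic_params:
        min:
          column: "created_at"
          cast_to: "TimestampToUnixSec"   # convert the timestamp to Unix seconds
```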
ColumnOriginalValueTransformedValue paid_at16843251000001708919030732"},{"location":"built_in_transformers/standard_transformers/random_unix_timestamp/#example-generate-simple-random-unix-timestamps","title":"Example: Generate simple random Unix timestamps","text":"
In this example, the RandomUnixTimestamp transformer generates random Unix timestamps for the paid_at column in the range from 1615934239 (Unix timestamp for 2021-03-16 12:03:59) to 1715934239 (Unix timestamp for 2024-05-17 12:03:59). The unit parameter is set to millisecond because the paid_at column stores timestamps in milliseconds.
The RandomURL transformer is designed to populate specified database columns with random URL (Uniform Resource Locator) addresses. This tool is highly beneficial for simulating web content, testing applications that require URL input, or anonymizing real web addresses in datasets.
"},{"location":"built_in_transformers/standard_transformers/random_url/#parameters","title":"Parameters","text":"Name Description Default Required Supported DB types column The name of the column to be affected Yes text, varchar keep_null Indicates whether NULL values should be preserved false No -"},{"location":"built_in_transformers/standard_transformers/random_url/#description","title":"Description","text":"
Utilizing advanced algorithms or libraries for generating URL strings, the RandomURL transformer injects random, plausible URLs into the designated database column. Each generated URL is structured to include the protocol (e.g., \"http://\", \"https://\"), domain name, and path, offering a realistic range of web addresses for various applications.
"},{"location":"built_in_transformers/standard_transformers/random_url/#example-populate-random-urls-for-the-webpages-table","title":"Example: Populate random URLs for the webpages table","text":"
This example illustrates how to configure the RandomURL transformer to populate the page_url column in a webpages table with random URLs, providing a broad spectrum of web addresses for testing or data simulation purposes.
With this configuration, the page_url column will be filled with random URLs for each entry, replacing any existing non-NULL values. Setting the keep_null parameter to true allows for the preservation of existing NULL values within the column, accommodating scenarios where URL data may be intentionally omitted.
The RandomUsername transformer is crafted to populate specified database columns with random usernames. This utility is crucial for applications that require the simulation of user data, testing systems with user login functionality, or anonymizing real usernames in datasets.
"},{"location":"built_in_transformers/standard_transformers/random_username/#parameters","title":"Parameters","text":"Name Description Default Required Supported DB types column The name of the column to be affected Yes text, varchar keep_null Indicates whether NULL values should be preserved false No -"},{"location":"built_in_transformers/standard_transformers/random_username/#description","title":"Description","text":"
By employing sophisticated algorithms or libraries capable of generating believable usernames, the RandomUsername transformer introduces random usernames into the specified database column. Each generated username is designed to be unique and plausible, incorporating a mix of letters, numbers, and possibly special characters, depending on the generation logic used.
"},{"location":"built_in_transformers/standard_transformers/random_username/#example-populate-random-usernames-for-the-user_accounts-table","title":"Example: Populate random usernames for the user_accounts table","text":"
This example demonstrates configuring the RandomUsername transformer to populate the username column in a user_accounts table with random usernames. This setup is ideal for creating a diverse and realistic user base for development, testing, or demonstration purposes.
In this configuration, every entry in the username column will be updated with a random username, replacing any existing non-NULL values. If the keep_null parameter is set to true, then the transformer will preserve existing NULL values within the column, maintaining data integrity where usernames are not applicable or available.
"},{"location":"built_in_transformers/standard_transformers/random_uuid/#parameters","title":"Parameters","text":"Name Description Default Required Supported DB types column The name of the column to be affected Yes text, varchar, uuid keep_null Indicates whether NULL values should be replaced with transformed values or not true No - engine The engine used for generating the values [random, hash]. Use hash for deterministic generation random No -"},{"location":"built_in_transformers/standard_transformers/random_uuid/#description","title":"Description","text":"
The RandomUuid transformer generates a random UUID. The behaviour for NULL values can be configured using the keep_null parameter.
The engine parameter allows you to choose between random and hash engines for generating values. Read more about the engines in the Transformation engines section.
"},{"location":"built_in_transformers/standard_transformers/random_uuid/#example-updating-the-rowguid-column","title":"Example: Updating the rowguid column","text":"
The following example replaces original UUID values of the rowguid column to randomly generated ones.
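A minimal config sketch for this example (the schema and table names are illustrative; rowguid is the column from the example):
```yaml
- schema: "humanresources"   # illustrative
  name: "employee"           # illustrative
  transformers:
    - name: "RandomUuid"
      params:
        column: "rowguid"
        keep_null: false
```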
The RandomWord transformer populates specified database columns with random words. Ideal for simulating textual content, enhancing linguistic datasets, or anonymizing text in databases.
"},{"location":"built_in_transformers/standard_transformers/random_word/#parameters","title":"Parameters","text":"Name Description Default Required Supported DB types column The name of the column to be affected Yes text, varchar keep_null Indicates whether NULL values should be preserved false No -"},{"location":"built_in_transformers/standard_transformers/random_word/#description","title":"Description","text":"
The RandomWord transformer employs a mechanism to inject random words into a designated database column, supporting the generation of linguistically plausible and contextually diverse text. This transformer is particularly beneficial for creating rich text datasets for development, testing, or educational purposes without specifying the language, focusing on versatility and ease of use.
"},{"location":"built_in_transformers/standard_transformers/random_word/#example-populate-random-words-for-the-content-table","title":"Example: Populate random words for the content table","text":"
This example demonstrates configuring the RandomWord transformer to populate the tag column in the content table with random words. It is a straightforward approach to adding varied textual data for tagging or content categorization.
In this setup, the tag column will be updated with random words for each entry, replacing any existing non-NULL values. If keep_null is set to true, existing NULL values in the column will remain unchanged, maintaining data integrity for records where textual data is not applicable.
The RandomYearString transformer is designed to populate specified database columns with random year strings. It is ideal for scenarios that require the representation of years without specific dates, such as manufacturing years of products, birth years of users, or any other context where only the year is relevant.
"},{"location":"built_in_transformers/standard_transformers/random_year_string/#parameters","title":"Parameters","text":"Name Description Default Required Supported DB types column The name of the column to be affected Yes text, varchar, int2, int4, int8, numeric keep_null Indicates whether NULL values should be preserved false No -"},{"location":"built_in_transformers/standard_transformers/random_year_string/#description","title":"Description","text":"
The RandomYearString transformer leverages the faker library to generate strings representing random years. This allows for the easy generation of year data in a string format, adding versatility and realism to datasets that need to simulate or anonymize year-related information.
"},{"location":"built_in_transformers/standard_transformers/random_year_string/#example-populate-random-year-strings-for-the-products-table","title":"Example: Populate random year strings for the products table","text":"
This example shows how to use the RandomYearString transformer to fill the manufacturing_year column in the products table with random year strings, simulating the diversity of manufacturing dates.
In this configuration, the manufacturing_year column will be populated with random year strings, replacing any existing non-NULL values. If keep_null is set to true, then existing NULL values in the column will be preserved.
Generates real addresses for specified database columns using the faker library. It supports customization of the generated address format through Go templates.
"},{"location":"built_in_transformers/standard_transformers/real_address/#parameters","title":"Parameters","text":"Name Properties Description Default Required Supported DB types columns Specifies the affected column names along with additional properties for each column Yes Various \u221f name The name of the column to be affected Yes string \u221f template A Go template string for formatting real address attributes Yes string \u221f keep_null Indicates whether NULL values should be preserved No bool"},{"location":"built_in_transformers/standard_transformers/real_address/#template-value-descriptions","title":"Template value descriptions","text":"
The template parameter allows for the injection of real address attributes into a customizable template. The following values can be included in your template:
{{.Address}} \u2014 street address or equivalent
{{.City}} \u2014 city name
{{.State}} \u2014 state, province, or equivalent region name
{{.PostalCode}} \u2014 postal or ZIP code
{{.Latitude}} \u2014 geographic latitude
{{.Longitude}} \u2014 geographic longitude
These placeholders can be combined and formatted as desired within the template string to generate custom address formats.
The RealAddress transformer uses the faker library to generate realistic addresses, which can then be formatted according to a specified template and applied to selected columns in a database. It allows for the generated addresses to replace existing values or to preserve NULL values, based on the transformer's configuration.
"},{"location":"built_in_transformers/standard_transformers/real_address/#example-generate-real-addresses-for-the-employee-table","title":"Example: Generate Real addresses for the employee table","text":"
This example shows how to configure the RealAddress transformer to generate real addresses for the address column in the employee table, using a custom format.
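A config sketch for this example, using the template values listed above (schema and table names are illustrative):
```yaml
- schema: "humanresources"   # illustrative
  name: "employee"           # illustrative
  transformers:
    - name: "RealAddress"
      params:
        columns:
          - name: "address"
            template: "{{ .Address }}, {{ .City }}, {{ .State }} {{ .PostalCode }}"
```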
This configuration will generate real addresses with the format \"Street address, city, state postal code\" and apply them to the address column, replacing any existing non-NULL values.
"},{"location":"built_in_transformers/standard_transformers/regexp_replace/#parameters","title":"Parameters","text":"Name Description Default Required Supported DB types column The name of the column to be affected Yes text, varchar regexp The regular expression pattern to search for in the column's value Yes - replace The replacement value. This value may be replaced with a captured group from the regexp parameter. Yes -"},{"location":"built_in_transformers/standard_transformers/regexp_replace/#description","title":"Description","text":"
The RegexpReplace transformer replaces a string according to the applied regular expression. The valid regular expression syntax is the same as the general syntax used by Perl, Python, and other languages. To be precise, it is the syntax accepted by RE2 and described in the Golang documentation, except for \C.
"},{"location":"built_in_transformers/standard_transformers/regexp_replace/#example-removing-leading-prefix-from-loginid-column-value","title":"Example: Removing leading prefix from loginid column value","text":"
In the following example, the original values from loginid matching the adventure-works\\{{ id_name }} format are replaced with {{ id_name }}.
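A config sketch for this example; note the double-escaped backslash, which the note below explains (schema and table names are illustrative):
```yaml
- schema: "humanresources"   # illustrative
  name: "employee"           # illustrative
  transformers:
    - name: "RegexpReplace"
      params:
        column: "loginid"
        regexp: "adventure-works\\\\(.*)"   # \\\\ = escaped for YAML, then for the regexp
        replace: "$1"                       # keep only the captured id
```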
| column name | original value | transformed |\n|-------------|----------------------|-------------|\n| loginid | adventure-works\\ken0 | ken0 |\n
Note
YAML has control symbols, and using them without escaping may result in an error. In the example above, the prefix is separated from the id by the \ symbol. Since this symbol is a control symbol in YAML, we must escape it as \\. However, \ is also a control symbol in regular expressions, which is why we need to double-escape it as \\\\.
"},{"location":"built_in_transformers/standard_transformers/replace/#parameters","title":"Parameters","text":"Name Description Default Required Supported DB types column The name of the column to be affected Yes any replace The value to replace Yes - keep_null Indicates whether NULL values should be replaced with transformed values or not true No - validate Performs a decoding procedure via the PostgreSQL driver using the column type to ensure that values have correct type true No -"},{"location":"built_in_transformers/standard_transformers/replace/#description","title":"Description","text":"
The Replace transformer replaces the original value in the specified column with the provided one. It can optionally run a validation check with the validate parameter to ensure that the value is of the correct type before starting the transformation. The behaviour for NULL values can be configured using the keep_null parameter.
"},{"location":"built_in_transformers/standard_transformers/replace/#example-updating-the-jobtitle-column","title":"Example: Updating the jobtitle column","text":"
In the following example, the provided value: \"programmer\" is first validated through driver decoding. If the current value of the jobtitle column is not NULL, it will be replaced with programmer. If the current value is NULL, it will remain NULL.
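A minimal sketch of this configuration; the parameter name replace follows the parameters table above (schema and table names are illustrative):
```yaml
- schema: "humanresources"   # illustrative
  name: "employee"           # illustrative
  transformers:
    - name: "Replace"
      params:
        column: "jobtitle"
        replace: "programmer"   # param name as listed in the parameters table
        keep_null: true         # existing NULLs stay NULL
        validate: true          # decode the value via the driver before transforming
```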
"},{"location":"built_in_transformers/standard_transformers/set_null/#parameters","title":"Parameters","text":"Name Description Default Required Supported DB types column The name of the column to be affected Yes any"},{"location":"built_in_transformers/standard_transformers/set_null/#description","title":"Description","text":"
The SetNull transformer assigns a NULL value to a column. This transformer generates a warning if the affected column has a NOT NULL constraint.
NULL constraint violation warning
{\n \"hash\": \"5a229ee964a4ba674a41a4d63dab5a8c\",\n \"meta\": {\n \"ColumnName\": \"jobtitle\",\n \"ConstraintType\": \"NotNull\",\n \"ParameterName\": \"column\",\n \"SchemaName\": \"humanresources\",\n \"TableName\": \"employee\",\n \"TransformerName\": \"SetNull\"\n },\n \"msg\": \"transformer may produce NULL values but column has NOT NULL constraint\",\n \"severity\": \"warning\"\n}\n
"},{"location":"built_in_transformers/standard_transformers/set_null/#example-set-null-value-to-updated_at-column","title":"Example: Set NULL value to updated_at column","text":"SetNull transformer example
| column name | original value | transformed |\n|-------------|-------------------------|-------------|\n| jobtitle | Chief Executive Officer | NULL |\n
"},{"location":"commands/","title":"Commands","text":""},{"location":"commands/#introduction","title":"Introduction","text":"Greenmask available commands
You can use the following commands within Greenmask:
list-transformers \u2014 displays a list of available transformers along with their documentation
show-transformer \u2014 displays information about the specified transformer
validate \u2014 performs a validation procedure by testing the config, comparing transformed data, identifying potential issues, and checking for schema changes
dump \u2014 initiates the data dumping process
restore \u2014 restores data to the target database either by specifying a dumpId or using the latest available dump
list-dumps \u2014 lists all available dumps stored in the system
show-dump \u2014 provides metadata information about a particular dump, offering insights into its structure and attributes
delete \u2014 deletes a specific dump from the storage
For any of the commands mentioned above, you can include the following common flags:
--log-format \u2014 specifies the desired format for log output, which can be either json or text. This parameter is optional, with the default format set to text.
--log-level \u2014 sets the desired level for log output, which can be one of debug, info, or error. This parameter is optional, with the default log level being info.
--config \u2014 requires the specification of a configuration file in YAML format. This configuration file is mandatory for Greenmask to operate correctly.
--help \u2014 displays comprehensive help information for Greenmask, providing guidance on its usage and available commands.
Usage:\n  greenmask delete [flags] [dumpId]\n\nFlags:\n      --before-date string   delete dumps older than the specified date in RFC3339Nano format: 2021-01-01T00:00:00.0Z\n      --dry-run              do not delete anything, just show what would be deleted\n      --prune-failed         prune failed dumps\n      --prune-unsafe         prune dumps with \"unknown-or-failed\" statuses. Works only with --prune-failed\n      --retain-for string    retain dumps for the specified duration in format: 1w2d3h4m5s6ms7us8ns\n      --retain-recent int    retain the most recent N completed dumps (default -1)\n
Stores the transformed data in the specified storage location.
Note that the dump command shares the same parameters and environment variables as pg_dump, allowing you to configure the dump process as needed.
It supports mostly the same flags as the pg_dump utility, along with some extra flags for Greenmask-specific features.
Supported flags
  -b, --blobs                           include large objects in dump\n  -c, --clean                           clean (drop) database objects before recreating\n  -Z, --compress int                    compression level for compressed formats (default -1)\n  -C, --create                          include commands to create database in dump\n  -a, --data-only                       dump only the data, not the schema\n  -d, --dbname string                   database to dump (default \"postgres\")\n      --disable-dollar-quoting          disable dollar quoting, use SQL standard quoting\n      --enable-row-security             enable row security (dump only content user has access to)\n  -E, --encoding string                 dump the data in encoding ENCODING\n  -N, --exclude-schema strings          do NOT dump the specified schema(s)\n  -T, --exclude-table strings           do NOT dump the specified table(s)\n      --exclude-table-data strings      do NOT dump data for the specified table(s)\n  -e, --extension strings               dump the specified extension(s) only\n      --extra-float-digits string       override default setting for extra_float_digits\n  -f, --file string                     output file or directory name\n  -h, --host string                     database server host or socket directory (default \"/var/run/postgres\")\n      --if-exists                       use IF EXISTS when dropping objects\n      --include-foreign-data strings    include data of foreign tables on foreign servers matching PATTERN\n  -j, --jobs int                        use this many parallel jobs to dump (default 1)\n      --load-via-partition-root         load partitions via the root table\n      --lock-wait-timeout int           fail after waiting TIMEOUT for a table lock (default -1)\n  -B, --no-blobs                        exclude large objects in dump\n      --no-comments                     do not dump comments\n  -O, --no-owner                        skip restoration of object ownership in plain-text format\n  -X, --no-privileges                   do not dump privileges (grant/revoke)\n      --no-publications                 do not dump publications\n      --no-security-labels              do not dump security label assignments\n      --no-subscriptions                do not dump subscriptions\n      --no-sync                         do not wait for changes to be written safely to disk\n      --no-synchronized-snapshots       do not use synchronized snapshots in parallel jobs\n      --no-tablespaces                  do not dump tablespace assignments\n      --no-toast-compression            do not dump TOAST compression methods\n      --no-unlogged-table-data          do not dump unlogged table data\n      --pgzip                           use pgzip compression instead of gzip\n  -p, --port int                        database server port number (default 5432)\n      --quote-all-identifiers           quote all identifiers, even if not key words\n  -n, --schema strings                  dump the specified schema(s) only\n  -s, --schema-only                     dump only the schema, no data\n      --section string                  dump named section (pre-data, data, or post-data)\n      --serializable-deferrable         wait until the dump can run without anomalies\n      --snapshot string                 use given snapshot for the dump\n      --strict-names                    require table and/or schema include patterns to match at least one entity each\n  -t, --table strings                   dump the specified table(s) only\n      --test string                     connect as specified database user (default \"postgres\")\n      --use-set-session-authorization   use SET SESSION AUTHORIZATION commands instead of ALTER OWNER commands to set ownership\n  -U, --username string                 connect as specified database user (default \"postgres\")\n  -v, --verbose string                  verbose mode\n
By default, Greenmask uses gzip compression when dumping data. In most cases it is quite slow, does not utilize all available resources, and becomes a bottleneck for I/O operations. To speed up the dump process, you can use the --pgzip flag to use pgzip compression instead of gzip. This method splits the data into blocks, which are compressed in parallel, making it ideal for handling large volumes of data. The output remains a standard gzip file.
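For example, a parallel dump with pgzip compression might look like this (the worker count is illustrative):
greenmask --config=config.yml dump --pgzip --jobs 4\n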
The list-dumps command provides a list of all dumps stored in the storage. The list includes the following attributes:
ID \u2014 the unique identifier of the dump, used for operations like restore, delete, and show-dump
DATE \u2014 the date when the snapshot was created
DATABASE \u2014 the name of the database associated with the dump
SIZE \u2014 the original size of the dump
COMPRESSED SIZE \u2014 the size of the dump after compression
DURATION \u2014 the duration of the dump procedure
TRANSFORMED \u2014 indicates whether the dump has been transformed
STATUS \u2014 the status of the dump, which can be one of the following:
done \u2014 the dump was completed successfully
in progress \u2014 the dump is currently being created
failed \u2014 the dump creation process failed
unknown or failed \u2014 a deprecated status used for dumps created by version v0.1.14 and earlier that failed or were still in progress
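The list can be displayed with the following command:
greenmask --config=config.yml list-dumps\n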
Example of list-dumps output:
Info
Greenmask uses a heartbeat mechanism to determine the status of a dump. A dump is considered failed if it lacks a \"done\" heartbeat or if its last heartbeat is older than 30 minutes. Heartbeats are recorded every 15 minutes by the dump command while it is in progress. If Greenmask fails unexpectedly, the heartbeat stops being updated, and after 30 minutes (twice the interval) the dump is classified as failed. The in progress status indicates that a dump is still ongoing.
The list-transformers command provides a list of all the allowed transformers, including both standard and advanced transformers. This list can be helpful when searching for an appropriate transformer for your data transformation needs.
To show a list of available transformers, use the following command:
greenmask --config=config.yml list-transformers\n
Supported flags:
--format \u2014 allows you to select the output format. There are two options available: text or json. The default setting is text.
Example of list-transformers output:
When using the list-transformers command, you receive a list of available transformers with essential information about each of them. Below are the key parameters for each transformer:
NAME \u2014 the name of the transformer
DESCRIPTION \u2014 a brief description of what the transformer does
COLUMN PARAMETER NAME \u2014 name of a column or columns affected by transformation
SUPPORTED TYPES \u2014 list the supported value types
The JSON output of greenmask --config=config.yml list-transformers --format=json contains the same attributes:
JSON format output
[\n {\n \"name\": \"Cmd\",\n \"description\": \"Transform data via external program using stdin and stdout interaction\",\n \"parameters\": [\n {\n \"name\": \"columns\",\n \"supported_types\": [\n \"any\"\n ]\n }\n ]\n },\n {\n \"name\": \"Dict\",\n \"description\": \"Replace values matched by dictionary keys\",\n \"parameters\": [\n {\n \"name\": \"column\",\n \"supported_types\": [\n \"any\"\n ]\n }\n ]\n }\n]\n
The restore command is used to restore a database from a previously created dump. You can specify the dump to restore by providing the dump ID or use the latest keyword to restore the latest completed dump.
greenmask --config=config.yml restore DUMP_ID\n
Alternatively, to restore the latest completed dump, use the following command:
greenmask --config=config.yml restore latest\n
Note that the restore command shares the same parameters and environment variables as pg_restore, allowing you to configure the restoration process as needed.
It supports most of the same flags as the pg_restore utility, along with some extra flags for Greenmask-specific features.
Supported flags
--batch-size int the number of rows to insert in a single batch during the COPY command (0 - all rows will be inserted in a single batch)\n -c, --clean clean (drop) database objects before recreating\n -C, --create create the target database\n -a, --data-only restore only the data, no schema\n -d, --dbname string connect to database name (default \"postgres\")\n --disable-triggers disable triggers during data section restore\n --enable-row-security enable row security\n -N, --exclude-schema strings do not restore objects in this schema\n -e, --exit-on-error exit on error, default is to continue\n -f, --file string output file name (- for stdout)\n -P, --function strings restore named function\n -h, --host string database server host or socket directory (default \"/var/run/postgres\")\n --if-exists use IF EXISTS when dropping objects\n -i, --index strings restore named index\n --inserts restore data as INSERT commands, rather than COPY\n -j, --jobs int use this many parallel jobs to restore (default 1)\n --list-format string use table of contents in format of text, json or yaml (default \"text\")\n --no-comments do not restore comments\n --no-data-for-failed-tables do not restore data of tables that could not be created\n -O, --no-owner skip restoration of object ownership\n -X, --no-privileges skip restoration of access privileges (grant/revoke)\n --no-publications do not restore publications\n --no-security-labels do not restore security labels\n --no-subscriptions do not restore subscriptions\n --no-table-access-method do not restore table access methods\n --no-tablespaces do not restore tablespace assignments\n --on-conflict-do-nothing add ON CONFLICT DO NOTHING to INSERT commands\n --overriding-system-value use OVERRIDING SYSTEM VALUE clause for INSERTs\n --pgzip use pgzip decompression instead of gzip\n -p, --port int database server port number (default 5432)\n --restore-in-order restore tables in topological order, ensuring that dependent tables are not restored until the tables they depend on have been restored\n -n, --schema strings restore only objects in this schema\n -s, --schema-only restore only the schema, no data\n --section string restore named section (pre-data, data, or post-data)\n -1, --single-transaction restore as a single transaction\n --strict-names require table and/or schema include patterns to match at least one entity each\n -S, --superuser string superuser user name to use for disabling triggers\n -t, --table strings restore named relation (table, view, etc.)\n -T, --trigger strings restore named trigger\n -L, --use-list string use table of contents from this file for selecting/ordering output\n --use-session-replication-role-replica use SET session_replication_role = 'replica' to disable triggers during data section restore (alternative for --disable-triggers)\n --use-set-session-authorization use SET SESSION AUTHORIZATION commands instead of ALTER OWNER commands to set ownership\n -U, --username string connect as specified database user (default \"postgres\")\n -v, --verbose string verbose mode\n
"},{"location":"commands/restore/#extra-features","title":"Extra features","text":""},{"location":"commands/restore/#inserts-and-error-handling","title":"Inserts and error handling","text":"
Warning
INSERT commands are a lot slower than COPY commands. Use this feature only when necessary.
By default, Greenmask restores data using the COPY command. If you prefer to restore data using INSERT commands, you can use the --inserts flag. This flag allows you to manage errors that occur during the execution of INSERT commands. By configuring an error and constraint exclusion list in the config, you can skip certain errors and continue inserting subsequent rows from the dump.
This can be useful when adding new records to an existing dump, but you don't want the process to stop if some records already exist in the database or violate certain constraints.
Adding the --on-conflict-do-nothing flag generates INSERT statements with the ON CONFLICT DO NOTHING clause, similar to the original pg_dump option. However, this approach only works for unique or exclusion constraints. If a foreign key is missing in the referenced table or any other constraint is violated, the insertion will still fail. To handle these issues, you can define an exclusion list in the config.
Example with inserts and ON CONFLICT DO NOTHING:
greenmask --config=config.yml restore DUMP_ID --inserts --on-conflict-do-nothing\n
Adding the --overriding-system-value flag generates INSERT statements with the OVERRIDING SYSTEM VALUE clause, which allows you to insert data into identity columns.
example of GENERATED ALWAYS AS IDENTITY column
CREATE TABLE people (\n id integer GENERATED ALWAYS AS IDENTITY PRIMARY KEY,\n generated text GENERATED ALWAYS AS (id || first_name) STORED,\n first_name text\n);\n
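To restore such a table using INSERT statements, the flags can be combined, for example:
greenmask --config=config.yml restore latest --inserts --overriding-system-value\n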
"},{"location":"commands/restore/#restoration-in-topological-order","title":"Restoration in topological order","text":"
By default, Greenmask restores tables in the order they are listed in the dump file. To restore tables in topological order, use the --restore-in-order flag. This flag ensures that dependent tables are not restored until the tables they depend on have been restored.
This is useful when the schema is already created with foreign keys and other constraints, and you want to insert data into the tables in the correct order or catch up the target database with new data.
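For example:
greenmask --config=config.yml restore latest --restore-in-order\n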
Warning
Greenmask cannot guarantee restoration in topological order when the schema contains cycles. The only way to restore tables with cyclic dependencies is to temporarily remove the foreign key constraint (to break the cycle), restore the data, and then re-add the foreign key constraint once the data restoration is complete.
If your database has cyclic dependencies, you will be notified, but the restoration will continue.
2024-08-16T21:39:50+03:00 WRN cycle between tables is detected: cannot guarantee the order of restoration within cycle cycle=[\"public.employees\",\"public.departments\",\"public.projects\",\"public.employees\"]\n
By default, Greenmask uses gzip decompression to restore data. In most cases it is quite slow, does not utilize all available resources, and becomes a bottleneck for I/O operations. To speed up the restoration process, you can use the --pgzip flag to use pgzip decompression instead of gzip. This method splits the data into blocks, which are decompressed in parallel, making it ideal for handling large volumes of data.
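For example:
greenmask --config=config.yml restore latest --pgzip\n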
"},{"location":"commands/restore/#restore-data-batching","title":"Restore data batching","text":"
The COPY command reports an error only on transaction commit. This means that if you have a large dump and an error occurs, you will have to wait until the end of the transaction to see the error message. To avoid this, you can use the --batch-size flag to specify the number of rows to insert in a single batch during the COPY command. If an error occurs during a batch insertion, the error message is displayed immediately. The data is committed only if all batches are inserted successfully.
This is useful when you want to be notified of errors as early as possible without waiting for the entire table to be restored.
Warning
The batch size should be chosen carefully. If the batch size is too small, the restoration process will be slow. If the batch size is too large, you may not be able to identify the error row.
In the example below, the batch size is set to 1000 rows. This means that 1000 rows will be inserted in a single batch, so you will be notified of any errors immediately after each batch is inserted.
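A sketch of such an invocation, with DUMP_ID standing in for an actual dump ID:
greenmask --config=config.yml restore DUMP_ID --batch-size 1000\n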
This command provides details about all objects and data that can be restored, similar to the pg_restore -l command in PostgreSQL. It helps you inspect the contents of the dump before performing the actual restoration.
Parameters:
--format \u2014 format of printing. Can be text or json.
To display metadata information about a dump, use the following command:
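greenmask --config=config.yml show-dump DUMP_ID\n
Here DUMP_ID is a placeholder for an actual dump ID from list-dumps.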
The date when the backup has been initiated, also indicating the snapshot date.
The date when the backup process was successfully completed.
The original size of the backup in bytes.
The size of the backup after compression in bytes.
A list of tables that underwent transformation during the backup.
The schema name of the table.
The name of the table.
Custom query override, if applicable.
A list of transformers that were applied during the backup.
The name of the transformer.
The parameters provided for the transformer.
A mapping of overridden column types.
The header information in the table of contents file. This provides the same details as the --format=text output in the previous snippet.
The list of restoration entries. This offers the same information as the --format=text output in the previous snippet.
Note
The json format provides more detailed information compared to the text format. The text format is primarily used for backward compatibility and for generating a restoration list that can be used with pg_restore -L listfile. The json format, on the other hand, provides comprehensive metadata about the dump, including the applied transformers and their parameters, which makes it especially useful for detailed dump introspection.
This command prints out detailed information about a transformer by a provided name, including specific attributes to help you understand and configure the transformer effectively.
To show detailed information about a transformer, use the following command:
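greenmask --config=config.yml show-transformer TRANSFORMER_NAME\n
Here TRANSFORMER_NAME is a placeholder; for example, greenmask --config=config.yml show-transformer RandomDate.
Supported flags: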
--format \u2014 allows you to select the output format. There are two options available: text or json. The default setting is text.
Example of show-transformer output:
When using the show-transformer command, you receive detailed information about the transformer and its parameters and their possible attributes. Below are the key parameters for each transformer:
Name \u2014 the name of the transformer
Description \u2014 a brief description of what the transformer does
Parameters \u2014 a list of transformer parameters, each with its own set of attributes. Possible attributes include:
description \u2014 a brief description of the parameter's purpose
required \u2014 a flag indicating whether the parameter is required when configuring the transformer
link_parameter \u2014 specifies whether the value of the parameter will be encoded using a specific parameter type encoder. For example, if a parameter named column is linked to another parameter start, the start parameter's value will be encoded according to the column type when the transformer is initialized.
cast_db_type \u2014 indicates that the value should be encoded according to the database type. For example, when dealing with the INTERVAL data type, you must provide the interval value in PostgreSQL format.
default_value \u2014 the default value assigned to the parameter if it's not provided during configuration.
column_properties \u2014 if a parameter represents the name of a column, it may contain additional properties, including:
nullable \u2014 indicates whether the transformer may produce NULL values, potentially violating the NOT NULL constraint
unique \u2014 specifies whether the transformer guarantees unique values for each call. If set to true, it means that the transformer cannot produce duplicate values, ensuring compliance with the UNIQUE constraint.
affected \u2014 indicates whether the column is affected during the transformation process. If not affected, the column's value might still be required for transforming another column.
allowed_types \u2014 a list of data types that are compatible with this parameter
skip_original_data \u2014 specifies whether the original value of the column, before transformation, is relevant for the transformation process
skip_on_null \u2014 indicates whether the transformer should skip the transformation when the input column value is NULL. If the column value is NULL, interaction with the transformer is unnecessary.
Warning
The default value in JSON format is base64 encoded. This might change in a later version of Greenmask.
The validate command allows you to perform a validation procedure and compare transformed data.
Below is a list of all supported flags for the validate command:
Supported flags
Usage:\n greenmask validate [flags]\n\nFlags:\n --data Perform test dump for --rows-limit rows and print it pretty\n --diff Find difference between original and transformed data\n --format string Format of output. possible values [text|json] (default \"text\")\n --rows-limit uint Limit the number of rows to check per table (default 10)\n --schema Make a schema diff between previous dump and the current state\n --table strings Check tables dump only for specific tables\n --table-format string Format of table output (only for --format=text). Possible values [vertical|horizontal] (default \"vertical\")\n --transformed-only Print only transformed column and primary key\n --warnings Print warnings\n
The validate command can exit with a non-zero code when:
Any error occurred
Validate was called with --warnings flag and there are warnings
Validate was called with --schema flag and there are schema differences
All of these cases can be used in CI/CD pipelines to stop the process when something goes wrong. This is especially useful with the --schema flag, which helps avoid data leakage when the schema changes.
You can use the --table flag multiple times to specify the tables you want to check. Tables can be written with or without schema names (e.g., public.table_name or table_name). If you specify multiple tables from different schemas, an error will be thrown.
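For instance, to validate a single table with the data diff and warnings enabled (the table name is illustrative):
greenmask --config=config.yml validate --table public.table_name --data --diff --warnings\n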
2024-03-15T19:46:12+02:00 WRN ValidationWarning={\"hash\":\"aa808fb574a1359c6606e464833feceb\",\"meta\":{\"ColumnName\":\"birthdate\",\"ConstraintDef\":\"CHECK (birthdate \\u003e= '1930-01-01'::date AND birthdate \\u003c= (now() - '18 years'::interval))\",\"ConstraintName\":\"humanresources\",\"ConstraintSchema\":\"humanresources\",\"ConstraintType\":\"Check\",\"ParameterName\":\"column\",\"SchemaName\":\"humanresources\",\"TableName\":\"employee\",\"TransformerName\":\"NoiseDate\"},\"msg\":\"possible constraint violation: column has Check constraint\",\"severity\":\"warning\"}\n
The validation output will provide detailed information about potential constraint violations and schema issues. Each line contains nested JSON data under the ValidationWarning key, offering insights into the affected part of the configuration and potential constraint violations.
Table schema name specifies the schema name of the affected table.
Table name identifies the name of the table where the problem occurs.
Transformer name indicates the name of the transformer responsible for the transformation.
Name of affected parameter is typically the name of the column parameter that is relevant to the validation warning.
Validation warning description provides a detailed description of the validation warning and the reason behind it.
Severity of validation warning indicates the severity level of the validation warning and can be one of the following:
* error\n* warning\n* info\n* debug\n
Hash is a unique identifier of the validation warning. It is used to resolve the warning in the config file.
Note
A validation warning with a severity level of \"error\" is considered critical and must be addressed before the dump operation can proceed. Failure to resolve such warnings will prevent the dump operation from being executed.
Schema diff changed output example
2024-03-15T19:46:12+02:00 WRN Database schema has been changed Hint=\"Check schema changes before making new dump\" PreviousDumpId=1710520855501\n2024-03-15T19:46:12+02:00 WRN Column renamed Event=ColumnRenamed Signature={\"CurrentColumnName\":\"id1\",\"PreviousColumnName\":\"id\",\"TableName\":\"test\",\"TableSchema\":\"public\"}\n2024-03-15T19:46:12+02:00 WRN Column type changed Event=ColumnTypeChanged Signature={\"ColumnName\":\"id\",\"CurrentColumnType\":\"bigint\",\"CurrentColumnTypeOid\":\"20\",\"PreviousColumnType\":\"integer\",\"PreviousColumnTypeOid\":\"23\",\"TableName\":\"test\",\"TableSchema\":\"public\"}\n2024-03-15T19:46:12+02:00 WRN Column created Event=ColumnCreated Signature={\"ColumnName\":\"name\",\"ColumnType\":\"text\",\"TableName\":\"test\",\"TableSchema\":\"public\"}\n2024-03-15T19:46:12+02:00 WRN Table created Event=TableCreated Signature={\"SchemaName\":\"public\",\"TableName\":\"test1\",\"TableOid\":\"20563\"}\n
Example of validation diff:
The validation diff is presented in a neatly formatted table. In this table:
Columns that are affected by the transformation are highlighted with a red background.
The pre-transformation values are displayed in green.
The post-transformation values are shown in red.
The result in --format=text can be displayed in either horizontal (--table-format=horizontal) or vertical (--table-format=vertical) format, making it easy to visualize and understand the differences between the original and transformed data.
The whole validate command may be run in JSON format, including logging, which makes the output structure easy to parse.
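For example:
greenmask --config=config.yml validate --data --diff --format=json\n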
We are excited to announce the release of Greenmask v0.1.0, marking the first production-ready version. This release addresses various bug fixes, introduces improvements, and includes documentation refactoring for enhanced clarity.
Added positional arguments for the list-transformers command, allowing specific transformer information retrieval (e.g., greenmask list-transformers RandomDate).
Added a version parameter --version that prints Greenmask version.
Added numeric parameters support for -Int and -Float transformers.
Improved verbosity in custom transformer interaction, accumulating stderr data and forwarding it in batches instead of writing it one by one.
Updated dependencies to newer versions.
Enhanced the stability of the JSON line interaction protocol by utilizing the stdlib JSON encoder/decoder.
Modified the method for sending table metadata to custom transformers; now, it is sent via stdin in the first line in JSON format instead of providing it via command arguments.
Refactored template functions naming.
Refactored NoiseDate transformer implementation for improved stability and predictability.
Changed the default value of the Dict transformer's fail_not_matched parameter to true.
Refactored the Hash transformer to provide a salt parameter and receive a base64 encoded salt. If salt is not provided, it generates one randomly.
Added validation for the truncate parameter of NoiseDate and RandomDate transformers that issues a warning if the provided value is invalid.
Increased verbosity of parameter validation warnings, now properly forwarding warnings to stdout.
We are excited to announce the beta release of Greenmask, a versatile and open-source utility for PostgreSQL logical backup dumping, anonymization, and restoration. Greenmask is perfect for routine backup and restoration tasks. It facilitates anonymization and data masking for staging environments and analytics.
This release introduces a range of features aimed at enhancing database management and security.
Transformer - Description
RandomLatitude - Generates a random latitude value
RandomLongitude - Generates a random longitude value
RandomUnixTime - Generates a random Unix timestamp
RandomMonthName - Generates the name of a random month
RandomYearString - Generates a random year as a string
RandomDayOfWeek - Generates a random day of the week
RandomDayOfMonth - Generates a random day of the month
RandomCentury - Generates a random century
RandomTimezone - Generates a random timezone
RandomEmail - Generates a random email address
RandomMacAddress - Generates a random MAC address
RandomDomainName - Generates a random domain name
RandomURL - Generates a random URL
RandomUsername - Generates a random username
RandomIPv4 - Generates a random IPv4 address
RandomIPv6 - Generates a random IPv6 address
RandomPassword - Generates a random password
RandomWord - Generates a random word
RandomSentence - Generates a random sentence
RandomParagraph - Generates a random paragraph
RandomCCType - Generates a random credit card type
RandomCCNumber - Generates a random credit card number
RandomCurrency - Generates a random currency code
RandomAmountWithCurrency - Generates a random monetary amount with currency
RandomTitleMale - Generates a random title for males
RandomTitleFemale - Generates a random title for females
RandomFirstName - Generates a random first name
RandomFirstNameMale - Generates a random male first name
RandomFirstNameFemale - Generates a random female first name
RandomLastName - Generates a random last name
RandomName - Generates a full random name
RandomPhoneNumber - Generates a random phone number
RandomTollFreePhoneNumber - Generates a random toll-free phone number
RandomE164PhoneNumber - Generates a random phone number in E.164 format
RealAddress - Generates a real address"},{"location":"release_notes/greenmask_0_1_1/#assets","title":"Assets","text":"
To download the Greenmask binary compatible with your system, see the release's assets list.
The Hash transformer has been completely remastered and now has the function parameter to choose from several hash algorithm options and the max_length parameter to truncate the hash tail.
Split information about transformers between the list-transformers and new show-transformer CLI commands, which allows for more comprehensible and useful outputs for both commands
Added error severity for the Cmd parameter validator
Added restoration filtering by --table, --schema and --exclude-schema parameters
Running validate without parameters validates only the configuration file
Added the --schema parameter, which allows making a schema diff between the previous dump and the current state. This is useful when you want to check if the schema has changed after a migration. By controlling this, you can prevent data leakage after the migration
The validate command is divided into multiple stages that can be controlled using parameters
Added salt parameter that can be set via config or via GREENMASK_GLOBAL_SALT
Added sha3 functions support in different modes (sha3-224, sha3-256, sha3-384, sha3-512)
Refactored Cmd transformer logic
JSON API: now allows using column names instead of column indexes in JSON format
CSV API: can now use the column order from the config via column remapping
The validate command was rewritten almost from scratch.
New option --transformed-only - displays only the transformed columns together with the primary key (if it exists). This reduces the output data and makes it more readable
Implemented json format for output
Added the --table-format parameter which is responsible for the vertical and horizontal table orientation. This works only when --format=text
Added the --warnings parameter, if it is specified then not only fatal-warnings will be displayed, but also those with a lower severity
Fixed --use-list option - now it applies TOC entries according to the order in the list file
Fixed --use-list option behaviour together with the --list-format option (json or text). Now it generates a temporary list file in text format to provide to the pg_restore call
Updated documentation according to the latest changes
Implemented the --exit-on-error parameter for the pg_restore run. However, it does not yet apply to the "data" section restoration: if any error occurs in the data section, Greenmask exits with the error whether or not --exit-on-error was provided. This might be fixed later
Fixed dependent objects dropping when running with the restore command with the --clean parameter. Useful when restoring and overriding only required tables
Fixed show-dump command output in text mode
Disabled CGO. This fixes a problem where a binary downloaded from the repo could not run
Implemented table scoring according to the table size and transformation costs. This spreads table dumping correctly across the requested worker pool and reduces the execution time. Greenmask now introspects the table size, adds the transformation score using the formula score = tableSizeInBytes + (tableSizeInBytes * 0.03 * tableTransformationsCount), and uses the "Largest First" strategy. The problem is described here
Introduced no_verify_ssl parameter for S3 storage
Adjusted Dockerfile
Changed entrypoint to greenmask binary
The greenmask container now runs under greenmask user and groups
Refactored storage config structure. Now it contains the type that is used for the storage type determination
Most of the attributes may be overridden with environment variables where the letters are capitalized and the dots are replaced with underscores. For instance, the setting storage.type might be represented with the environment variable STORAGE_TYPE
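For example, the storage type could be overridden for a single run like this (the value is illustrative):
STORAGE_TYPE=directory greenmask list-dumps\n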
The --config parameter is no longer required. This simplifies the Greenmask user experience
Directory storage is set as the default
Set the default temporary directory as /tmp
Added environment variable section to the configuration docs
This is one of the biggest releases since Greenmask was founded. We've been in close contact with our users, gathering feedback, and working hard to make Greenmask more flexible, reliable, and user-friendly.
This major release introduces exciting new features such as database subsetting, pgzip support, restoration in topological order, and refactored transformers, significantly enhancing Greenmask's flexibility to better meet business needs. It also includes several fixes and improvements.
This release is a major milestone that significantly expands Greenmask's functionality, transforming it into a simple, extensible, and reliable solution for database security, data anonymization, and everyday operations. Our goal is to create a core system that can serve as a foundation for comprehensive dynamic staging environments and robust data security.
PostgreSQL 17 support - revised ported library to support PostgreSQL 17
Database Subset - a new feature that allows you to define a subset of the database, allowing you to scale down the dump size (#110). This is robust for many purposes and especially useful for testing and development environments. It supports:
References with NULL values - generate the LEFT JOIN query for the FK reference with NULL values to include them in the subset.
Supports virtual references (virtual foreign keys) - create a logical FK in Greenmask that will be used for subset dependencies graph. The virtual reference can be defined for a column or an expression, allowing you to get the value from JSON and similar.
Supports circular references - Greenmask will automatically resolve circular dependencies in the subset by generating a recursive query. The query is generated with integrity checks of the subset ensuring that the data gathered from circular dependencies is consistent.
Fully covered with documentation including troubleshooting and examples.
Supports FK and PK that have more than one column (or expression).
Multi-cycles resolution in one strong connected component (SCC) is supported - Greenmask will generate a recursive query for the SCC whether it is a single cycle or multiple cycles, making the subset system universal for any database schema.
Supports polymorphic relationships - You can define a virtual reference for a table with polymorphic references using polymorphic_exprs attribute and use greenmask to generate a subset for such tables.
pgzip support for faster compression and decompression \u2014 setting --pgzip can speed up the dump and restoration processes through parallel compression. In some tests, it shows up to 5x faster dump and restore operations.
Restoration in topological order - This flag ensures that dependent tables are not restored until the tables they depend on have been restored. This is useful when the schema is already created with foreign keys and you want to insert data in the correct order.
Insert format restoration - For a flexible restoration process, Greenmask now supports data restoration in the INSERT format. It generates the insert statements based on COPY records from the dump. You do not need to re-dump your data to use this feature; it can be defined in the restore command. The list of new features related to the INSERT format:
Generate INSERT statements with the ON CONFLICT DO NOTHING clause if the flag --on-conflict-do-nothing is set.
Error exclusion list in the config to skip certain errors and continue inserting subsequent rows from the dump.
Use cases - incremental dump and restoration for logical data. For example, if you have a database, and you want to insert data periodically from another source, this can be used together with the database subset and transformations to catch up the target database.
Restore data batching (#173) - By default, the COPY protocol returns the error only on transaction commit. To override this behavior, use the --batch-size flag to specify the number of rows to insert in a single batch during the COPY command. This is useful when you want to control the transaction size and commit.
Introduced keep_null parameter for RandomPerson transformer.
Introduced dynamic parameters in the transformers
Most transformers now support dynamic parameters where applicable.
Dynamic parameters are strictly enforced. If you need to cast values to another type, Greenmask provides templates and predefined cast functions accessible via cast_to. These functions cover frequent operations such as UnixTimestampToDate and IntToBool.
The transformation logic has been significantly refactored, making transformers more customizable and flexible than before.
Introduced transformation engines
random - generates transformer values based on pseudo-random algorithms.
hash - generates transformer values using hash functions. Currently, it utilizes sha3 hash functions, which are secure but perform slowly. In the stable release, there will be an option to choose between sha3 and SipHash.
Introduced static parameters value template
Dumps retention management - Introduced retention parameters (#201) for the delete command. Introduced two new statuses: failed and in progress. A dump is considered failed if it lacks a \"done\" heartbeat or if the last heartbeat timestamp exceeds 30 minutes. The delete command now supports the following retention parameters:
--dry-run: Runs the deletion operation in test mode with verbose output, without actually deleting anything.
--before-date 2024-08-27T23:50:54+00:00: Deletes dumps older than the specified date. The date must be provided in RFC3339Nano format, for example: 2021-01-01T00:00:00Z.
--retain-recent 10: Retains the N most recent dumps, where N is specified by the user.
--retain-for 1w2d3h4m5s6ms7us8ns: Retains dumps for the specified duration. The format supports weeks (w), days (d), hours (h), minutes (m), seconds (s), milliseconds (ms), microseconds (us), and nanoseconds (ns).
--prune-failed: Prunes (removes) all dumps that have failed.
--prune-unsafe: Prunes dumps with \"unknown-or-failed\" statuses. This option only works in conjunction with --prune-failed.
Docker image mirroring into the GitHub Container Registry
Introduced the Parametrizer interface, now implemented for both dynamic and static parameters.
Renamed most of the toolkit types for enhanced clarity and comprehensive documentation coverage.
Refactored the Driver initialization logic.
Added validation warnings for overridden types in the Driver.
Migrated existing built-in transformers to utilize the new Parametrizer interface.
Implemented a new abstraction, TransformationContext, as the first step towards enabling new feature transformation conditions (#34).
Optimized most transformers for performance in both dynamic and static modes. While dynamic mode offers flexibility, static mode ensures performance remains high. Using only the necessary transformation features helps keep transformation time predictable.
RandomEmail - Introduces a new transformer that supports both random and deterministic engines. It allows for flexible email value generation; you can use column values in the template and choose to keep the original domain or select any from the domains parameter.
NoiseDate, NoiseFloat, NoiseInt - These transformers support both random and deterministic engines, offering dynamic mode parameters that control the noise thresholds within the min and max range. Unlike previous implementations which used a single ratio parameter, the new release features min_ratio and max_ratio parameters to define noise values more precisely. Utilizing the hash engine in these transformers enhances security by complicating statistical analysis for attackers, especially when the same salt is used consistently over long periods.
NoiseNumeric - A newly implemented transformer, sharing features with NoiseInt and NoiseFloat, but specifically designed for numeric values (large integers or floats). It provides a decimal parameter to handle values with fractions.
RandomChoice - Now supports the hash engine
RandomDate, RandomFloat, RandomInt - Now enhanced with hash engine support. Threshold parameters min and max have been updated to support dynamic mode, allowing for more flexible configurations.
RandomNumeric - A new transformer specifically designed for numeric types (large integers or floats), sharing similar features with RandomInt and RandomFloat, but tailored for handling huge numeric values.
RandomString - Now supports hash engine mode
RandomUnixTimestamp - This new transformer generates Unix timestamps with selectable units (second, millisecond, microsecond, nanosecond). Similar in function to RandomDate, it supports the hash engine and dynamic parameters for min and max thresholds, with the ability to override these units using min_unit and max_unit parameters.
RandomUuid - Added hash engine support
RandomPerson - Implemented a new transformer that replaces RandomName, RandomLastName, RandomFirstName, RandomFirstNameMale, RandomFirstNameFemale, RandomTitleMale, and RandomTitleFemale. This new transformer offers enhanced customizability while providing similar functionalities as the previous versions. It generates personal data such as FirstName, LastName, and Title, based on the provided gender parameter, which now supports dynamic mode. Future minor versions will allow for overriding the default names database.
Added tsModify - a new template function for time.Time objects modification
Introduced a new RandomIp transformer capable of generating a random IP address based on the specified netmask.
Added a new RandomMac transformer for generating random MAC addresses.
Deleted transformers include RandomMacAddress, RandomIPv4, RandomIPv6, RandomUnixTime, RandomTitleMale, RandomTitleFemale, RandomFirstName, RandomFirstNameMale, RandomFirstNameFemale, RandomLastName, and RandomName due to the introduction of more flexible and unified options.
"},{"location":"release_notes/greenmask_0_2_0/#fixes-and-improvements","title":"Fixes and improvements","text":"
Fixed validate command with the --table flag, which had the wrong order of the table name representation {{ table_name }}.{{ schema }} instead of {{ schema }}.{{ table_name }}.
Fixed Row.SetColumn out of range validation.
Fixed restoreWorker panic caused when the worker received an error from pgx.
Fixed error handling in the restore command.
Fixed restore jobs: now a transaction is started for each table restoration and committed after the table restoration is done.
Fixed --exit-on-error working incorrectly in the restore command. Now, the --exit-on-error flag works correctly with the data section.
Fixed transaction rollback in the validate command.
Fixed typo in documentation.
Fixed a CI/CD bug related to retrieving current tags.
Fixed the Docker image tag for latest to exclude specific keywords.
Fixed a case where the hashing value was not set for each column in the RandomPerson transformer.
Fixed original email value parsing conditions.
Subset docs revision.
Fixed a case where data entries were excluded by exclusion parameters such as --exclude-table, --table, etc.
Fixed zero bytes that were written in the buffer due to the wrong buffer limit in the Email transformer.
Fixed a case where the overridden type of column via columns_type_override did not work.
Fixed a case where an unknown option provided in the config was just ignored instead of throwing an error.
Fixed a case where min and max parameter values were ignored in transformers NoiseDate, NoiseNumeric, NoiseFloat, NoiseInt, RandomNumeric, RandomFloat, and RandomInt.
Fixed the TOC entry COPY restoration statement - added a missing newline and semicolon. Now the equivalent pg_restore call pg_restore 1724504511561 --file 1724504511561.sql is backward compatible and works as expected.
Fixed a case where dump/restore fails when masking tables with a generated column.
Updated Go version (v1.22) and dependencies
Revised installation section of doc
PostgreSQL 17 support - revised ported library to support PostgreSQL 17
Fixed integration tests - reset the go test cache on each iteration
Pushed Docker images to the ghcr.io registry
A bunch of refactoring and code cleanup to make the codebase more maintainable and readable.
This major beta release introduces new features and refactored transformers, significantly enhancing Greenmask's flexibility to better meet business needs.
Most transformers now support dynamic parameters where applicable.
Dynamic parameters are strictly enforced. If you need to cast values to another type, Greenmask provides templates and predefined cast functions accessible via cast_to. These functions cover frequent operations such as UnixTimestampToDate and IntToBool.
The transformation logic has been significantly refactored, making transformers more customizable and flexible than before.
Introduced transformation engines
random - generates transformer values based on pseudo-random algorithms.
hash - generates transformer values using hash functions. Currently, it utilizes sha3 hash functions, which are secure but perform slowly. In the stable release, there will be an option to choose between sha3 and SipHash.
Introduced the Parametrizer interface, now implemented for both dynamic and static parameters.
Renamed most of the toolkit types for enhanced clarity and comprehensive documentation coverage.
Refactored the Driver initialization logic.
Added validation warnings for overridden types in the Driver.
Migrated existing built-in transformers to utilize the new Parametrizer interface.
Implemented a new abstraction, TransformationContext, as the first step towards enabling new feature transformation conditions (#34).
Optimized most transformers for performance in both dynamic and static modes. While dynamic mode offers flexibility, static mode ensures performance remains high. Using only the necessary transformation features helps keep transformation time predictable.
RandomEmail - Introduces a new transformer that supports both random and deterministic engines. It allows for flexible email value generation; you can use column values in the template and choose to keep the original domain or select any from the domains parameter.
NoiseDate, NoiseFloat, NoiseInt - These transformers support both random and deterministic engines, offering dynamic mode parameters that control the noise thresholds within the min and max range. Unlike previous implementations which used a single ratio parameter, the new release features min_ratio and max_ratio parameters to define noise values more precisely. Utilizing the hash engine in these transformers enhances security by complicating statistical analysis for attackers, especially when the same salt is used consistently over long periods.
NoiseNumeric - A newly implemented transformer, sharing features with NoiseInt and NoiseFloat, but specifically designed for numeric values (large integers or floats). It provides a decimal parameter to handle values with fractions.
RandomChoice - Now supports the hash engine
RandomDate, RandomFloat, RandomInt - Now enhanced with hash engine support. Threshold parameters min and max have been updated to support dynamic mode, allowing for more flexible configurations.
RandomNumeric - A new transformer specifically designed for numeric types (large integers or floats), sharing similar features with RandomInt and RandomFloat, but tailored for handling huge numeric values.
RandomString - Now supports hash engine mode
RandomUnixTimestamp - This new transformer generates Unix timestamps with selectable units (second, millisecond, microsecond, nanosecond). Similar in function to RandomDate, it supports the hash engine and dynamic parameters for min and max thresholds, with the ability to override these units using min_unit and max_unit parameters.
RandomUuid - Added hash engine support
RandomPerson - Implemented a new transformer that replaces RandomName, RandomLastName, RandomFirstName, RandomFirstNameMale, RandomFirstNameFemale, RandomTitleMale, and RandomTitleFemale. This new transformer offers enhanced customizability while providing similar functionalities as the previous versions. It generates personal data such as FirstName, LastName, and Title, based on the provided gender parameter, which now supports dynamic mode. Future minor versions will allow for overriding the default names database.
Added tsModify - a new template function for time.Time objects modification
Introduced a new RandomIp transformer capable of generating a random IP address based on the specified netmask.
Added a new RandomMac transformer for generating random MAC addresses.
Deleted transformers include RandomMacAddress, RandomIPv4, RandomIPv6, RandomUnixTime, RandomTitleMale, RandomTitleFemale, RandomFirstName, RandomFirstNameMale, RandomFirstNameFemale, RandomLastName, and RandomName due to the introduction of more flexible and unified options.
"},{"location":"release_notes/greenmask_0_2_0_b1/#full-changelog-v0114v020b1","title":"Full Changelog: v0.1.14...v0.2.0b1","text":""},{"location":"release_notes/greenmask_0_2_0_b1/#playground-usage-for-beta-version","title":"Playground usage for beta version","text":"
If you want to run a Greenmask playground for the beta version v0.2.0b1, execute:
git checkout tags/v0.2.0b1 -b v0.2.0b1\ndocker-compose run greenmask-from-source\n
This major beta release introduces new features such as the database subset, pgzip support, restoration in topological order, and many more. It also includes fixes and improvements.
This release is a major milestone that significantly expands Greenmask's functionality, transforming it into a simple, extensible, and reliable solution for database security, data anonymization, and everyday operations. Our goal is to create a core system that can serve as a foundation for comprehensive dynamic staging environments and robust data security.
Database Subset - a new feature that allows you to define a subset of the database, allowing you to scale down the dump size (#110). This is robust for many purposes and especially useful for testing and development environments. It supports:
References with NULL values - generate the LEFT JOIN query for the FK reference with NULL values to include them in the subset.
Supports virtual references (virtual foreign keys) - create a logical FK in Greenmask that will be used for subset dependencies graph. The virtual reference can be defined for a column or an expression, allowing you to get the value from JSON and similar.
Supports circular references - Greenmask will automatically resolve circular dependencies in the subset by generating a recursive query. The query is generated with integrity checks of the subset ensuring that the data gathered from circular dependencies is consistent.
Fully covered with documentation including troubleshooting and examples.
Supports FK and PK that have more than one column (or expression).
Multi-cycles resolution in one strong connected component (SCC) is supported - Greenmask will generate a recursive query for the SCC whether it is a single cycle or multiple cycles, making the subset system universal for any database schema.
pgzip support for faster compression and decompression \u2014 setting --pgzip can speed up the dump and restoration processes through parallel compression. In some tests, it shows up to 5x faster dump and restore operations.
Restoration in topological order - This flag ensures that dependent tables are not restored until the tables they depend on have been restored. This is useful when the schema is already created with foreign keys and you want to insert data in the correct order.
Insert format restoration - For a flexible restoration process, Greenmask now supports data restoration in the INSERT format. It generates the insert statements based on COPY records from the dump. You do not need to re-dump your data to use this feature; it can be defined in the restore command. The list of new features related to the INSERT format:
Generate INSERT statements with the ON CONFLICT DO NOTHING clause if the flag --on-conflict-do-nothing is set.
Error exclusion list in the config to skip certain errors and continue inserting subsequent rows from the dump.
Use cases - incremental dump and restoration for logical data. For example, if you have a database, and you want to insert data periodically from another source, this can be used together with the database subset and transformations to catch up the target database.
Restore data batching (#173) - By default, the COPY protocol returns the error only on transaction commit. To override this behavior, use the --batch-size flag to specify the number of rows to insert in a single batch during the COPY command. This is useful when you want to control the transaction size and commit.
Introduced keep_null parameter for RandomPerson transformer.
"},{"location":"release_notes/greenmask_0_2_0_b2/#fixes-and-improvements","title":"Fixes and improvements","text":"
Fixed validate command with the --table flag, which had the wrong order of the table name representation {{ table_name }}.{{ schema }} instead of {{ schema }}.{{ table_name }}.
Fixed Row.SetColumn out of range validation.
Fixed restoreWorker panic caused when the worker received an error from pgx.
Fixed error handling in the restore command.
Fixed restore jobs: now a transaction is started for each table restoration and committed after the table restoration is done.
Fixed --exit-on-error working incorrectly in the restore command. Now, the --exit-on-error flag works correctly with the data section.
Fixed transaction rollback in the validate command.
Fixed typo in documentation.
Fixed a CI/CD bug related to retrieving current tags.
Fixed the Docker image tag for latest to exclude specific keywords.
Fixed a case where the hashing value was not set for each column in the RandomPerson transformer.
Fixed original email value parsing conditions.
Subset docs revision.
Fixed a case where data entries were excluded by exclusion parameters such as --exclude-table, --table, etc.
Fixed zero bytes that were written in the buffer due to the wrong buffer limit in the Email transformer.
Fixed a case where the overridden type of column via columns_type_override did not work.
Fixed a case where an unknown option provided in the config was just ignored instead of throwing an error.
Fixed a case where min and max parameter values were ignored in transformers NoiseDate, NoiseNumeric, NoiseFloat, NoiseInt, RandomNumeric, RandomFloat, and RandomInt.
Fixed the TOC entry COPY restoration statement - added a missing newline and semicolon. Now the equivalent pg_restore call pg_restore 1724504511561 --file 1724504511561.sql is backward compatible and works as expected.
Fixed a case where dump/restore fails when masking tables with a generated column.
Updated Go version (v1.22) and dependencies
Revised installation section of doc
A bunch of refactoring and code cleanup to make the codebase more maintainable and readable.
"},{"location":"release_notes/greenmask_0_2_0_b2/#full-changelog-v020b1v020b2","title":"Full Changelog: v0.2.0b1...v0.2.0b2","text":""},{"location":"release_notes/greenmask_0_2_0_b2/#playground-usage-for-beta-version","title":"Playground usage for beta version","text":"
If you want to run a Greenmask playground for the beta version v0.2.0b2, execute:
git checkout tags/v0.2.0b2 -b v0.2.0b2\ndocker-compose run greenmask-from-source\n
This release introduces two new features: transformation conditions and transformation inheritance for primary and foreign keys. It also includes several bug fixes and improvements.
Fixed an issue where the partitioned table itself was executed in the restore worker, resulting in a \"file not found\" error in storage. Closes bug: restoring partitioned tables fails #238 #242.
Fixed template function availability #239. Renamed methods according to the documentation: GetColumnRawValue is now GetRawColumnValue, and SetColumnRawValue is now SetRawColumnValue #242
Resolved an issue where Dump.createTocEntries processed partitioned tables as if they were physical entities, despite being logical #241
Corrected merging in the pre-data, data, and post-data sections, which previously caused a panic in dump command when the post-data section was excluded #241
Fixed an issue where dumps created with --load-via-partition-root did not use the root partition table in --inserts generation during restoration #241
Introduces the --disable-triggers, --use-session-replication-role-replica, and --superuser options for the restore command. These allow disabling triggers during the data section restore #248. Closes feature request #228
Fixed skipping unknown types when silent is true #251
Feel free to reach out to us if you have any questions or need assistance:
Greenmask Roadmap
Email
Twitter
Telegram
Discord
DockerHub
"}]}
\ No newline at end of file
diff --git a/dev/sitemap.xml b/dev/sitemap.xml
index 01ac9daf..ade61783 100644
--- a/dev/sitemap.xml
+++ b/dev/sitemap.xml
@@ -400,4 +400,8 @@
https://docs.greenmask.io/dev/release_notes/greenmask_0_2_5/2024-12-07
+
+ https://docs.greenmask.io/dev/release_notes/greenmask_0_2_6/
+ 2024-12-07
+
\ No newline at end of file
diff --git a/dev/sitemap.xml.gz b/dev/sitemap.xml.gz
index 5d3f7a66..758fe5de 100644
Binary files a/dev/sitemap.xml.gz and b/dev/sitemap.xml.gz differ