Skip to content

Configuration File

guvra edited this page Jun 11, 2024 · 12 revisions

This section describes the format of the YAML configuration file.

Database Settings

The database connection parameters must be specified in the database object. Example with environment variables:

database:
    host: '%env(DB_HOST)%'
    user: '%env(DB_USER)%'
    password: '%env(DB_PASSWORD)%'
    name: '%env(DB_NAME)%'

Only the name parameter is required. Other parameters are optional.

Available parameters:

Parameter Required Type Default Description
name Y string Database name.
user N string 'root' Database user.
password N string Database password.
host N string 'localhost' Database host.
port N int Database port.
charset N string Charset to use.
unix_socket N string Name of the socket to use.
driver N string 'pdo_mysql' Database driver. Only pdo_mysql is supported as of now.
driver_options object string {} A list of PDO settings.

Dump Settings

Dump settings are all optional.

Example:

dump:
    output: 'my_dump_file-{Y-m-d H:i:s}.sql.gz'
    compress: 'gzip'

If no output was provided, the dump is outputted to the default stdout.

Available settings:

Parameter Type Default Description
output string 'php://stdout' Dump output. If a relative path is specified, it is relative to the current working directory.

A date format can be specified using curly brackets, e.g. {Y-m-d}.
add_drop_database bool false MySQL documentation
add_drop_table bool true MySQL documentation
add_drop_trigger bool true MySQL documentation
add_locks bool true MySQL documentation
complete_insert bool false MySQL documentation
compress string 'none' none, gzip (.gz file extension), bzip2 (.bz2 file extension).
default_character_set string 'utf8' utf8 (default, compatible option), utf8mb4 (for full utf8 compliance).
disable_keys bool true MySQL documentation
events bool false MySQL documentation
extended_insert bool true MySQL documentation
hex_blob bool false MySQL documentation
init_commands array [] Queries executed after the connection is established.
insert_ignore bool false MySQL documentation
lock_tables bool false MySQL documentation
net_buffer_length int 1000000 MySQL documentation
no_autocommit bool true Option to disable autocommit (faster inserts, no problems with index keys).
no_create_info bool false MySQL documentation
routines bool false MySQL documentation
single_transaction bool true MySQL documentation
skip_comments bool false MySQL documentation
skip_definer bool false MySQL documentation
skip_dump_date bool false MySQL documentation
skip_triggers bool false MySQL documentation
skip_tz_utc bool false MySQL documentation

Table Whitelist/Blacklist

To define a table whitelist (only these tables will be dumped):

tables_whitelist:
    - 'table1'
    - 'table2'

To define a table blacklist (these tables won't be dumped):

tables_blacklist:
    - 'table1'
    - 'table2'

The wildcard character * can be used in table names (e.g. cache_*).

Table Filters

Filtering Values

The configuration of each table must be specified in the tables parameter.

tables:
    table1:
        # ...
    table2:
        # ...

The wildcard character * can be used in table names (e.g. cache_*).

Available properties:

Property Type Default Description
truncate bool false whether to dump a table without any data.
limit int max number of rows to dump (must be greater than 0, otherwise it is ignored).
order_by string '' same as SQL (e.g. name asc, id desc).
where string '' a where condition (e.g. email like "%@acme.com").

How to define a truncate:

tables:
    my_table:
        truncate: true

How to define a limit:

tables:
    my_table:
        limit: 10000

How to define a sort order:

tables:
    my_table:
        order_by: 'sku, entity_id desc'

How to define a where condition:

tables:
    my_table:
        where: 'email like "%@acme.com" or created_at > date_sub(now(), interval 55 day)'

Filter Propagation

By default, table filters are automatically propagated to all table dependencies (by following foreign keys).

This feature can be disabled by adding the following configuration at the root of the config file:

filter_propagation:
    enabled: false

In some very specific cases, you might want to disable filter propagation for some foreign keys. This can be achieved with the following configuration:

filter_propagation:
    ignored_foreign_keys:
        - 'FK_CONSTRAINT_NAME'

Data Converters

Declaring Converters

It is possible to define data converters for any column.

Syntax:

tables:
    my_table: # table name
        converters:
            my_column: # column name
                converter: 'randomizeText'

Only non-null values are converted. Null values are not processed by converters.

List of available properties:

Property Required Type Default Description
converter Y string The identifier of a data converter.
condition N string '' A PHP expression that must evaluate to true or false. When a condition is set, the value is converted only if the expression evaluates to true.
parameters N array {} e.g. min and max for numberBetween. Most converters don't accept any parameter.
unique N bool false Whether to generate only unique values. May result in a fatal error with converters that can't generate enough unique values.
cache_key N string '' The generated value will be used by all converters that use the same cache key.
disabled N bool false Can be used to disable a converter declared in a parent config file.

How to use parameters:

tables:
    my_table:
        converters:
            my_column:
                converter: 'randomizeEmail'
                parameters:
                    domains: ['example.org']

How to define a condition:

tables:
    my_table:
        converters:
            my_column:
                converter: 'randomizeEmail'
                condition: '{{another_column}} !== null'

The condition must be a PHP expression. Column values can be specified with double brackets. For example, if the table has a id column, the {{id}} variable will be available.

Using Multiple Converters

The chain converter allows to use multiple converters for a given column. For example:

tables:
    my_table:
        converters:
            my_column:
                converter: 'chain'
                parameters:
                    converters:
                        - converter: 'anonymizeText'
                          condition: '{{another_column}} == 0'
                        - converter: 'randomizeText'
                          condition: '{{another_column}} == 1'

Skipping Data Conversion

It is possible to skip data conversion for an entire table row:

tables:
    my_table:
        skip_conversion_if: 'strpos({{email}}, "@acme.fr") !== false'

The syntax is the same as the converter conditions. If the condition evaluates to true, the table row will be dumped as-is, without any data conversion.

Sharing Converter Results

The cache_key parameter can be used to share values between converters.

For example, to generate the same anonymized email in two tables:

tables:
    customer_entity:
        converters:
            email:
                converter: 'randomizeEmail'
                cache_key: 'customer_email'
                unique: true
tables:
    newsletter_subscriber:
        converters:
            subscriber_email:
                converter: 'randomizeEmail'
                cache_key: 'customer_email'
                unique: true

Notes:

  • If you use the unique parameter, it must be specified in all converters that share the same cache key. If the parameter is missing somewhere, it can result in a infinite loop situation.
  • This feature is not used in the default templates (magento2, ...), because it may require a lot of memory, depending on the size of the tables.

Templates

Default Templates

The tool is bundled with predefined configuration templates. Each template provides anonymization rules for a specific framework.

Available templates:

To use one of these template, you must specify its name with the extends parameter. The magento2 template also requires to specify the version of your application. For example:

extends: 'magento2'
version: '2.4.6'

Custom Templates

The extends parameter can also be used with custom config files:

extends: 'path/to/config.yaml'

The contents of this template will automatically be merged with the configuration file. The path to the file can be an absolute path, or relative to the current configuration file.

Using Multiple Templates

It is possible to use multiple templates:

extends:
    - 'path/to/config1.yaml'
    - 'path/to/config2.yaml'

Advanced Configuration

Environment Variables

You can use environment variables with the following syntax:

database:
    host: '%env(DB_HOST)%'
    user: '%env(DB_USER)%'
    password: '%env(DB_PASSWORD)%'
    name: '%env(DB_NAME)%'

You can also set the variable type with the following syntax:

tables:
    cache:
        truncate: '%env(bool:TRUNCATE_CACHE_TABLE)%'

Available types: string (default), bool, int, float, json.

The JSON type can be used to define array values. For example:

tables_blacklist: '%env(json:TABLES_BLACKLIST)%'

Example value of the environment variable: ["table1", "table2", "table3"].

SQL Variables

It is possible to store SQL query results in user-defined variables:

variables:
    firstname_attribute_id: 'select attribute_id from eav_attribute where attribute_code = "firstname" and entity_type_id = 1'
    lastname_attribute_id: 'select attribute_id from eav_attribute where attribute_code = "lastname" and entity_type_id = 1'

It can then be used in table filters and converter conditions.

How to use variables in table filters:

tables:
    my_table:
        where: 'attribute_id = @firstname_attribute_id'

How to use variables in converter conditions:

tables:
    customer_entity_varchar:
        converters:
            value:
                converter: 'anonymizeText'
                condition: '{{attribute_id}} == @firstname_attribute_id'

Faker Locale

By default, the locale used in faker formatters is en_US. It can be changed with the following setting:

faker:
    locale: 'de_DE'

Warning: the default phar distribution only includes the "en_US" locale. To use other locales with the phar, you must compile your own phar file that includes the required locales.

Unsetting Values Declared in Config Templates

It is possible to unset values that were declared in a parent config file, by setting them to null.

Warning: setting a value to null is only allowed if it is already defined in a parent config file.

Example - removing the whole config of a table (converters, filters...):

extends: 'magento2'
tables:
    admin_user: ~

Example - removing all converters of a table:

extends: 'magento2'
tables:
    admin_user:
        converters: ~

Example - removing a specific converter:

extends: 'magento2'
tables:
    admin_user:
        converters:
            email: ~

Alternatively, converters can be disabled by setting the disabled parameter to true:

extends: 'magento2'
tables:
    admin_user:
        converters:
            email:
                disabled: true