-
Notifications
You must be signed in to change notification settings - Fork 47
Configuration File
This section describes the format of the YAML configuration file.
The database connection parameters must be specified in the database
object.
Example with environment variables:
database:
host: '%env(DB_HOST)%'
user: '%env(DB_USER)%'
password: '%env(DB_PASSWORD)%'
name: '%env(DB_NAME)%'
Only the name
parameter is required.
Other parameters are optional.
Available parameters:
Parameter | Required | Type | Default | Description |
---|---|---|---|---|
name | Y | string | Database name. | |
user | N | string | 'root' |
Database user. |
password | N | string | Database password. | |
host | N | string | 'localhost' |
Database host. |
port | N | int | Database port. | |
charset | N | string | Charset to use. | |
unix_socket | N | string | Name of the socket to use. | |
driver | N | string | 'pdo_mysql' |
Database driver. Only pdo_mysql is supported as of now. |
driver_options | object | string | {} |
A list of PDO settings. |
Dump settings are all optional.
Example:
dump:
output: 'my_dump_file-{Y-m-d H:i:s}.sql.gz'
compress: 'gzip'
If no output was provided, the dump is outputted to the default stdout.
Available settings:
Parameter | Type | Default | Description |
---|---|---|---|
output | string | 'php://stdout' |
Dump output. If a relative path is specified, it is relative to the current working directory. A date format can be specified using curly brackets, e.g. {Y-m-d} . |
add_drop_database | bool | false |
MySQL documentation |
add_drop_table | bool | true |
MySQL documentation |
add_drop_trigger | bool | true |
MySQL documentation |
add_locks | bool | true |
MySQL documentation |
complete_insert | bool | false |
MySQL documentation |
compress | string | 'none' |
none , gzip (.gz file extension), bzip2 (.bz2 file extension). |
default_character_set | string | 'utf8' |
utf8 (default, compatible option), utf8mb4 (for full utf8 compliance). |
disable_keys | bool | true |
MySQL documentation |
events | bool | false |
MySQL documentation |
extended_insert | bool | true |
MySQL documentation |
hex_blob | bool | false |
MySQL documentation |
init_commands | array | [] |
Queries executed after the connection is established. |
insert_ignore | bool | false |
MySQL documentation |
lock_tables | bool | false |
MySQL documentation |
net_buffer_length | int | 1000000 |
MySQL documentation |
no_autocommit | bool | true |
Option to disable autocommit (faster inserts, no problems with index keys). |
no_create_info | bool | false |
MySQL documentation |
routines | bool | false |
MySQL documentation |
single_transaction | bool | true |
MySQL documentation |
skip_comments | bool | false |
MySQL documentation |
skip_definer | bool | false |
MySQL documentation |
skip_dump_date | bool | false |
MySQL documentation |
skip_triggers | bool | false |
MySQL documentation |
skip_tz_utc | bool | false |
MySQL documentation |
To define a table whitelist (only these tables will be dumped):
tables_whitelist:
- 'table1'
- 'table2'
To define a table blacklist (these tables won't be dumped):
tables_blacklist:
- 'table1'
- 'table2'
The wildcard character *
can be used in table names (e.g. cache_*
).
The configuration of each table must be specified in the tables
parameter.
tables:
table1:
# ...
table2:
# ...
The wildcard character *
can be used in table names (e.g. cache_*
).
Available properties:
Property | Type | Default | Description |
---|---|---|---|
truncate | bool | false | whether to dump a table without any data. |
limit | int | max number of rows to dump (must be greater than 0, otherwise it is ignored). | |
order_by | string | '' |
same as SQL (e.g. name asc, id desc ). |
where | string | '' |
a where condition (e.g. email like "%@acme.com" ). |
How to define a truncate:
tables:
my_table:
truncate: true
How to define a limit:
tables:
my_table:
limit: 10000
How to define a sort order:
tables:
my_table:
order_by: 'sku, entity_id desc'
How to define a where condition:
tables:
my_table:
where: 'email like "%@acme.com" or created_at > date_sub(now(), interval 55 day)'
By default, table filters are automatically propagated to all table dependencies (by following foreign keys).
This feature can be disabled by adding the following configuration at the root of the config file:
filter_propagation:
enabled: false
In some very specific cases, you might want to disable filter propagation for some foreign keys. This can be achieved with the following configuration:
filter_propagation:
ignored_foreign_keys:
- 'FK_CONSTRAINT_NAME'
It is possible to define data converters for any column.
Syntax:
tables:
my_table: # table name
converters:
my_column: # column name
converter: 'randomizeText'
Only non-null values are converted. Null values are not processed by converters.
List of available properties:
Property | Required | Type | Default | Description |
---|---|---|---|---|
converter | Y | string | The identifier of a data converter. | |
condition | N | string | '' |
A PHP expression that must evaluate to true or false . When a condition is set, the value is converted only if the expression evaluates to true . |
parameters | N | array | {} |
e.g. min and max for numberBetween . Most converters don't accept any parameter. |
unique | N | bool | false |
Whether to generate only unique values. May result in a fatal error with converters that can't generate enough unique values. |
cache_key | N | string | '' |
The generated value will be used by all converters that use the same cache key. |
disabled | N | bool | false |
Can be used to disable a converter declared in a parent config file. |
How to use parameters:
tables:
my_table:
converters:
my_column:
converter: 'randomizeEmail'
parameters:
domains: ['example.org']
How to define a condition:
tables:
my_table:
converters:
my_column:
converter: 'randomizeEmail'
condition: '{{another_column}} !== null'
The condition must be a PHP expression.
Column values can be specified with double brackets.
For example, if the table has a id
column, the {{id}}
variable will be available.
The chain converter allows to use multiple converters for a given column. For example:
tables:
my_table:
converters:
my_column:
converter: 'chain'
parameters:
converters:
- converter: 'anonymizeText'
condition: '{{another_column}} == 0'
- converter: 'randomizeText'
condition: '{{another_column}} == 1'
It is possible to skip data conversion for an entire table row:
tables:
my_table:
skip_conversion_if: 'strpos({{email}}, "@acme.fr") !== false'
The syntax is the same as the converter conditions. If the condition evaluates to true, the table row will be dumped as-is, without any data conversion.
The cache_key
parameter can be used to share values between converters.
For example, to generate the same anonymized email in two tables:
tables:
customer_entity:
converters:
email:
converter: 'randomizeEmail'
cache_key: 'customer_email'
unique: true
tables:
newsletter_subscriber:
converters:
subscriber_email:
converter: 'randomizeEmail'
cache_key: 'customer_email'
unique: true
Notes:
- If you use the
unique
parameter, it must be specified in all converters that share the same cache key. If the parameter is missing somewhere, it can result in a infinite loop situation. - This feature is not used in the default templates (
magento2
, ...), because it may require a lot of memory, depending on the size of the tables.
The tool is bundled with predefined configuration templates. Each template provides anonymization rules for a specific framework.
Available templates:
To use one of these template, you must specify its name with the extends
parameter.
The magento2 template also requires to specify the version of your application.
For example:
extends: 'magento2'
version: '2.4.6'
The extends
parameter can also be used with custom config files:
extends: 'path/to/config.yaml'
The contents of this template will automatically be merged with the configuration file. The path to the file can be an absolute path, or relative to the current configuration file.
It is possible to use multiple templates:
extends:
- 'path/to/config1.yaml'
- 'path/to/config2.yaml'
You can use environment variables with the following syntax:
database:
host: '%env(DB_HOST)%'
user: '%env(DB_USER)%'
password: '%env(DB_PASSWORD)%'
name: '%env(DB_NAME)%'
You can also set the variable type with the following syntax:
tables:
cache:
truncate: '%env(bool:TRUNCATE_CACHE_TABLE)%'
Available types: string (default), bool, int, float, json.
The JSON type can be used to define array values. For example:
tables_blacklist: '%env(json:TABLES_BLACKLIST)%'
Example value of the environment variable: ["table1", "table2", "table3"]
.
It is possible to store SQL query results in user-defined variables:
variables:
firstname_attribute_id: 'select attribute_id from eav_attribute where attribute_code = "firstname" and entity_type_id = 1'
lastname_attribute_id: 'select attribute_id from eav_attribute where attribute_code = "lastname" and entity_type_id = 1'
It can then be used in table filters and converter conditions.
How to use variables in table filters:
tables:
my_table:
where: 'attribute_id = @firstname_attribute_id'
How to use variables in converter conditions:
tables:
customer_entity_varchar:
converters:
value:
converter: 'anonymizeText'
condition: '{{attribute_id}} == @firstname_attribute_id'
By default, the locale used in faker formatters is en_US
.
It can be changed with the following setting:
faker:
locale: 'de_DE'
Warning: the default phar distribution only includes the "en_US" locale. To use other locales with the phar, you must compile your own phar file that includes the required locales.
It is possible to unset values that were declared in a parent config file, by setting them to null
.
Warning: setting a value to null
is only allowed if it is already defined in a parent config file.
Example - removing the whole config of a table (converters, filters...):
extends: 'magento2'
tables:
admin_user: ~
Example - removing all converters of a table:
extends: 'magento2'
tables:
admin_user:
converters: ~
Example - removing a specific converter:
extends: 'magento2'
tables:
admin_user:
converters:
email: ~
Alternatively, converters can be disabled by setting the disabled
parameter to true
:
extends: 'magento2'
tables:
admin_user:
converters:
email:
disabled: true