Data Converters

guvra edited this page Mar 26, 2024 · 4 revisions

This section lists all data converters that can be specified in the YAML configuration file.

Randomizers

These converters replace parts of the input value with random characters. For example, the randomizeNumber converter replaces all numeric characters with random numbers.

Only non-empty values are processed.

randomizeText

Converts all characters to random alphanumeric characters.

For example, one of the possible conversions for "john_doe" is "vO7s2pJx".

Parameters:

Name Required Default Description
min_length N 3 The minimum length of the generated value (when not empty).
replacements N Check here A string that contains the replacement characters.

Example:

tables:
    my_table:
        converters:
            my_column:
                converter: 'randomizeText'
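
The parameters listed above can be set under a parameters key, as the other examples on this page do. The following is a minimal sketch; the values 8 and the lowercase alphabet are purely illustrative choices, not recommended settings:

```yaml
tables:
    my_table:
        converters:
            my_column:
                converter: 'randomizeText'
                parameters:
                    # Illustrative value: generated values have at least 8 characters
                    min_length: 8
                    # Illustrative value: only these characters are used as replacements
                    replacements: 'abcdefghijklmnopqrstuvwxyz'
```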

randomizeEmail

Applies the following transformations on the input value:

  • Applies the randomizeText converter on the username part.
  • Replaces the domain (if any) by a safe one.

For example, the username of the input address is replaced with random characters, and its domain with one of the configured domains (e.g. example.org).

Parameters:

Name Required Default Description
domains N ['example.com', 'example.net', 'example.org'] A list of email domains.
min_length N 3 The minimum length of the generated username (when not empty).
replacements N Check here A string that contains the replacement characters.

Example:

tables:
    my_table:
        converters:
            my_column:
                converter: 'randomizeEmail'
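
As a sketch of how the parameters above might be combined, the following configuration restricts the generated addresses to a single domain and asks for longer usernames. The chosen values are illustrative assumptions:

```yaml
tables:
    my_table:
        converters:
            my_column:
                converter: 'randomizeEmail'
                parameters:
                    # Illustrative value: every generated address uses this domain
                    domains: ['example.com']
                    # Illustrative value: usernames have at least 6 characters
                    min_length: 6
```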

randomizeNumber

Converts all numeric characters to random numbers. Other characters are not modified.

For example, one of the possible conversions for "number_123456" is "number_086714".

Example:

tables:
    my_table:
        converters:
            my_column:
                converter: 'randomizeNumber'

Anonymizers

These converters anonymize an input value. Empty values are not converted.

anonymizeText

Anonymizes string values by replacing characters with the * character, except for the first letter of each word, which is preserved. The default word separators are the space, _ (underscore), - (hyphen) and . (dot).

For example, it converts "John Doe" to "J*** D**".

Parameters:

Name Required Default Description
replacement N '*' The replacement character.
delimiters N [' ', '_', '-', '.'] The word separator characters.
min_word_length N 3 The minimum length per anonymized word. Useful only if at least one word separator is defined.

Example:

tables:
    my_table:
        converters:
            my_column:
                converter: 'anonymizeText'
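
The parameters above can be overridden per column. The sketch below uses illustrative values only: it replaces characters with x instead of *, treats only the space as a word separator, and raises the minimum anonymized word length to 4:

```yaml
tables:
    my_table:
        converters:
            my_column:
                converter: 'anonymizeText'
                parameters:
                    # Illustrative value: replacement character
                    replacement: 'x'
                    # Illustrative value: only spaces separate words
                    delimiters: [' ']
                    # Illustrative value: minimum length per anonymized word
                    min_word_length: 4
```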

anonymizeEmail

Applies the following transformations on the input value:

  • Applies the anonymizeText converter on the username part.
  • Replaces the domain (if any) by a safe one.

For example, one of the possible conversions for an email address whose username is "user1" is "u****@example.org".

Parameters:

Name Required Default Description
domains N ['example.com', 'example.net', 'example.org'] A list of email domains.
replacement N '*' The replacement character.
delimiters N [' ', '_', '-', '.'] The word separator characters.
min_word_length N 3 The minimum length per anonymized word. Useful only if at least one word delimiter is defined.

Example:

tables:
    my_table:
        converters:
            my_column:
                converter: 'anonymizeEmail'
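
A possible configuration using the parameters above is sketched below. The domain and the replacement character are illustrative assumptions:

```yaml
tables:
    my_table:
        converters:
            my_column:
                converter: 'anonymizeEmail'
                parameters:
                    # Illustrative value: every anonymized address uses this domain
                    domains: ['example.org']
                    # Illustrative value: replacement character for the username part
                    replacement: '#'
```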

anonymizeNumber

Anonymizes numeric values by replacing digits with the * character, except for the first digit of each number, which is preserved.

For example, it converts "user123" to "user1**".

Parameters:

Name Required Default Description
replacement N '*' The replacement character.
min_number_length N 1 The minimum length per anonymized number (when not empty).

Example:

tables:
    my_table:
        converters:
            my_column:
                converter: 'anonymizeNumber'
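
As a sketch, both parameters can be set together. The values below are illustrative only:

```yaml
tables:
    my_table:
        converters:
            my_column:
                converter: 'anonymizeNumber'
                parameters:
                    # Illustrative value: replacement character
                    replacement: 'X'
                    # Illustrative value: minimum length per anonymized number
                    min_number_length: 4
```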

anonymizeDate

Anonymizes date values. It can be used to anonymize a date of birth.

The day and month are randomized. The year is not changed. For example, one of the possible conversions for "1990-01-01" is "1990-11-25".

The date format of the input value MUST match the format parameter, otherwise an exception is thrown.

Parameters:

Name Required Default Description
format N 'Y-m-d' The date format.

Example:

tables:
    my_table:
        converters:
            my_column:
                converter: 'anonymizeDate'
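
Because the input must match the format parameter, a column that stores dates in another layout needs an explicit format. The sketch below assumes dates stored as day/month/year and assumes that the format string uses the same letter codes as the defaults shown on this page (Y, m, d):

```yaml
tables:
    my_table:
        converters:
            my_column:
                converter: 'anonymizeDate'
                parameters:
                    # Assumption: the column stores values such as 25/11/1990
                    format: 'd/m/Y'
```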

anonymizeDateTime

Same as anonymizeDate, but the default value of the format parameter is Y-m-d H:i:s instead of Y-m-d.

Parameters:

Name Required Default Description
format N 'Y-m-d H:i:s' The date format.

Example:

tables:
    my_table:
        converters:
            my_column:
                converter: 'anonymizeDateTime'

Generators

These converters generate random values.

randomText

Generates a random text value.

Parameters:

Name Required Default Description
min_length N 3 The minimum length of the generated value.
max_length N 16 The maximum length of the generated value.
characters N Check here A string that contains the characters used to generate the value.

Example:

tables:
    my_table:
        converters:
            my_column:
                converter: 'randomText'
                parameters:
                    min_length: 0
                    max_length: 10
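
The characters parameter can narrow the set of characters used for generation. The following sketch generates values made of hexadecimal characters only. The lengths and the character set are illustrative choices:

```yaml
tables:
    my_table:
        converters:
            my_column:
                converter: 'randomText'
                parameters:
                    min_length: 8
                    max_length: 12
                    # Illustrative value: restrict generated values to hexadecimal characters
                    characters: '0123456789abcdef'
```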

randomEmail

Generates a random email address. The username part of the email is generated with the randomText converter.

Parameters:

Name Required Default Description
domains N ['example.com', 'example.net', 'example.org'] A list of email domains.
min_length N 3 The minimum length of the username.
max_length N 16 The maximum length of the username.
characters N Check here A string that contains the characters used to generate the username.

Example:

tables:
    my_table:
        converters:
            my_column:
                converter: 'randomEmail'
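
A configuration that sets the documented parameters might look like the sketch below. All values are illustrative:

```yaml
tables:
    my_table:
        converters:
            my_column:
                converter: 'randomEmail'
                parameters:
                    # Illustrative value: every generated address uses this domain
                    domains: ['example.com']
                    # Illustrative values: username length between 5 and 10 characters
                    min_length: 5
                    max_length: 10
```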

randomDate

Generates a random date (e.g. 2005-08-03).

Parameters:

Name Required Default Description
format N 'Y-m-d' The date format.
min_year N 1900 The min year. If set to null, the min year is the current year.
max_year N null The max year. If set to null, the max year is the current year.

Example:

tables:
    my_table:
        converters:
            my_column:
                converter: 'randomDate'
                parameters:
                    min_year: 2000
                    max_year: 2050

randomDateTime

Same as randomDate, but the default value of the format parameter is Y-m-d H:i:s instead of Y-m-d.

Parameters:

Name Required Default Description
format N 'Y-m-d H:i:s' The date format.
min_year N 1900 The min year. If set to null, the min year is the current year.
max_year N null The max year. If set to null, the max year is the current year.

Example:

tables:
    my_table:
        converters:
            my_column:
                converter: 'randomDateTime'
                parameters:
                    min_year: 2000
                    max_year: 2050

numberBetween

Generates a number between a min and a max value.

Parameters:

Name Required Default Description
min Y The min value.
max Y The max value.

Example:

tables:
    my_table:
        converters:
            my_column:
                converter: 'numberBetween'
                parameters:
                    min: 0
                    max: 100

setNull

Converts all values to null.

Example:

tables:
    my_table:
        converters:
            my_column:
                converter: 'setNull'

setValue

This converter always returns the same value.

Parameters:

Name Required Default Description
value Y The value to set. It must be a scalar (string, int, float, boolean) or null.

Example:

tables:
    my_table:
        converters:
            my_column:
                converter: 'setValue'
                parameters:
                    value: 0
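
Since the value may be any scalar, a string works as well as a number. The sketch below sets two hypothetical columns, comment and is_subscribed, to a fixed string and a fixed boolean. The column names and values are illustrative assumptions:

```yaml
tables:
    my_table:
        converters:
            # Hypothetical column: every row gets the same string
            comment:
                converter: 'setValue'
                parameters:
                    value: 'anonymized'
            # Hypothetical column: every row gets the same boolean
            is_subscribed:
                converter: 'setValue'
                parameters:
                    value: false
```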

Transformers

These converters apply transformations on the input value (e.g. converting to lower case). Empty values are not converted.

toLower

Converts all characters to lower case.

Example:

tables:
    my_table:
        converters:
            my_column:
                converter: 'toLower'

toUpper

Converts all characters to upper case.

Example:

tables:
    my_table:
        converters:
            my_column:
                converter: 'toUpper'

prependText

This converter adds a prefix to every value.

For example, the value user1 is converted to test_user1 if the prefix is test_.

Parameters:

Name Required Default Description
value Y The value to prepend.

Example:

tables:
    my_table:
        converters:
            my_column:
                converter: 'prependText'
                parameters:
                    value: 'test_'

appendText

This converter adds a suffix to every value.

For example, the value user1 is converted to user1_test if the suffix is _test.

Parameters:

Name Required Default Description
value Y The value to append.

Example:

tables:
    my_table:
        converters:
            my_column:
                converter: 'appendText'
                parameters:
                    value: '_test'

replace

This converter replaces all occurrences of the search string with the replacement string.

Parameters:

Name Required Default Description
search Y The text to replace.
replacement N '' The replacement text.

Example:

tables:
    my_table:
        converters:
            my_column:
                converter: 'replace'
                parameters:
                    search: 'bar'
                    replacement: 'baz'

regexReplace

This converter performs a regular expression search and replace.

Parameters:

Name Required Default Description
pattern Y The pattern to find.
replacement N '' The replacement text.
limit N -1 The max number of replacements to perform. No limit if set to -1 (default value).

Example:

tables:
    my_table:
        converters:
            my_column:
                converter: 'regexReplace'
                parameters:
                    pattern: '/[0-9]+/'
                    replacement: '15'
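
The limit parameter caps the number of replacements. In the sketch below, which reuses the pattern from the example above with illustrative values, at most one run of digits per value would be replaced:

```yaml
tables:
    my_table:
        converters:
            my_column:
                converter: 'regexReplace'
                parameters:
                    pattern: '/[0-9]+/'
                    replacement: '0'
                    # Illustrative value: perform at most one replacement
                    limit: 1
```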

hash

This converter applies a hash algorithm on the value.

The default algorithm is sha1.

Any algorithm returned by the function hash_algos can be used. Examples: md5, sha1, sha256, sha512, crc32.

Parameters:

Name Required Default Description
algorithm N 'sha1' The algorithm to use.

Example:

tables:
    my_table:
        converters:
            my_column:
                converter: 'hash'
                parameters:
                    algorithm: 'sha256'

Advanced Converters

faker

Allows using any formatter defined in the Faker library.

Parameters:

Name Required Default Description
formatter Y The formatter name.
arguments N [] The formatter arguments.

Example:

tables:
    my_table:
        converters:
            my_column:
                converter: 'faker'
                parameters:
                    formatter: 'numberBetween'
                    arguments: [1, 100]

To use a formatter that requires the original value as an argument, you can use the {{value}} placeholder:

tables:
    my_table:
        converters:
            my_column:
                converter: 'faker'
                parameters:
                    formatter: 'shuffle'
                    arguments: ['{{value}}']

The Faker locale can be set in the configuration file and defaults to en_US.

chain

This converter executes a list of converters.

Parameters:

Name Required Default Description
converters Y A list of converter definitions.

Example:

tables:
    my_table:
        converters:
            my_column:
                converter: 'chain'
                parameters:
                    converters:
                        - converter: 'anonymizeText'
                          condition: '{{another_column}} == 0'
                        - converter: 'randomizeText'
                          condition: '{{another_column}} == 1'
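
Conditions are optional in a chain. The sketch below chains two converters documented on this page without any condition. It assumes that the converters run in the listed order and that each one receives the output of the previous one, so the value would be lower-cased before the prefix is added. The prefix user_ is an illustrative value:

```yaml
tables:
    my_table:
        converters:
            my_column:
                converter: 'chain'
                parameters:
                    converters:
                        # Assumed to run first: lower-case the value
                        - converter: 'toLower'
                        # Assumed to run second: add an illustrative prefix
                        - converter: 'prependText'
                          parameters:
                              value: 'user_'
```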

If you need to override a chained converter defined in a parent config file, you must specify the key index. For example, to disable the 2nd converter of a chain:

tables:
    my_table:
        converters:
            my_column:
                parameters:
                    converters:
                        1:
                            disabled: true

jsonData

This converter can be used to anonymize data that are stored in a JSON object.

Parameters:

Name Required Default Description
converters Y A list of converter definitions. The key of each converter definition is the path to the value within the JSON object.

For example, if the following JSON data is stored in a column:

{"customer":{"email":"[email protected]","username":"john.doe"}}

The following converter can be used:

tables:
    my_table:
        converters:
            my_column:
                converter: 'jsonData'
                parameters:
                    converters:
                        customer.email:
                            converter: 'anonymizeEmail'
                        customer.username:
                            converter: 'anonymizeText'

serializedData

Same as the jsonData converter, but works with serialized data instead.

The serialized data must be an array.
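
Since this converter is described as the counterpart of jsonData, its configuration presumably mirrors the jsonData example. The sketch below rests on two assumptions: that the converter is registered under the name serializedData, and that it accepts the same converters parameter keyed by the path to each value within the serialized array:

```yaml
tables:
    my_table:
        converters:
            my_column:
                # Assumption: converter name and parameters mirror jsonData
                converter: 'serializedData'
                parameters:
                    converters:
                        customer.email:
                            converter: 'anonymizeEmail'
                        customer.username:
                            converter: 'anonymizeText'
```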

fromContext

This converter returns a value from the $context array passed to converters.

The context array contains the following data:

  • row_data: an array containing the value of each column of the table row
  • processed_data: an array containing the values of the row that were transformed by a converter

Parameters:

Name Required Default Description
key Y The key associated to the value to retrieve in the context array.

Example:

tables:
    my_table:
        converters:
            email:
                converter: 'randomizeEmail'
            email_lowercase:
                converter: 'chain'
                parameters:
                    converters:
                        - converter: 'fromContext'
                          parameters:
                              key: 'processed_data.email'
                        - converter: 'toLower'
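
The row_data entry can be read in the same way. The sketch below fills a hypothetical display_name column from the original value of a hypothetical username column, then anonymizes it. The column names are illustrative, and the key row_data.username assumes the same dotted path syntax as the example above:

```yaml
tables:
    my_table:
        converters:
            # Hypothetical column built from another column of the same row
            display_name:
                converter: 'chain'
                parameters:
                    converters:
                        - converter: 'fromContext'
                          parameters:
                              # Assumption: same dotted path syntax as 'processed_data.email'
                              key: 'row_data.username'
                        - converter: 'anonymizeText'
```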