Guidelines

guvra edited this page Jul 30, 2024 · 5 revisions

Migration Guidelines

Upgrading to 5.0

Starting from version 5.0, the command-line tool fails with an error if the configuration file references a column that does not exist in the database.

Upgrading to 4.0

No breaking changes. However, since version 4.2.0, the filters parameter is deprecated and will be removed in the next major version. Use the where parameter instead.
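As a rough sketch of this migration, the snippet below shows a deprecated filter rewritten as a SQL condition. The table name my_table and the column id are hypothetical, and the array syntax shown for filters is recalled from older versions of the tool, so check it against your existing config file.

```yaml
tables:
    my_table:
        # Deprecated form (assumed syntax):
        # filters:
        #     - ['id', 'gt', 1000]

        # Replacement: a plain SQL condition
        where: 'id > 1000'
```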

Upgrading to 3.0

Breaking changes:

  • The following converters were renamed:
    • randomizeDate -> randomDate
    • randomizeDateTime -> randomDateTime
    • addPrefix -> prependText
    • addSuffix -> appendText
  • The orderBy parameter was renamed to order_by. It was the only parameter that didn't use snake case.
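A config file written for 2.x might be updated along these lines. This is only an illustration: the table name my_table and its columns are hypothetical, and the comments show the old names that each line replaces.

```yaml
tables:
    my_table:
        order_by: 'id desc'                  # was: orderBy
        converters:
            created_at:
                converter: 'randomDateTime'  # was: randomizeDateTime
            birth_date:
                converter: 'randomDate'      # was: randomizeDate
```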

Performance

Since this tool is a pure PHP implementation of a MySQL dumper, it is slower than mysqldump.

If the database contains very large tables, it is recommended to use the table filter mechanism so that only part of their data is dumped.
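The following sketch shows what such filtering could look like. The table and column names are hypothetical, and the limit parameter is assumed to be available in your version of the tool; truncate and where are the parameters used elsewhere on this page.

```yaml
tables:
    # Hypothetical log table: dump the structure only, without any rows
    my_log_table:
        truncate: true

    # Hypothetical large table: keep only recent rows, capped at a maximum count
    my_large_table:
        where: 'created_at > date_sub(now(), interval 60 day)'
        limit: 100000
```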

Security

If you want to share your configuration file, don't include the database credentials. Instead, use environment variables.

For example:

database:
    host: '%env(DB_HOST)%'
    user: '%env(DB_USER)%'
    password: '%env(DB_PASSWORD)%'
    name: '%env(DB_NAME)%'

Anonymization

If your database contains personal data, you can use converters to anonymize the data written to the dump file.

Examples of personal data:

  • email
  • username
  • name
  • date of birth
  • phone number
  • address
  • IP address
  • encrypted password
  • payment data
  • comment that could contain customer-related information
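As an illustration, a few of the fields listed above could be anonymized as follows. The table name my_customer_table and its columns are hypothetical. The converter names other than anonymizeText are assumptions based on the converters shipped with the tool, so verify them against the converter list for your version.

```yaml
tables:
    my_customer_table:
        converters:
            email:
                converter: 'randomizeEmail'   # assumed converter name
                unique: true
            firstname:
                converter: 'anonymizeText'
            lastname:
                converter: 'anonymizeText'
            date_of_birth:
                converter: 'anonymizeDate'    # assumed converter name
            phone:
                converter: 'randomizeNumber'  # assumed converter name
```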

Data Consistency

If you use one of the config templates bundled with this tool (e.g. magento2), the anonymized data is not consistent across tables. For example, the anonymized customer email won't have the same value between the customer table and the quote table.

You can add data consistency by specifying a cache key. For example, in Magento 2:

tables:
    customer_entity:
        converters:
            email:
                cache_key: 'customer_email'
                unique: true

    customer_flat_grid:
        converters:
            email:
                cache_key: 'customer_email'
                unique: true

    # ... repeat this for each table that stores a customer email

With the above configuration, each table will use the same anonymized email for each customer.

Warning: this consumes a lot of memory (approximately 1 GB for 10 million values).

Magento

Performance

In the magento1 and magento2 templates, quote tables are not truncated by default. If these tables contain a lot of rows, truncating or filtering them will speed up the dump creation.

For example (Magento 2):

tables:
    quote:
        truncate: true
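If recent quotes must be kept in the dump, a filter can be used instead of a full truncation. The sketch below assumes that the quote table has an updated_at column and that a 30-day window suits your project; adjust both as needed.

```yaml
tables:
    quote:
        # Keep only the quotes updated during the last 30 days
        where: 'updated_at > date_sub(now(), interval 30 day)'
```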

Admin Accounts

The magento1 and magento2 templates anonymize all admin accounts.

If you want to keep the email/password for some accounts, you can set a condition on the admin_user table.

Example:

tables:
    admin_user:
        skip_conversion_if: '{{username}} === "admin123"'

Payment Data

In Magento 1 and Magento 2, the payment data is partially stored in a column named additional_information. The data is stored as a serialized array. Only the CC_CN property is anonymized by the magento1 and magento2 templates.

If this column contains other sensitive data in your project, you must anonymize it in your custom config file. For example, in Magento 1:

tables:
    sales_flat_quote_payment:
        converters:
            additional_information:
                parameters:
                    converters:
                        fieldToAnonymize:
                            converter: 'anonymizeText'

    sales_flat_order_payment:
        converters:
            additional_information:
                parameters:
                    converters:
                        fieldToAnonymize:
                            converter: 'anonymizeText'

In Magento 2:

tables:
    quote_payment:
        converters:
            additional_information:
                parameters:
                    converters:
                        fieldToAnonymize:
                            converter: 'anonymizeText'

    sales_order_payment:
        converters:
            additional_information:
                parameters:
                    converters:
                        fieldToAnonymize:
                            converter: 'anonymizeText'

The fields to anonymize will depend on the payment methods that are used in the project.