Configuration

Templates

Default Templates

The tool is bundled with predefined configuration templates. Each template provides anonymization rules for a specific framework.

Available templates:

To extend a configuration template, you must specify its name, and the version of your application. For example:

extends: 'magento2'
version: '2.4.1'

Custom Templates

The extends parameter can also be used with custom config files:

extends: 'path/to/config.yaml'

The contents of this template will automatically be merged with the configuration file. The path to the file can be an absolute path, or relative to the configuration file.
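As a minimal sketch of the merge behaviour described above, a project configuration could extend a shared file and add its own settings on top. The file names `config.yaml` and `base.yaml` are hypothetical and only serve the illustration.

```yaml
# config.yaml (hypothetical file name)
# 'base.yaml' is a hypothetical shared template stored in the same directory,
# so the relative path is resolved against the location of this file.
extends: 'base.yaml'

# Settings declared here are merged with the contents of base.yaml.
database:
    name: 'my_db_name'
```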

Extending Multiple Files

It is possible to extend multiple config files:

extends:
    - 'path/to/config1.yaml'
    - 'path/to/config2.yaml'

Database Settings

The database information can be specified in the database object:

database:
    host: 'my_host'
    user: 'my_user'
    password: 'my_password'
    name: 'my_db_name'

Only the name parameter is required. Other parameters are optional.

Available parameters:

| Parameter | Required | Default | Description |
| --- | --- | --- | --- |
| name | Y | | Database name. |
| user | N | 'root' | Database user. |
| password | N | | Database password. |
| host | N | 'localhost' | Database host. |
| port | N | | Database port. |
| charset | N | | Charset to use. |
| unix_socket | N | | Name of the socket to use. |
| driver | N | 'pdo_mysql' | Database driver. Only pdo_mysql is supported as of now. |
| driver_options | N | [] | An array of PDO settings. |
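To illustrate some of the optional parameters listed above, here is a sketch of a connection that sets an explicit port and charset. The values are placeholders, not recommendations.

```yaml
database:
    name: 'my_db_name'      # the only required parameter
    host: '127.0.0.1'       # placeholder value
    port: 3306              # placeholder value
    user: 'my_user'
    password: 'my_password'
    charset: 'utf8mb4'      # placeholder value
```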

Dump Settings

Dump settings are all optional.

Example:

dump:
    output: 'my_dump_file-{Y-m-d H:i:s}.sql.gz'
    compress: 'gzip'

Available settings:

| Parameter | Default | Description |
| --- | --- | --- |
| output | 'php://stdout' | Dump output. By default, the dump is written to the terminal. If a relative path is specified, it is relative to the current working directory. A date format can be specified using curly brackets, e.g. {Y-m-d}. |
| compress | 'none' | Compression method: none, gzip (.gz file extension), bzip2 (.bz2 file extension). |
| init_commands | [] | Queries executed after the connection is established. |
| add_drop_database | false | MySQL documentation |
| add_drop_table | true | MySQL documentation |
| add_drop_trigger | true | MySQL documentation |
| add_locks | true | MySQL documentation |
| complete_insert | false | MySQL documentation |
| default_character_set | 'utf8' | utf8 (default, compatible option), utf8mb4 (for full utf8 compliance). |
| disable_keys | true | MySQL documentation |
| extended_insert | true | MySQL documentation |
| events | false | MySQL documentation |
| hex_blob | false | MySQL documentation |
| insert_ignore | false | MySQL documentation |
| net_buffer_length | 1000000 | MySQL documentation |
| no_autocommit | true | Option to disable autocommit (faster inserts, no problems with index keys). |
| no_create_info | false | MySQL documentation |
| lock_tables | false | MySQL documentation |
| routines | false | MySQL documentation |
| single_transaction | true | MySQL documentation |
| skip_triggers | false | MySQL documentation |
| skip_tz_utc | false | MySQL documentation |
| skip_comments | false | MySQL documentation |
| skip_dump_date | false | MySQL documentation |
| skip_definer | false | MySQL documentation |
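The following sketch combines several of the settings above: a bzip2-compressed dump with a dated file name, full UTF-8 support, and stored routines included. The file name and the query under `init_commands` are placeholders, and the list-of-strings form of `init_commands` is an assumption based on its default value of `[]`.

```yaml
dump:
    # {Y-m-d} is replaced with the current date
    output: 'my_dump_file-{Y-m-d}.sql.bz2'
    compress: 'bzip2'
    default_character_set: 'utf8mb4'
    routines: true
    # Assumed format: a list of SQL queries (placeholder query shown)
    init_commands:
        - 'SET NAMES utf8mb4'
```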

Table Whitelist

You can specify a list of tables to include in the dump. If a whitelist is defined, only these tables will be dumped.

tables_whitelist:
    - 'table1'
    - 'table2'

The wildcard character * can be used in table names (e.g. cache_*).

Table Blacklist

You can specify a list of tables to exclude from the dump:

tables_blacklist:
    - 'table1'
    - 'table2'

If a table is both blacklisted and whitelisted, it will not be included in the dump.

The wildcard character * can be used in table names (e.g. cache_*).
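Since the blacklist takes precedence over the whitelist, the two lists can be combined to include a group of tables while excluding one of them. The table names below are hypothetical, and the sketch assumes that the precedence rule also applies to tables matched by a wildcard.

```yaml
# Dump all tables whose name starts with "sales_"...
tables_whitelist:
    - 'sales_*'

# ...except this one, which is both whitelisted (by the wildcard) and blacklisted.
tables_blacklist:
    - 'sales_order_tax'
```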

Tables Configuration

The configuration of each table must be specified in the tables parameter.

tables:
    table1:
        # ...
    table2:
        # ...

The wildcard character * can be used in table names (e.g. cache_*).

Filtering Values

It is possible to limit the data dumped for each table.

The data is automatically filtered for all tables that depend on the target table (foreign keys).

Available properties:

  • truncate: whether to dump a table without any data (true or false).
  • limit: max number of rows to dump (must be greater than 0, otherwise it is ignored).
  • order_by: same as SQL (e.g. name asc, id desc).
  • filters: a list of filters to apply.

How to truncate a table:

tables:
    my_table:
        truncate: true

How to define a limit:

tables:
    my_table:
        limit: 10000

How to define a sort order:

tables:
    my_table:
        order_by: 'sku, entity_id desc'
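These properties can be combined on the same table. The sketch below assumes that, as in SQL, the sort order is applied before the limit, so that only the most recently created rows are dumped. The `created_at` column is hypothetical.

```yaml
tables:
    my_table:
        # Assumption: rows are sorted first, then limited (as in SQL)
        order_by: 'created_at desc'
        limit: 1000
```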

How to define a filter:

tables:
    my_table:
        filters:
            - ['id', 'gt', 1000]
            - ['sku', 'isNotNull']
            - ['type', 'in', ['simple', 'configurable']]

Available filter operators:

  • eq (equal to)
  • neq (not equal to)
  • gt (greater than)
  • lt (less than)
  • ge (greater than or equal to)
  • le (less than or equal to)
  • like
  • notLike
  • isNull (no value)
  • isNotNull (no value)
  • in (value must be an array)
  • notIn (value must be an array)
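As a further illustration of the operators listed above, the following sketch uses `like`, `notIn` and `isNull`. The column names and values are hypothetical, and the `%` wildcard assumes that `like` behaves like the SQL LIKE operator.

```yaml
tables:
    my_table:
        filters:
            # Assumption: 'like' accepts SQL LIKE patterns
            - ['email', 'like', '%@example.org']
            # 'notIn' requires an array value
            - ['status', 'notIn', ['canceled', 'closed']]
            # 'isNull' takes no value
            - ['deleted_at', 'isNull']
```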

To use an SQL expression as the value, prefix it with expr:

tables:
    my_table:
        filters:
            - ['updated_at', 'gt', 'expr: DATE_SUB(now(), INTERVAL 30 DAY)']
            - ['website_id', 'eq', 'expr: (SELECT website_id FROM store_website WHERE name = "base")']

Data Converters

It is possible to define data converters for any column.

Syntax:

tables:
    my_table:
        converters:
            my_column:
                converter: 'randomizeEmail'
                unique: true

The key is the column name, the value is the converter definition.

List of available properties:

| Property | Required | Default | Description |
| --- | --- | --- | --- |
| converter | Y | | Converter name. A list of all converters is available here. |
| condition | N | '' | A PHP expression that must evaluate to true or false. When a condition is set, the value is converted only if the expression evaluates to true. |
| parameters | N | {} | e.g. min and max for numberBetween. Most converters don't accept any parameter. |
| unique | N | false | Whether to generate only unique values. May result in a fatal error with converters that can't generate enough unique values. |
| cache_key | N | '' | The generated value will be used by all converters that use the same cache key. |
| disabled | N | false | Can be used to disable a converter declared in a parent config file. |

How to use parameters:

tables:
    my_table:
        converters:
            my_column:
                converter: 'randomizeEmail'
                parameters:
                    domains: ['example.org']
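The `min` and `max` parameters of `numberBetween`, mentioned in the properties list, can be set in the same way. The bounds below are arbitrary example values.

```yaml
tables:
    my_table:
        converters:
            my_column:
                converter: 'numberBetween'
                parameters:
                    min: 1      # arbitrary example value
                    max: 100    # arbitrary example value
```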

How to define a condition:

tables:
    my_table:
        converters:
            my_column:
                converter: 'randomizeEmail'
                condition: '{{another_column}} !== null'

The value is not converted when the condition evaluates to false. The condition is a PHP expression. Variables must be enclosed in double curly brackets.

The available variables are the columns of the table. For example, if the table has an id column, the {{id}} variable will be available.
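For instance, a condition can refer to a different column than the one being converted. In this sketch, the `email` column is only anonymized for rows whose `id` is greater than 1000. Both column names and the threshold are hypothetical.

```yaml
tables:
    my_table:
        converters:
            email:
                converter: 'randomizeEmail'
                # {{id}} is replaced with the value of the "id" column of the current row
                condition: '{{id}} > 1000'
```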

Skipping Data Conversion

It is possible to skip data conversion for an entire table row:

tables:
    my_table:
        skip_conversion_if: 'strpos({{email}}, "@acme.fr") !== false'

The syntax is the same as the converter conditions. If the condition evaluates to true, the table row will be dumped as-is, without any data conversion.
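A sketch of how this setting could sit alongside converters: rows with an "@acme.fr" email address are dumped unchanged, while all other rows go through both converters. The `firstname` column is hypothetical.

```yaml
tables:
    my_table:
        # Rows matching this condition bypass every converter declared below
        skip_conversion_if: 'strpos({{email}}, "@acme.fr") !== false'
        converters:
            email:
                converter: 'randomizeEmail'
            firstname:
                converter: 'anonymizeText'
```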

Sharing Converter Results

The cache_key parameter can be used to share values between converters.

For example, to generate the same anonymized email in two tables:

tables:
    customer_entity:
        converters:
            email:
                converter: 'randomizeEmail'
                cache_key: 'customer_email'
                unique: true
    newsletter_subscriber:
        converters:
            subscriber_email:
                converter: 'randomizeEmail'
                cache_key: 'customer_email'
                unique: true

Notes:

  • If you use the unique parameter, it must be specified in all converters that share the same cache key. If the parameter is missing somewhere, it can result in an infinite loop.
  • This feature is not used in the default templates (magento2, ...), because it may require a lot of memory, depending on the size of the tables.

Advanced Configuration

Environment Variables

You can use environment variables with the following syntax:

database:
    host: '%env(DB_HOST)%'
    user: '%env(DB_USER)%'
    password: '%env(DB_PASSWORD)%'
    name: '%env(DB_NAME)%'

You can also set the variable type with the following syntax:

tables:
    cache:
        truncate: '%env(bool:TRUNCATE_CACHE_TABLE)%'

Available types: string (default), bool, int, float, json.

The JSON type can be used to define array values. For example:

tables_blacklist: '%env(json:TABLES_BLACKLIST)%'

Example value of the environment variable: ["table1", "table2", "table3"].
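Typed variables can be used wherever a non-string value is expected. The sketch below casts a port number and a row limit to integers. The environment variable names `DB_PORT` and `MY_TABLE_LIMIT` are hypothetical.

```yaml
database:
    host: '%env(DB_HOST)%'
    # Hypothetical variable, cast to an integer
    port: '%env(int:DB_PORT)%'
    name: '%env(DB_NAME)%'

tables:
    my_table:
        # Hypothetical variable, cast to an integer
        limit: '%env(int:MY_TABLE_LIMIT)%'
```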

SQL Variables

It is possible to store SQL query results in user-defined variables:

variables:
    firstname_attribute_id: 'select attribute_id from eav_attribute where attribute_code = "firstname" and entity_type_id = 1'
    lastname_attribute_id: 'select attribute_id from eav_attribute where attribute_code = "lastname" and entity_type_id = 1'

These variables can then be used in query filters and converter conditions.

Using variables in query filters:

tables:
    my_table:
        filters:
            - ['attribute_id', 'eq', 'expr: @firstname_attribute_id']

Using variables in converter conditions:

tables:
    customer_entity_varchar:
        converters:
            value:
                converter: 'anonymizeText'
                condition: '{{attribute_id}} == @firstname_attribute_id'
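Because a condition is a PHP expression, several variables can be combined in a single condition. The sketch below anonymizes a column when the row holds either a first name or a last name. It assumes that `value` is the column storing the attribute value.

```yaml
variables:
    firstname_attribute_id: 'select attribute_id from eav_attribute where attribute_code = "firstname" and entity_type_id = 1'
    lastname_attribute_id: 'select attribute_id from eav_attribute where attribute_code = "lastname" and entity_type_id = 1'

tables:
    customer_entity_varchar:
        converters:
            # Assumption: "value" is the column that stores the attribute value
            value:
                converter: 'anonymizeText'
                condition: '{{attribute_id}} == @firstname_attribute_id || {{attribute_id}} == @lastname_attribute_id'
```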

Faker Locale

By default, the locale used in faker formatters is en_US. It can be changed with the following setting:

faker:
    locale: 'de_DE'

Warning: the default phar distribution only includes the "en_US" locale. To use other locales with the phar, you must compile your own phar file that includes the required locales.

Unsetting Values Declared in Config Templates

It is possible to unset values that were declared in a parent config file, by setting them to null.

Warning: setting a value to null is only allowed if it is already defined in a parent config file.

Example - removing the whole config of a table (converters, filters, limit...):

extends: 'magento2'
tables:
    admin_user: ~

Example - removing all converters of a table:

extends: 'magento2'
tables:
    admin_user:
        converters: ~

Example - removing a specific converter:

extends: 'magento2'
tables:
    admin_user:
        converters:
            email: ~

Alternatively, converters can be disabled by setting the disabled parameter to true:

extends: 'magento2'
tables:
    admin_user:
        converters:
            email:
                disabled: true

Version-specific Configuration

The if_version property allows you to define configuration that is read only if the version of your application matches a requirement.

Syntax:

if_version:
    '>=1.0.0 <1.1.0':
        # version-specific config here (e.g. tables)

The application version can be defined with the version parameter, as explained earlier in this documentation.
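Put together, a complete example could look like the following sketch, where the table is truncated only because the declared version falls within the required range. The version number and table name are hypothetical.

```yaml
# Hypothetical application version; it satisfies the requirement below
version: '1.0.5'

if_version:
    '>=1.0.0 <1.1.0':
        # Applied only when the version matches the requirement
        tables:
            my_table:
                truncate: true
```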

The version parameter becomes mandatory if the requiresVersion parameter is defined and set to true. The magento2 template uses that feature.

There is little point in using this feature in your custom configuration file(s). It is mainly used to provide default config templates that are compatible with all versions of a framework.