- Templates
- Database Settings
- Dump Settings
- Table Whitelist
- Table Blacklist
- Tables Configuration
- Advanced Configuration
The tool is bundled with predefined configuration templates. Each template provides anonymization rules for a specific framework.
Available templates:
To extend a configuration template, you must specify its name, and the version of your application. For example:
extends: 'magento2'
version: '2.4.1'
The extends parameter can also be used with custom config files:
extends: 'path/to/config.yaml'
The contents of this template will automatically be merged with the configuration file. The path to the file can be an absolute path, or relative to the configuration file.
It is possible to extend multiple config files:
extends:
- 'path/to/config1.yaml'
- 'path/to/config2.yaml'
The database information can be specified in the database object:
database:
    host: 'my_host'
    user: 'my_user'
    password: 'my_password'
    name: 'my_db_name'
Only the name parameter is required. Other parameters are optional.
Available parameters:
Parameter | Required | Default | Description |
---|---|---|---|
name | Y | | Database name. |
user | N | 'root' | Database user. |
password | N | | Database password. |
host | N | 'localhost' | Database host. |
port | N | | Database port. |
charset | N | | Charset to use. |
unix_socket | N | | Name of the socket to use. |
driver | N | 'pdo_mysql' | Database driver. Only pdo_mysql is supported as of now. |
driver_options | N | [] | An array of PDO settings. |
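As an illustration, a configuration that overrides some of the optional parameters might look like the sketch below. The host, user and charset values are placeholders chosen for this example, not values taken from this documentation.

```yaml
database:
    name: 'my_db_name'       # the only required parameter
    host: 'db.example.org'   # placeholder; defaults to 'localhost' when omitted
    user: 'dump_user'        # placeholder; defaults to 'root' when omitted
    charset: 'utf8mb4'       # placeholder charset
```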
Dump settings are all optional.
Example:
dump:
    output: 'my_dump_file-{Y-m-d H:i:s}.sql.gz'
    compress: 'gzip'
Available settings:
Parameter | Default | Description |
---|---|---|
output | 'php://stdout' | Dump output. By default, the dump is outputted to the terminal. If a relative path is specified, it is relative to the current working directory. A date format can be specified using curly brackets, e.g. {Y-m-d}. |
compress | 'none' | none, gzip (.gz file extension), bzip2 (.bz2 file extension). |
init_commands | [] | Queries executed after the connection is established. |
add_drop_database | false | MySQL documentation |
add_drop_table | true | MySQL documentation |
add_drop_trigger | true | MySQL documentation |
add_locks | true | MySQL documentation |
complete_insert | false | MySQL documentation |
default_character_set | 'utf8' | utf8 (default, compatible option), utf8mb4 (for full utf8 compliance). |
disable_keys | true | MySQL documentation |
extended_insert | true | MySQL documentation |
events | false | MySQL documentation |
hex_blob | false | MySQL documentation |
insert_ignore | false | MySQL documentation |
net_buffer_length | 1000000 | MySQL documentation |
no_autocommit | true | Option to disable autocommit (faster inserts, no problems with index keys). |
no_create_info | false | MySQL documentation |
lock_tables | false | MySQL documentation |
routines | false | MySQL documentation |
single_transaction | true | MySQL documentation |
skip_triggers | false | MySQL documentation |
skip_tz_utc | false | MySQL documentation |
skip_comments | false | MySQL documentation |
skip_dump_date | false | MySQL documentation |
skip_definer | false | MySQL documentation |
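For instance, the following sketch writes a bzip2-compressed dump that also includes stored routines. The file name and the init command are illustrative assumptions; only the setting names come from the table above.

```yaml
dump:
    # The date format between curly brackets is applied to the file name
    output: 'backup-{Y-m-d}.sql.bz2'
    compress: 'bzip2'
    default_character_set: 'utf8mb4'
    routines: true
    init_commands:
        # Hypothetical query, executed after the connection is established
        - 'SET SESSION wait_timeout = 600'
```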
You can specify a list of tables to include in the dump. If a whitelist is defined, only these tables will be dumped.
tables_whitelist:
- 'table1'
- 'table2'
The wildcard character * can be used in table names (e.g. cache_*).
You can specify a list of tables to exclude from the dump:
tables_blacklist:
- 'table1'
- 'table2'
If a table is both blacklisted and whitelisted, it will not be included in the dump.
The wildcard character * can be used in table names (e.g. cache_*).
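The two lists can be combined. In the sketch below, the table names are hypothetical, and it is assumed that a table matched by a wildcard counts as listed: customer_log matches the whitelist pattern but is also blacklisted, so it would be left out of the dump.

```yaml
tables_whitelist:
    # Dump only the tables whose name starts with "customer_"
    - 'customer_*'
tables_blacklist:
    # Blacklisted and whitelisted at the same time: not included in the dump
    - 'customer_log'
```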
The configuration of each table must be specified in the tables parameter.
tables:
    table1:
        # ...
    table2:
        # ...
The wildcard character * can be used in table names (e.g. cache_*).
It is possible to limit the data dumped for each table.
The data is automatically filtered for all tables that depend on the target table (foreign keys).
Available properties:
- truncate: whether to dump a table without any data (true or false).
- limit: max number of rows to dump (must be greater than 0, otherwise it is ignored).
- order_by: same as SQL (e.g. name asc, id desc).
- filters: a list of filters to apply.
How to truncate a table:
tables:
    my_table:
        truncate: true
How to define a limit:
tables:
    my_table:
        limit: 10000
How to define a sort order:
tables:
    my_table:
        order_by: 'sku, entity_id desc'
How to define a filter:
tables:
    my_table:
        filters:
            - ['id', 'gt', 1000]
            - ['sku', 'isNotNull']
            - ['type', 'in', ['simple', 'configurable']]
Available filter operators:
- eq (equal to)
- neq (not equal to)
- gt (greater than)
- lt (less than)
- ge (greater than or equal to)
- le (less than or equal to)
- like
- notLike
- isNull (no value)
- isNotNull (no value)
- in (value must be an array)
- notIn (value must be an array)
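The properties and operators above can be sketched together in a single table definition. This example assumes that limit, order_by and filters may be set on the same table; the column names and values are hypothetical.

```yaml
tables:
    my_table:
        # Keep the 5000 most recent rows that pass the filters
        order_by: 'created_at desc'
        limit: 5000
        filters:
            # notIn requires an array value
            - ['status', 'notIn', ['canceled', 'closed']]
            # isNull takes no value
            - ['deleted_at', 'isNull']
```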
To use an SQL expression as the filter value, prefix it with expr: as in the following example:
tables:
    my_table:
        filters:
            - ['updated_at', 'gt', 'expr: DATE_SUB(now(), INTERVAL 30 DAY)']
            - ['website_id', 'eq', 'expr: (SELECT website_id FROM store_website WHERE name = "base")']
It is possible to define data converters for any column.
Syntax:
tables:
    my_table:
        converters:
            my_column:
                converter: 'randomizeEmail'
                unique: true
The key is the column name, the value is the converter definition.
List of available properties:
Property | Required | Default | Description |
---|---|---|---|
converter | Y | | Converter name. A list of all converters is available here. |
condition | N | '' | A PHP expression that must evaluate to true or false. When a condition is set, the value is converted only if the expression evaluates to true. |
parameters | N | {} | e.g. min and max for numberBetween. Most converters don't accept any parameter. |
unique | N | false | Whether to generate only unique values. May result in a fatal error with converters that can't generate enough unique values. |
cache_key | N | '' | The generated value will be used by all converters that use the same cache key. |
disabled | N | false | Can be used to disable a converter declared in a parent config file. |
How to use parameters:
tables:
    my_table:
        converters:
            my_column:
                converter: 'randomizeEmail'
                parameters:
                    domains: ['example.org']
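The properties table also mentions the min and max parameters of numberBetween. A sketch of that case, assuming a hypothetical age column and arbitrary bounds:

```yaml
tables:
    my_table:
        converters:
            # "age" is a hypothetical column name
            age:
                converter: 'numberBetween'
                parameters:
                    min: 18
                    max: 99
```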
How to define a condition:
tables:
    my_table:
        converters:
            my_column:
                converter: 'randomizeEmail'
                condition: '{{another_column}} !== null'
The converter is skipped when the condition evaluates to false. The condition is a PHP expression. Variables must be enclosed in double curly brackets.
The available variables are the columns of the table.
For example, if the table has an id column, the {{id}} variable will be available.
It is possible to skip data conversion for an entire table row:
tables:
    my_table:
        skip_conversion_if: 'strpos({{email}}, "@acme.fr") !== false'
The syntax is the same as the converter conditions. If the condition evaluates to true, the table row will be dumped as-is, without any data conversion.
The cache_key parameter can be used to share values between converters.
For example, to generate the same anonymized email in two tables:
tables:
    customer_entity:
        converters:
            email:
                converter: 'randomizeEmail'
                cache_key: 'customer_email'
                unique: true
    newsletter_subscriber:
        converters:
            subscriber_email:
                converter: 'randomizeEmail'
                cache_key: 'customer_email'
                unique: true
Notes:
- If you use the unique parameter, it must be specified in all converters that share the same cache key. If the parameter is missing somewhere, it can result in an infinite loop.
- This feature is not used in the default templates (magento2, ...), because it may require a lot of memory, depending on the size of the tables.
You can use environment variables with the following syntax:
database:
    host: '%env(DB_HOST)%'
    user: '%env(DB_USER)%'
    password: '%env(DB_PASSWORD)%'
    name: '%env(DB_NAME)%'
You can also set the variable type with the following syntax:
tables:
    cache:
        truncate: '%env(bool:TRUNCATE_CACHE_TABLE)%'
Available types: string (default), bool, int, float, json.
The JSON type can be used to define array values. For example:
tables_blacklist: '%env(json:TABLES_BLACKLIST)%'
Example value of the environment variable: ["table1", "table2", "table3"].
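The other types follow the same pattern. As a sketch, a row limit could be read from an integer variable; MY_TABLE_LIMIT is a hypothetical variable name used only for this example.

```yaml
tables:
    my_table:
        # The int type casts the environment variable to an integer
        limit: '%env(int:MY_TABLE_LIMIT)%'
```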
It is possible to store SQL query results in user-defined variables:
variables:
    firstname_attribute_id: 'select attribute_id from eav_attribute where attribute_code = "firstname" and entity_type_id = 1'
    lastname_attribute_id: 'select attribute_id from eav_attribute where attribute_code = "lastname" and entity_type_id = 1'
These variables can then be used in query filters and converter conditions.
Using variables in query filters:
tables:
    my_table:
        filters:
            - ['attribute_id', 'eq', 'expr: @firstname_attribute_id']
Using variables in converter conditions:
tables:
    customer_entity_varchar:
        converters:
            # The key must be a column name; "value" is the column that stores the attribute value
            value:
                converter: 'anonymizeText'
                condition: '{{attribute_id}} == @firstname_attribute_id'
By default, the locale used in faker formatters is en_US.
It can be changed with the following setting:
faker:
    locale: 'de_DE'
Warning: the default phar distribution only includes the "en_US" locale. To use other locales with the phar, you must compile your own phar file that includes the required locales.
It is possible to unset values that were declared in a parent config file, by setting them to null.
Warning: setting a value to null is only allowed if it is already defined in a parent config file.
Example - removing the whole config of a table (converters, filters, limit...):
extends: 'magento2'
tables:
    admin_user: ~
Example - removing all converters of a table:
extends: 'magento2'
tables:
    admin_user:
        converters: ~
Example - removing a specific converter:
extends: 'magento2'
tables:
    admin_user:
        converters:
            email: ~
Alternatively, converters can be disabled by setting the disabled parameter to true:
extends: 'magento2'
tables:
    admin_user:
        converters:
            email:
                disabled: true
The if_version property lets you define configuration that is read only if the version of your application matches a requirement.
Syntax:
if_version:
    '>=1.0.0 <1.1.0':
        # version-specific config here (e.g. tables)
The application version can be defined with the version parameter, as explained earlier in this documentation.
The version parameter becomes mandatory if the requiresVersion parameter is defined and set to true.
The magento2 template uses that feature.
There is little point in using this feature in your custom configuration file(s). It is mainly used to provide default config templates that are compatible with all versions of a framework.
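For completeness, version-specific configuration in a template might look like the following sketch. The version number and table names are hypothetical, and the example assumes that a tables section nested under a version requirement is merged like any other tables section.

```yaml
version: '1.0.5'
if_version:
    # Applied, because 1.0.5 satisfies this requirement
    '>=1.0.0 <1.1.0':
        tables:
            legacy_log:
                truncate: true
    # Ignored for version 1.0.5
    '>=1.1.0':
        tables:
            audit_log:
                truncate: true
```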