diff --git a/changelog/CHANGELOG/index.html b/changelog/CHANGELOG/index.html
index 38dfa992..9ca582bb 100644
--- a/changelog/CHANGELOG/index.html
+++ b/changelog/CHANGELOG/index.html
@@ -72,7 +72,7 @@
-Latest Version: 1.4.0
+Latest Version: 1.4.1
To ensure quality of big data, it is necessary to perform calculations of a large number of metrics and checks on huge datasets, which in turn is a difficult task.
Checkita is a Data Quality Framework that solves this problem by formalizing and simplifying the process connecting
diff --git a/ru/changelog/CHANGELOG/index.html b/ru/changelog/CHANGELOG/index.html
index 0ca7fa76..58d31f3e 100644
--- a/ru/changelog/CHANGELOG/index.html
+++ b/ru/changelog/CHANGELOG/index.html
@@ -72,7 +72,7 @@
-Current version: 1.4.0
+Current version: 1.4.1
diff --git a/search/search_index.json b/search/search_index.json
index 7444d663..31947322 100644
--- a/search/search_index.json
+++ b/search/search_index.json
@@ -1 +1 @@
-{"config":{"lang":["en","ru"],"separator":"[\\s\\-]+","pipeline":["stopWordFilter"]},"docs":[{"location":"changelog/CHANGELOG/","title":"Changelog","text":""},{"location":"changelog/CHANGELOG/#140-2024-03-13","title":"1.4.0 (2024-03-13)","text":""},{"location":"changelog/CHANGELOG/#features","title":"Features","text":"Documentation in Russian is under development. Please use the English documentation.
Thank you for considering contributing to our project! We welcome contributions from everyone. By participating in this project, you agree to abide by our Code of Conduct.
Please take a moment to review our Contribution Guide in order to make the contribution process as smooth as possible.
"},{"location":"contribution/code-of-conduct/","title":"Code of Conduct","text":""},{"location":"contribution/code-of-conduct/#our-pledge","title":"Our Pledge","text":"In the interest of fostering an open and inclusive environment, we as contributors and maintainers pledge to make participation in our project and our community a harassment-free experience for everyone, regardless of age, body size, disability, ethnicity, gender identity and expression, level of experience, education, socioeconomic status, nationality, personal appearance, race, religion, or sexual identity and orientation.
"},{"location":"contribution/code-of-conduct/#our-standards","title":"Our Standards","text":"Examples of behavior that contributes to creating a positive environment include:
Examples of unacceptable behavior by participants include:
Project maintainers are responsible for clarifying the standards of acceptable behavior and are expected to take appropriate and fair corrective action in response to any instances of unacceptable behavior.
Project maintainers have the right and responsibility to remove, edit, or reject comments, commits, code, wiki edits, issues, and other contributions that are not aligned to this Code of Conduct, or to ban temporarily or permanently any contributor for other behaviors that they deem inappropriate, threatening, offensive, or harmful.
"},{"location":"contribution/code-of-conduct/#enforcement","title":"Enforcement","text":"Instances of abusive, harassing, or otherwise unacceptable behavior may be reported by contacting the project team at GitHub. All complaints will be reviewed and investigated and will result in a response that is deemed necessary and appropriate to the circumstances. The project team is obligated to maintain confidentiality with regard to the reporter of an incident.
Project maintainers who do not follow or enforce the Code of Conduct in good faith may face temporary or permanent repercussions as determined by other members of the project's leadership.
"},{"location":"contribution/code-of-conduct/#attribution","title":"Attribution","text":"This Code of Conduct is adapted from the Contributor Covenant, version 2.0, available at https://www.contributor-covenant.org/version/2/0/code_of_conduct.html.
"},{"location":"contribution/code-of-conduct/#scope","title":"Scope","text":"This Code of Conduct applies both within project spaces and in public spaces when an individual is representing the project or its community. Examples of representing a project or community include using an official project e-mail address, posting via an official social media account, or acting as an appointed representative at an online or offline event.
"},{"location":"contribution/code-of-conduct/#acknowledgements","title":"Acknowledgements","text":"We thank the open-source community for providing inspiration and examples for creating a welcoming and inclusive Code of Conduct. Your efforts make the tech community a better place for everyone.
"},{"location":"contribution/contribution/","title":"Contribution Guide","text":""},{"location":"contribution/contribution/#types-of-contributions","title":"Types of Contributions","text":"We value all kinds of contributions, including:
All contributions will be reviewed by the maintainers of the project. Feedback or suggestions for improvement may be provided. Once everything is approved, your contribution will be merged.
"},{"location":"contribution/contribution/#code-of-conduct","title":"Code of Conduct","text":"Please read and adhere to our Code of Conduct in all your interactions with the project.
"},{"location":"contribution/contribution/#attribution","title":"Attribution","text":"A huge thanks to all contributors who help make this project better!
If you're unsure about anything, feel free to ask for clarification. We appreciate your efforts to make our project better and look forward to your contributions!
"},{"location":"","title":"Home","text":"Latest Version: 1.4.0
To ensure the quality of big data, it is necessary to compute a large number of metrics and checks over huge datasets, which is a difficult task.
Checkita is a Data Quality Framework that solves this problem by formalizing and simplifying the process of connecting to and reading data from various sources, describing metrics and checks on data from these sources, and sending results and notifications via various channels.
Thus, Checkita allows calculating various metrics and checks on data (both structured and unstructured). The framework is able to perform distributed computing on data in a \"single pass\", using Spark as a computation core. HOCON configurations are used to describe application configurations and job pipelines. Job results are saved in a dedicated framework database and can also be sent to users via various channels such as File (to local FS, HDFS, S3), Email, Mattermost and Kafka.
Using Spark as a computation engine allows performing metrics and checks calculations at the level of \"raw\" data, without requiring any SQL abstractions over the data (such as Hive or Impala), which in turn can hide some errors in the data (e.g. bad formatting or schema mismatch).
In summary, Checkita is able to do the following:
Checkita is designed with focus on integration into ETL pipelines and data catalogues:
Another key feature of the Checkita data quality framework is that it can process both static (batch) and streaming data sources. Thus, either a batch or a streaming application can be started depending on the type of sources that need to be checked. Streaming mode is currently in an experimental phase and is subject to change.
The framework is written in Scala 2.12 and uses Spark 2.4+ as the computation core. The project is configured with a parameterized SBT build that allows building the framework for a specific version of Spark, publishing the project to a given repository, and building an uber-jar, both with and without Spark dependencies.
License
Checkita Data Quality framework is GNU LGPL licensed.
This project is a reimagination of Data Quality Framework developed by Agile Lab, Italy.
"},{"location":"01-application-setup/","title":"General Information","text":"Checkita runs as a Spark Application. Thus, it can be run in the same way as any other Spark application:
Both spark-submit deploy modes are supported: client and cluster.
The framework was developed primarily for batch data processing, which is its main mode of operation. A typical architecture for working with Checkita Data Quality is shown in the diagram below:
The Data Quality Framework can also be used for streaming data processing; however, this functionality is currently in an experimental state and is subject to change. For more detailed information on running quality checks over streaming sources, see the Data Quality Checks over Streaming Sources chapter.
"},{"location":"01-application-setup/01-ApplicationSettings/","title":"Application Settings","text":"General Checkita Data Quality settings are configured in the HOCON file application.conf, which is supplied to the application at startup. All configurations are set within the appConfig section.
Only one parameter is set at the top level: applicationName, the name of the Spark application. This parameter is optional; if it is not set, the name Checkita Data Quality is used by default.
The rest of the parameters are defined in the subsections described below.
"},{"location":"01-application-setup/01-ApplicationSettings/#datetime-settings","title":"DateTime Settings","text":"DateTime configurations are set in the dateTimeOptions section. See the Working with Date and Time section for more details on working with date and time in the Checkita framework.
DateTime settings include the following:
timeZone - Time zone in which string representations of the reference date and execution date are parsed and rendered. Optional, default is \"UTC\".
referenceDateFormat - Datetime format used to parse and render the reference date. Optional, default is \"yyyy-MM-dd'T'HH:mm:ss.SSS\".
executionDateFormat - Datetime format used to parse and render the execution date. Optional, default is \"yyyy-MM-dd'T'HH:mm:ss.SSS\".
If dateTimeOptions
section is missing then default values are used for all parameters above.
Streaming settings are only applicable to streaming applications and define various aspects of running data quality checks over streaming sources. See the Data Quality Checks over Streaming Sources section for more details.
trigger - Trigger interval: defines the time interval over which micro-batches are collected. Optional, default is 10s.
window - Window interval: defines the size of the tumbling window used to accumulate metrics. All metric results and checks are evaluated for each window once it is finalized. Optional, default is 10m.
watermark - Watermark level: defines the time interval after which late records are no longer processed. Optional, default is 5m.
allowEmptyWindows - Boolean flag indicating whether empty windows are allowed. When a window falls below the watermark and some of the processed streams have no results for it, all related checks are skipped if this flag is set to true. Otherwise, the checks are evaluated and return an error status with a message of the \"... metric results were not found ...\" type. Optional, default is false.
IMPORTANT All intervals must be defined as duration strings conforming to the Scala Duration format.
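The streaming settings above can be sketched as a configuration fragment, written here to a temporary application.conf with a shell heredoc. The section name streamConfig is an assumption (the section heading is not preserved in this text); the values are simply the documented defaults spelled out as Scala Duration strings.

```shell
# Write a sample application configuration into a temporary directory.
conf_dir=$(mktemp -d)

# ASSUMPTION: the section name `streamConfig` is not confirmed by this text;
# parameter names and default values come from the streaming settings above.
cat > "$conf_dir/application.conf" <<'EOF'
appConfig: {
  streamConfig: {
    trigger: "10s"
    window: "10m"
    watermark: "5m"
    allowEmptyWindows: false
  }
}
EOF

cat "$conf_dir/application.conf"
```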
"},{"location":"01-application-setup/01-ApplicationSettings/#enablers","title":"Enablers","text":"The enablers section of the application configuration file defines boolean switches and single-value parameters that control various aspects of data quality job execution:
allowSqlQueries - Enables the usage of arbitrary SQL queries in data quality job configuration. Optional, default is false.
allowNotifications - Enables notifications to be sent from the DQ application. Optional, default is false.
aggregatedKafkaOutput - Enables sending aggregated messages for Kafka targets (one per target type). By default, Kafka messages are sent per result entity. Optional, default is false.
enableCaseSensitivity - Enables column case sensitivity. Controls column name comparison and lookup. Optional, default is false.
errorDumpSize - Maximum number of errors to be collected per metric. The framework can collect source data rows for which metric evaluation yielded errors, but to prevent OOM errors the number of collected errors has to be limited to a reasonable value. The maximum allowed number of errors per metric is 10000; this parameter can be used to lower it. Optional, default is 10000.
outputRepartition - Sets the number of partitions when writing outputs. By default, a single file is written. Optional, default is 1.
If enablers
section is missing then default values are used for all parameters above.
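A sketch of an enablers section that sets the parameters not used in the complete example at the end of this chapter. The values are illustrative rather than recommendations; note that errorDumpSize can only lower the 10000 limit, not raise it.

```shell
conf_dir=$(mktemp -d)

# Illustrative values only: allow SQL queries, use case-sensitive column lookup,
# collect at most 1000 errors per metric and write outputs as 4 partitions.
cat > "$conf_dir/application.conf" <<'EOF'
appConfig: {
  enablers: {
    allowSqlQueries: true
    enableCaseSensitivity: true
    errorDumpSize: 1000
    outputRepartition: 4
  }
}
EOF
```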
Parameters for connecting to the Data Quality results storage are defined in the storage section of the application configuration.
For more information on results storage, refer to the Data Quality Results Storage chapter of the documentation.
Connection to the storage is configured using the following parameters:
dbType - Type of database used to store Data Quality results. Required.
url - Database connection URL (without protocol identifiers). Required.
username - Username to connect to the database with (if required). Optional.
password - Password to connect to the database with (if required). Optional.
schema - Schema where data quality tables are located (if required). Optional.
saveErrorsToStorage - Enables metric errors to be stored in the storage database. Optional, default is false.
IMPORTANT If the storage section is missing, the application runs without a results storage:
In addition, be mindful when storing metric errors in the storage database. Depending on the errorDumpSize setting, the number of collected errors can be quite large. This can overload the DQ storage and increase the execution time of database write operations. Another concern is that metric errors contain data excerpts from the sources being checked. These excerpts might contain sensitive information that should not be stored in the DQ storage database. Alternatively, these excerpts can be encrypted before storing. See Encryption configuration for more details.
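A minimal sketch of a storage section that keeps credentials out of the file: username and password are resolved through HOCON substitution from the storage_db_user and storage_db_password extra variables, the same names passed with -e in the spark-submit example of the next chapter. The URL and schema are placeholders.

```shell
conf_dir=$(mktemp -d)

# The quoted heredoc delimiter ('EOF') keeps ${...} literal, so the substitutions
# are resolved by the HOCON parser at application startup, not by the shell.
cat > "$conf_dir/application.conf" <<'EOF'
appConfig: {
  storage: {
    dbType: "postgres"
    url: "localhost:5432/public"
    username: ${storage_db_user}
    password: ${storage_db_password}
    schema: "dqdb"
    saveErrorsToStorage: false
  }
}
EOF
```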
In order to send notifications via email, it is necessary to configure a connection to an SMTP server in the email section of the application configuration with the following parameters:
host - SMTP server host. Required.
port - SMTP server port. Required.
address - Email address to send notifications from. Required.
name - Name of the sender. Required.
sslOnConnect - Boolean parameter indicating whether to use SSL on connect. Optional, default is false.
tlsEnabled - Boolean parameter indicating whether to enable TLS. Optional, default is false.
username - Username for connection to the SMTP server (if required). Optional.
password - Password for connection to the SMTP server (if required). Optional.
If the email section is missing, email notifications cannot be sent. If they are configured in the job configuration, an exception will be thrown at runtime.
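A sketch of an email section for an SMTP server that requires TLS and authentication. The host, port 587 and the DQ_SMTP_PASSWORD environment variable are hypothetical; the variable name is chosen so that it matches the DQ prefix pattern that environment variables must follow to be available for substitution.

```shell
conf_dir=$(mktemp -d)

# Hypothetical SMTP server with TLS on port 587; the password is taken from the
# (hypothetical) DQ_SMTP_PASSWORD environment variable via HOCON substitution.
cat > "$conf_dir/application.conf" <<'EOF'
appConfig: {
  email: {
    host: "smtp.some-company.domain"
    port: "587"
    address: "some.service@some-company.domain"
    name: "Data Quality Service"
    tlsEnabled: true
    username: "emailUser"
    password: ${DQ_SMTP_PASSWORD}
  }
}
EOF
```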
In order to send notifications to Mattermost, it is necessary to configure a connection to the Mattermost API in the mattermost section of the application configuration with the following parameters:
host - Mattermost API host.
token - Mattermost API token (using bot accounts for notifications is preferable).
If the mattermost section is missing, Mattermost notifications cannot be sent. If they are configured in the job configuration, an exception will be thrown at runtime.
It is also possible to provide a list of default Spark configuration parameters used across multiple jobs. These parameters should be provided as the defaultSparkOptions list, where each parameter is a string in the format spark.param.name=spark.param.value.
When the storage section is defined, it is also recommended to use the encryption section in order to protect sensitive information in the job configuration. It is defined within the application configuration file with the following parameters:
secret - Secret string used to encrypt/decrypt sensitive fields. This string should contain at least 32 characters. Required.
keyFields - List of key fields used to identify fields that require encryption/decryption. Optional, default is [password, secret].
encryptErrorData - Boolean flag indicating whether to encrypt data excerpts within collected metric errors. Optional, default is false.
If the encryption section is missing, sensitive information will not be encrypted.
IMPORTANT Both job configuration fields and the data excerpts contained in metric errors might include sensitive information. Storing raw sensitive information in the DQ storage database might not satisfy security requirements. Therefore, the DQ framework offers functionality to encrypt sensitive data with the AES256 encryption algorithm. Since AES256 is a symmetric algorithm, encrypted data can be decrypted with the same secret key if needed.
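One way to obtain a secret that satisfies the 32-character minimum is to read random bytes and hex-encode them, as in the sketch below (it assumes /dev/urandom, head and od are available). The DQ_ENCRYPTION_SECRET name is hypothetical; with a DQ prefix it can be referenced in the configuration as secret: ${DQ_ENCRYPTION_SECRET}.

```shell
# Generate a 64-character hex string from 32 random bytes.
# od -v disables duplicate-line suppression so no '*' appears in the output.
gen_secret() {
  head -c 32 /dev/urandom | od -An -v -tx1 | tr -d ' \n'
}

# Hypothetical variable name, chosen to match the DQ prefix pattern.
DQ_ENCRYPTION_SECRET=$(gen_secret)
export DQ_ENCRYPTION_SECRET

# The encryption section requires at least 32 characters.
if [ "${#DQ_ENCRYPTION_SECRET}" -lt 32 ]; then
  echo "secret is too short" >&2
fi
```

Since the algorithm is symmetric, the same secret is needed later to decrypt stored data, so it should be generated once and kept in a secrets store rather than regenerated on every run.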
"},{"location":"01-application-setup/01-ApplicationSettings/#example-of-application-configuration-file","title":"Example of Application Configuration File","text":"Hocon configuration format supports variable substitution and Checkita Data Quality framework has a mechanism to feed configuration files with extra variables at runtime. For more information, see Usage of Environment Variables and Extra Variables chapter of the documentation.
appConfig: {\n\n applicationName: \"Custom Data Quality Application Name\"\n\n dateTimeOptions: {\n timeZone: \"GMT+3\"\n referenceDateFormat: \"yyyy-MM-dd\"\n executionDateFormat: \"yyyy-MM-dd-HH-mm-ss\"\n }\n\n enablers: {\n allowSqlQueries: false\n allowNotifications: true\n aggregatedKafkaOutput: true\n }\n\n defaultSparkOptions: [\n \"spark.sql.orc.enabled=true\"\n \"spark.sql.parquet.compression.codec=snappy\"\n \"spark.sql.autoBroadcastJoinThreshold=-1\"\n ]\n\n storage: {\n dbType: \"postgres\"\n url: \"localhost:5432/public\"\n username: \"postgres\"\n password: \"postgres\"\n schema: \"dqdb\"\n saveErrorsToStorage: true\n }\n\n email: {\n host: \"smtp.some-company.domain\"\n port: \"25\"\n username: \"emailUser\"\n password: \"emailPassword\"\n address: \"some.service@some-company.domain\"\n name: \"Data Quality Service\"\n sslOnConnect: true\n }\n\n mattermost: {\n host: \"https://some-team.mattermost.com\"\n token: ${dqMattermostToken}\n }\n\n encryption: {\n secret: \"secretmustbeatleastthirtytwocharacters\"\n keyFields: [\"password\", \"username\", \"url\"]\n encryptErrorData: true\n }\n}\n
"},{"location":"01-application-setup/02-ApplicationSubmit/","title":"Submitting Data Quality Application","text":"Since the Checkita framework is based on Spark, it runs as an ordinary Spark application using the spark-submit command. Like any Spark application, Checkita applications can be run both locally and on a cluster (in client or cluster mode).
However, Checkita applications require some command line arguments to be passed on startup. These are:
-a - Required. Path to the HOCON file with application settings: application.conf. Note that the file name may vary, but the aforementioned name is usually used.
-j - Required. List of paths to job configuration files, separated by commas. The HOCON format supports configuration merging; therefore, it is possible to define different parts of the job configuration in separate files and reuse some common configuration sections.
-d - Optional. Datetime for which the Data Quality job is run. The date string must conform to the format specified in the referenceDateFormat parameter of the application settings. If the date is not provided on startup, it is set to the application start date. This parameter is ignored when running a streaming application.
-l - Optional. Flag indicating that the application should be run in local mode.
-s - Optional. Flag indicating that the application will be run using a Shared Spark Context. In this case, the application gets the existing context instead of creating a new one. It is also important not to stop the shared context upon job completion.
-m - Optional. Flag indicating that storage database migration must be performed prior to saving results.
-e - Optional. Extra variables to be added to configuration files prior to parsing. These variables can be used in configuration files, e.g. to pass secrets. Variables are provided in key-value format: \"k1=v1,k2=v2,k3=v3,...\".
-v - Optional. Application log verbosity. By default, the log level is set to INFO.
There are two available applications to start:
ru.raiffeisen.checkita.apps.batch.DataQualityBatchApp
ru.raiffeisen.checkita.apps.stream.DataQualityStreamApp
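For a quick local run (for example, during development), the command might look like the sketch below. It assumes the application jar is an uber-jar that already includes the framework dependencies and that a storage section is configured (otherwise -m has nothing to migrate). The jar path and job file names are placeholders, and the command is only printed, not executed.

```shell
# Placeholders: replace with real locations.
DQ_APPLICATION="/path/to/checkita-uber.jar"   # hypothetical jar path
REFERENCE_DATE="2023-08-01"                   # must match referenceDateFormat

# Build the command as an array so that arguments survive quoting.
dq_cmd=(
  spark-submit
  --class ru.raiffeisen.checkita.apps.batch.DataQualityBatchApp
  --master "local[*]"
  "$DQ_APPLICATION"
  -a application.conf
  -j "base_job.conf,checks.conf"   # hypothetical job configuration files
  -d "$REFERENCE_DATE"
  -l                               # run in local mode
  -m                               # migrate storage before saving results
)

# Print the command; it could be executed with: "${dq_cmd[@]}"
printf '%s ' "${dq_cmd[@]}"
echo
```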
The following is an example of running an application in YARN in cluster
mode. Framework storage database connection parameters are specified in application.conf
and secrets may be passed either via environment variables or via extra variables argument. For more details see Usage of Environment Variables and Extra Variables.
export DQ_APPLICATION=\"<local or remote (HDFS, S3) path to application jar>\"\nexport DQ_DEPENDENCIES=\"<local or remote (HDFS, S3) path to uber-jar with framework dependencies>\"\nexport DQ_APP_CONFIG=\"<local or remote (HDFS, S3) path to application configuration file>\"\nexport DQ_JOB_CONFIGS=\"<local or remote (HDFS, S3) paths to job configuration files separated by commas>\"\n\n# As configuration files are uploaded to the driver and executors, they will be located in their working directories.\n# Therefore, only file names must be listed in application arguments:\nexport DQ_APP_CONFIG_FILE=$(basename \"$DQ_APP_CONFIG\")\nexport DQ_JOB_CONFIG_FILES=\"<job configuration files separated by commas (only file names)>\"\nexport REFERENCE_DATE=\"2023-08-01\"\n\n# application entry point (executable class): ru.raiffeisen.checkita.apps.batch.DataQualityBatchApp\n# the --name spark-submit argument takes priority over the application name set in `application.conf`\n\nspark-submit \\\n --class ru.raiffeisen.checkita.apps.batch.DataQualityBatchApp \\\n --name \"Checkita Data Quality\" \\\n --master yarn \\\n --deploy-mode cluster \\\n --num-executors 1 \\\n --executor-memory 2g \\\n --executor-cores 4 \\\n --driver-memory 2g \\\n --jars \"$DQ_DEPENDENCIES\" \\\n --files \"$DQ_APP_CONFIG,$DQ_JOB_CONFIGS\" \\\n --conf \"spark.executor.memoryOverhead=2g\" \\\n --conf \"spark.driver.memoryOverhead=2g\" \\\n --conf \"spark.driver.maxResultSize=4g\" \\\n \"$DQ_APPLICATION\" \\\n -a \"$DQ_APP_CONFIG_FILE\" \\\n -j \"$DQ_JOB_CONFIG_FILES\" \\\n -d \"$REFERENCE_DATE\" \\\n -e \"storage_db_user=some_db_user,storage_db_password=some_db_password\"\n
"},{"location":"01-application-setup/03-ResultsStorage/","title":"Data Quality Results Storage","text":"In order to use all features of the framework, it is required to set up a results storage. Checkita can use various RDBMS as a results storage. Also, Hive can be used as a results storage and even a simple file storage is supported.
The full list of supported storage types is as follows:
PostgreSQL (v.9.3 and higher) - recommended database to be used as results storage.
Oracle
MySQL
Microsoft SQL Server
SQLite
H2
Hive
File (directory in a local file system or a remote one (HDFS, S3))
The Checkita framework supports results storage schema evolution. Flyway is run under the hood to support schema migrations. Therefore, if one of the supported RDBMS is chosen for results storage, it is possible to set it up during the first run of the Data Quality job by providing the -m application argument on startup. For more details on how to run Data Quality applications, refer to the Submitting Data Quality Application chapter.
IMPORTANT: Flyway migrations usually run either in an empty database/schema or in one that was initialized with Flyway. In the Checkita framework, it is also possible to run migrations in a non-empty database/schema. In this case, it is up to the user to ensure that there are no conflicting table names in the database/schema.
If the File type of storage is used, it is only required to provide a path to a directory/bucket where results will be stored. Results are stored as parquet files with the same schema as for RDBMS storage. No schema evolution mechanisms are provided for the File type of storage. Therefore, if the results schema evolves later, it will be up to the user to update existing results to the new structure.
IMPORTANT: No partitioning is used when storing results as parquet files. Every job reads the entire results history and overwrites it, adding the new results. Therefore, the File type of storage is not recommended for production use.
For the Hive type of storage, schema evolution mechanisms are also not available. Therefore, it is up to the user to create the corresponding Hive tables. The DDL scripts from the Hive Storage Setup Scripts chapter below can be used for that.
IMPORTANT: Results Hive tables must be partitioned by job_id. Job ID is chosen as the partition column to support faster results fetching during computation of trend checks (used for anomaly detection in data). The Hive type of results storage works faster than the File one, since only the partition for the current job_id is read and overwritten. Nevertheless, this type of storage is also not recommended for production use where a large number of jobs will be run.
The following types of results are written to storage:
Schemas for all result types are given below.
Primary keys define how unique records are tracked: generally, results of the same Data Quality job run for the same reference date are overwritten. The history of various attempts of the same job for the same reference date is not stored. This is required for trend checks to work correctly: as these checks read historical results from the Data Quality storage, there must be only one set of results per Data Quality job and reference date.
"},{"location":"01-application-setup/03-ResultsStorage/#regular-metrics-results-schema","title":"Regular Metrics Results Schema","text":"Primary key is (job_id, metric_id, reference_date);
source_id & column_names contain string representations of lists in the format '[val1,val2,val3]';
params is a JSON string.
Composed metrics results: primary key is (job_id, metric_id, reference_date);
source_id contains a string representation of a list in the format '[val1,val2,val3]'.
Metric errors: primary key is (job_id, error_hash, reference_date);
source_id is a JSON string;
source_key_fields is a JSON string;
metric_columns is a JSON string;
row_data is a JSON string;
error_hash is an MD5 hash string computed from the values of columns metric_id, status, message and row_data.
NOTE The error hash is computed using the raw value of the row_data field, even if it is encrypted later.
Load checks results: primary key is (job_id, check_id, reference_date);
source_id contains a string representation of a list in the format '[val1,val2,val3]'.
Checks results: primary key is (job_id, check_id, reference_date);
source_id contains a string representation of a list in the format '[val1,val2,val3]'.
Job state: primary key is (job_id, reference_date);
version_info is a JSON string;
config is a JSON string.
Below is a HiveQL script that can be used to set up Hive results storage:
-- REPLACE <schema_name> and <schema_dir> with actual name and path:\n-- job_id is declared only in PARTITIONED BY: Hive does not allow a partition column\n-- to be repeated in the table column list.\nset hivevar:schema_name=<schema_name>;\nset hivevar:schema_dir=<schema_dir>;\n\nCREATE SCHEMA IF NOT EXISTS ${schema_name};\n\nDROP TABLE IF EXISTS ${schema_name}.results_metric_regular;\nCREATE EXTERNAL TABLE ${schema_name}.results_metric_regular\n(\n metric_id STRING COMMENT '',\n metric_name STRING COMMENT '',\n description STRING COMMENT '',\n metadata STRING COMMENT '',\n source_id STRING COMMENT '',\n column_names STRING COMMENT '',\n params STRING COMMENT '',\n result DOUBLE COMMENT '',\n additional_result STRING COMMENT '',\n reference_date TIMESTAMP COMMENT '',\n execution_date TIMESTAMP COMMENT ''\n)\nCOMMENT 'Data Quality Regular Metrics Results'\nPARTITIONED BY (job_id STRING)\nSTORED AS PARQUET\nLOCATION '${schema_dir}/results_metric_regular';\n\nDROP TABLE IF EXISTS ${schema_name}.results_metric_composed;\nCREATE EXTERNAL TABLE ${schema_name}.results_metric_composed\n(\n metric_id STRING COMMENT '',\n metric_name STRING COMMENT '',\n description STRING COMMENT '',\n metadata STRING COMMENT '',\n source_id STRING COMMENT '',\n formula STRING COMMENT '',\n result DOUBLE COMMENT '',\n additional_result STRING COMMENT '',\n reference_date TIMESTAMP COMMENT '',\n execution_date TIMESTAMP COMMENT ''\n)\nCOMMENT 'Data Quality Composed Metrics Results'\nPARTITIONED BY (job_id STRING)\nSTORED AS PARQUET\nLOCATION '${schema_dir}/results_metric_composed';\n\nDROP TABLE IF EXISTS ${schema_name}.results_metric_error;\nCREATE EXTERNAL TABLE ${schema_name}.results_metric_error\n(\n metric_id STRING COMMENT '',\n source_id STRING COMMENT '',\n source_key_fields STRING COMMENT '',\n metric_columns STRING COMMENT '',\n status STRING COMMENT '',\n message STRING COMMENT '',\n row_data STRING COMMENT '',\n error_hash STRING COMMENT '',\n reference_date TIMESTAMP COMMENT '',\n execution_date TIMESTAMP COMMENT ''\n)\nCOMMENT 'Data Quality Metrics Error Results'\nPARTITIONED BY (job_id STRING)\nSTORED AS PARQUET\nLOCATION '${schema_dir}/results_metric_error';\n\nDROP TABLE IF EXISTS ${schema_name}.results_check_load;\nCREATE EXTERNAL TABLE ${schema_name}.results_check_load\n(\n check_id STRING COMMENT '',\n check_name STRING COMMENT '',\n description STRING COMMENT '',\n metadata STRING COMMENT '',\n source_id STRING COMMENT '',\n expected STRING COMMENT '',\n status STRING COMMENT '',\n message STRING COMMENT '',\n reference_date TIMESTAMP COMMENT '',\n execution_date TIMESTAMP COMMENT ''\n)\nCOMMENT 'Data Quality Load Checks Results'\nPARTITIONED BY (job_id STRING)\nSTORED AS PARQUET\nLOCATION '${schema_dir}/results_check_load';\n\nDROP TABLE IF EXISTS ${schema_name}.results_check;\nCREATE EXTERNAL TABLE ${schema_name}.results_check\n(\n check_id STRING COMMENT '',\n check_name STRING COMMENT '',\n description STRING COMMENT '',\n metadata STRING COMMENT '',\n source_id STRING COMMENT '',\n base_metric STRING COMMENT '',\n compared_metric STRING COMMENT '',\n compared_threshold DOUBLE COMMENT '',\n lower_bound DOUBLE COMMENT '',\n upper_bound DOUBLE COMMENT '',\n status STRING COMMENT '',\n message STRING COMMENT '',\n reference_date TIMESTAMP COMMENT '',\n execution_date TIMESTAMP COMMENT ''\n)\nCOMMENT 'Data Quality Checks Results'\nPARTITIONED BY (job_id STRING)\nSTORED AS PARQUET\nLOCATION '${schema_dir}/results_check';\n\nDROP TABLE IF EXISTS ${schema_name}.job_state;\nCREATE EXTERNAL TABLE ${schema_name}.job_state\n(\n config STRING COMMENT '',\n version_info STRING COMMENT '',\n reference_date TIMESTAMP COMMENT '',\n execution_date TIMESTAMP COMMENT ''\n)\nCOMMENT 'Data Quality Job State'\nPARTITIONED BY (job_id STRING)\nSTORED AS PARQUET\nLOCATION '${schema_dir}/job_state';\n
"},{"location":"02-general-concepts/","title":"General Concepts","text":"In this section various aspects of working with Checkita Data Quality framework are explained.
"},{"location":"02-general-concepts/01-WorkingWithDateTime/","title":"Working with Date and Time","text":"There are two types of datetime instances used to identify Data Quality job runs:
referenceDate - identifies the date for which the job is run. This datetime usually indicates the period for which data is read and checked.
executionDate - stores the actual application start datetime and is used to indicate when exactly the data quality job is run.
A typical case is running an ETL pipeline after \"close of business\", e.g. at midnight. In this case, referenceDate refers to the previous day, while executionDate holds the actual start time of the data quality job. These values are likely to need different representations; therefore, different formats for referenceDate and executionDate can be configured in the application configuration.
Since referenceDate can point to a date in the past, its value can be provided explicitly on application startup. If the value of referenceDate is not provided, it is set to the datetime of the actual start of the data quality job. See the Submitting Data Quality Application chapter for more information on application startup arguments.
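When a nightly job checks the previous day's data, the -d value can be computed in the launch script. The sketch below assumes GNU coreutils date (the -d option differs in BSD/macOS date) and a referenceDateFormat of yyyy-MM-dd, which corresponds to the %Y-%m-%d output pattern.

```shell
# Return the day before the given YYYY-MM-DD date (GNU date; UTC avoids DST shifts).
previous_day() {
  date -u -d "$1 -1 day" +%Y-%m-%d
}

# Typical nightly run: check yesterday's data.
REFERENCE_DATE=$(previous_day "$(date -u +%Y-%m-%d)")
echo "$REFERENCE_DATE"
```

The result is then passed to the application as -d "$REFERENCE_DATE".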
Both of these datetime instances are widely used across the framework. Whenever their string representation is required, it is obtained using the datetime parameters set in the application configuration file.
It should also be noted that datetime rendering is performed with respect to the time zone in which the application is running. The time zone is also set in the application configuration file. The UTC
time zone is used by default.
Last but not least: we avoid using datetime string representations when storing results in the storage database. Instead, both referenceDate
and executionDate
are converted to timestamps in the UTC
time zone. This ensures stable querying of results from storage, independent of the datetime configuration parameters. See the Data Quality Results Storage chapter for more information on results storage.
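The following Python sketch makes these rules concrete by resolving, rendering and storing both datetimes. It illustrates the behaviour under assumptions and is not Checkita's implementation. The fixed UTC+3 zone, the strftime patterns (standing in for the format settings of the application configuration) and the helper names resolve_dates, render and to_storage_timestamp are all hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical application settings: a fixed UTC+3 zone and separate
# output formats for referenceDate and executionDate.
APP_TZ = timezone(timedelta(hours=3))
REFERENCE_DATE_FORMAT = '%Y-%m-%d'
EXECUTION_DATE_FORMAT = '%Y-%m-%d %H:%M:%S'


def resolve_dates(execution_start, reference_date=None):
    # referenceDate falls back to the actual job start when not provided.
    if reference_date is None:
        reference_date = execution_start
    return reference_date, execution_start


def render(dt, fmt):
    # String representations are rendered in the application time zone.
    return dt.astimezone(APP_TZ).strftime(fmt)


def to_storage_timestamp(dt):
    # Storage receives a naive UTC timestamp, not a formatted string.
    return dt.astimezone(timezone.utc).replace(tzinfo=None)


# Job started just after midnight and checks the previous business day.
execution_start = datetime(2024, 3, 14, 0, 15, tzinfo=APP_TZ)
reference = datetime(2024, 3, 13, tzinfo=APP_TZ)
ref, exe = resolve_dates(execution_start, reference)
print(render(ref, REFERENCE_DATE_FORMAT))  # 2024-03-13
print(render(exe, EXECUTION_DATE_FORMAT))  # 2024-03-14 00:15:00
print(to_storage_timestamp(ref))           # 2024-03-12 21:00:00
```

The stored values are plain UTC timestamps. Changing the configured formats or time zone later therefore does not affect how previously stored results are queried.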
IMPORTANT: The actual string representations of referenceDate
and executionDate
are always added to configuration files as extra variables. For more details on using extra variables in configuration files, see the Usage of Environment Variables and Extra Variables chapter.
The Hocon configuration format supports variable substitution. This mechanism allows more flexible management of both application and job configuration.
Configuration files are fed with extra variables that are read from the system and JVM environment and can also be explicitly defined at application startup.
For more information on how to explicitly define extra variables on startup, see the Submitting Data Quality Application chapter of the documentation.
In order to use system or JVM environment variables, their names must match the following regular expression: ^(?i)(DQ)[a-z0-9_-]+$
, e.g. DQ_STORAGE_PASSWORD
or dqMattermostToken
. All environment variables that match this regular expression will be retrieved and made available for substitution in both application and job configuration files.
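You can check which variable names qualify by testing them against the pattern. The Python sketch below is illustrative only, and collect_extra_variables is a hypothetical helper. The Java-style inline (?i) flag is passed as re.IGNORECASE instead, because recent Python versions reject global inline flags that do not start the pattern.

```python
import re

# Same pattern as in the documentation; the inline (?i) flag of the
# Java-style expression is passed as re.IGNORECASE instead.
DQ_VAR_PATTERN = re.compile('^(DQ)[a-z0-9_-]+$', re.IGNORECASE)


def collect_extra_variables(env):
    # Keep only the variables whose names match the pattern.
    return {name: value for name, value in env.items() if DQ_VAR_PATTERN.match(name)}


sample_env = {
    'DQ_STORAGE_PASSWORD': 'secret',
    'dqMattermostToken': 'token',
    'HOME': '/home/user',
    'DQ': 'nothing after the prefix',
    'MY_DQ_VAR': 'prefix not at the start',
}
print(sorted(collect_extra_variables(sample_env)))
# ['DQ_STORAGE_PASSWORD', 'dqMattermostToken']
```

In practice, the mapping would come from the process environment, e.g. os.environ.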
A typical use case for variable substitution is providing secrets for connections to external systems. Storing such information in configuration files is not a good idea; therefore, a mechanism to provide it at runtime is needed.
IMPORTANT: Variables are added to configuration files at runtime and are not stored in any form.
"},{"location":"02-general-concepts/03-StatusModel/","title":"Status Model used in Results","text":"A unified status model is used for the results that the Checkita framework produces: all metric and check results share a common status indication, as follows:
Success
- Evaluation of the metric or check completed without any errors and its condition is met.
Failure
- Evaluation of the metric or check yielded results that do not meet the configured condition, e.g.:
Error
- A runtime error was caught during metric or check evaluation. The runtime error message is captured as well.
A result status is always accompanied by a message that describes this status. What differs between metrics and checks is how statuses are communicated to the user:
Success
then metric error is collected for this particular row of data. Metric error reports can then be requested as Error Collection Targets. For more information on metric error collection, see the Metric Error Collection chapter.
Metric calculation involves reading data row by row and incrementing the metric value for each row. During the increment step, something may go wrong, either due to problems with the data or due to unexpected runtime errors. In addition, some metrics have a logical condition that must be met in order to increment the metric value. Failing to satisfy this condition is also considered a failure.
In the situations described above, the error collection mechanism is triggered and the following error or failure data is collected:
Failure
or Error
) and message.
Since the processed source can be extremely large and, consequently, can yield a large number of metric errors, out-of-memory errors are likely to happen. To prevent that, the number of errors collected per metric is limited: the maximum number of errors collected per metric cannot exceed 10000
. This number can be further limited in the application settings by setting the errorDumpSize
parameter to a lower value. See the Enablers chapter for more details.
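The capping rule can be sketched in Python as follows. MetricErrorCollector, effective_limit and the shape of each collected record are hypothetical. The sketch only shows that errorDumpSize can lower the hard limit of 10000 errors per metric but never raise it.

```python
# Hypothetical sketch of per-metric error collection with a hard cap.
HARD_LIMIT = 10000


def effective_limit(error_dump_size=None):
    # errorDumpSize may lower the hard limit but never raise it.
    if error_dump_size is None:
        return HARD_LIMIT
    return min(error_dump_size, HARD_LIMIT)


class MetricErrorCollector:
    def __init__(self, limit):
        self.limit = limit
        self.errors = {}  # metric id -> list of collected error records

    def collect(self, metric_id, key_values, status, message):
        bucket = self.errors.setdefault(metric_id, [])
        # Errors beyond the per-metric limit are dropped.
        if len(bucket) < self.limit:
            bucket.append({'keys': key_values, 'status': status, 'message': message})


collector = MetricErrorCollector(effective_limit(3))
for row_id in range(5):
    collector.collect('null_values', {'id': row_id}, 'Failure', 'value is null')
print(len(collector.errors['null_values']))  # 3
print(effective_limit(50000))  # 10000
```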
Collected metric errors can be used to identify and debug problems in the data. In order to save or send metric error reports, Error Collection Targets can be configured in the targets
section of the job configuration. Note that error collection reports contain excerpts from the data and, therefore, should be shared with caution. For the same reason, they are never saved in the Data Quality storage.
IMPORTANT: The functionality for performing data quality checks over streaming sources is currently experimental and is subject to change.
As already stated, the Checkita Data Quality framework can calculate metrics and perform quality checks over streaming data sources. Since Spark is used as the computation engine, the Spark Structured Streaming API is used to run metric calculations over streaming data sources.
The core idea of running a data quality job in streaming mode is to retain the ability to process multiple data sources at the same time. As metric calculation is a stateful operation, all streaming sources are processed per tumbling window. In order to process multiple sources simultaneously, their windows must be synchronized: they must (1) be of the same size and (2) start at the same time. Therefore, the window size is set at the application level and is used for all processed streaming sources.
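The Python sketch below shows why a shared window size keeps sources synchronized. It assumes tumbling windows aligned to the Unix epoch, which, to our understanding, matches the default behaviour of Spark's window function when no start offset is given. The source names and integer-second timestamps are hypothetical.

```python
# Hypothetical illustration: tumbling windows aligned to the Unix epoch.
WINDOW_SIZE = 600  # seconds; one application-level size for all sources


def window_start(event_time):
    # Floor the record time to the nearest window boundary.
    return event_time - (event_time % WINDOW_SIZE)


def assign(records):
    # Group source ids by the window their records fall into.
    windows = {}
    for source_id, event_time in records:
        windows.setdefault(window_start(event_time), []).append(source_id)
    return windows


records = [('kafka_a', 1200), ('kafka_b', 1799), ('kafka_a', 1800), ('kafka_b', 2399)]
print(assign(records))
# {1200: ['kafka_a', 'kafka_b'], 1800: ['kafka_a', 'kafka_b']}
```

Records from both sources that fall into the same interval land in the same window, so metrics for that interval can be evaluated together.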
Since streaming sources are processed per window, it is crucial to provide the time value used to assign a record to a particular window. The following options are supported:
Processing time
- Spark builds the time value for each record using the current_timestamp
function.
Event time
- Mostly applicable to Kafka topics: the time value is obtained from the timestamp
column, which corresponds to the message creation time (a.k.a. event time).
Custom time
- Uses a user-defined column of timestamp type to provide the time value for window assignment.
Another thing to take care of is how to finalize window state. In other words, rules are required for deciding when a window state can be considered final, i.e. when no new records are expected to arrive in this window. A common approach to this problem in stream processing is to use \"watermarks\". A watermark holds a time value that sets the threshold for accepting new records for processing. If a record's time is below the watermark level, the record is considered \"late\" and is not processed. The watermark is defined as the maximum observed record time minus a predefined offset. For more details see Spark documentation: Handling Late Data and Watermarking. For the purpose of synchronous processing of multiple streaming sources, the watermark offset is the same for all sources and is set at the application level.
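A simplified Python sketch of the watermark rule follows. WatermarkTracker is hypothetical and advances the watermark after every record. Spark advances watermarks between micro-batches, so its handling of borderline records may differ.

```python
# Hypothetical sketch of watermark tracking for a single source.
class WatermarkTracker:
    def __init__(self, offset):
        self.offset = offset  # same offset for all sources (application level)
        self.max_seen = None

    def watermark(self):
        # Watermark = maximum observed record time minus the offset.
        if self.max_seen is None:
            return None
        return self.max_seen - self.offset

    def accept(self, event_time):
        # Records below the current watermark are late and are dropped.
        wm = self.watermark()
        if wm is not None and event_time < wm:
            return False
        if self.max_seen is None or event_time > self.max_seen:
            self.max_seen = event_time
        return True


tracker = WatermarkTracker(offset=300)
print([tracker.accept(t) for t in (1000, 1400, 1050, 1150)])
# [True, True, False, True]
```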
Finally, it should be noted that the Spark Structured Streaming engine processes streaming sources in micro-batches: records are collected over a short interval and processed as a static DataFrame (micro-batch). Spark allows controlling the interval over which micro-batches are collected by setting the trigger
interval. This interval must also be the same for all streams and is set at the application level. Adjusting the trigger interval allows controlling the size of micro-batches and, thus, the load on executors.
For more information on streaming configuration settings, please see the Streaming Settings chapter. In summary, the data quality streaming job processing routine consists of the following stages:
forEachBatch
sink.
For each micro-batch (evaluated once per trigger interval), process data:
register metric error accumulator;
update the processor buffer state, which contains the state of metric calculators for all windows as well as collected metric errors (also per window). In addition, the processor buffer tracks current watermark levels for each processed streaming source.
The window processor checks the processor buffer (also once per trigger interval) for windows that are completely below the watermark level. IMPORTANT: In order to support synchronised processing of multiple streaming sources, the minimum watermark level is used (computed from the current watermark levels of all processed sources). This ensures that a window is finalised for all processed sources.
Once a finalised window is obtained, all data quality routines are performed for this window:
metric results are retrieved from calculators;
the processor buffer is cleared: the state for the processed window is removed.
Streaming queries and the window processor run until the application is stopped (sigterm
signal received) or an error occurs.
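The finalisation step described above can be sketched in Python as follows. The buffer layout, the WINDOW_SIZE value and the function names are hypothetical. The sketch shows how taking the minimum watermark across sources delays finalisation until a window is complete for every source.

```python
# Hypothetical processor buffer: window start -> per-window state.
WINDOW_SIZE = 600


def finalized_windows(buffer, watermarks):
    # Use the lowest watermark so that a window is final for every source.
    min_watermark = min(watermarks.values())
    # A window is final once it lies entirely below the watermark.
    return sorted(start for start in buffer if start + WINDOW_SIZE <= min_watermark)


def process_final_windows(buffer, watermarks):
    processed = []
    for start in finalized_windows(buffer, watermarks):
        # Metrics and checks would be evaluated here for this window.
        processed.append((start, buffer[start]))
        # Clear the processor buffer for the processed window.
        del buffer[start]
    return processed


buffer = {1200: 'state-a', 1800: 'state-b', 2400: 'state-c'}
watermarks = {'kafka_a': 3100, 'kafka_b': 2450}
print(process_final_windows(buffer, watermarks))  # [(1200, 'state-a'), (1800, 'state-b')]
print(sorted(buffer))  # [2400]
```

In this run, the window starting at 2400 stays in the buffer. The kafka_a watermark has already passed its end, but the kafka_b watermark has not.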
Important note on saving results: since a set of results is generated for each processed window, the reference datetime and execution datetime of each set of results are set to the corresponding window start time. For more details on working with datetime in the Checkita framework, please see the Working with Date and Time chapter.
TIP: Since data quality checks are performed for each window, the window size should be rather large. Results are then produced at a time interval that leaves time to review any data quality issues that occurred and to take measures to resolve them. For example, if your engineering team has a \"reaction time\" of 1 hour, it is unreasonable to perform quality checks over a streaming source with a 10-minute window.
"},{"location":"03-job-configuration/","title":"Job Configuration","text":"A Data Quality job in Checkita is a sequence of tasks that must be performed in order to check data quality. These tasks may include the following:
All the aforementioned tasks are configured in one or multiple Hocon configuration files. All job configurations are set within the jobConfig
section of the configuration files.
There is only one parameter that is set at the top level, and this is jobId
- ID of the job to be run. This parameter is mandatory for any job configuration. Typically, a jobId
groups the calculation of various metrics and checks performed over the sources within a single schema, data mart or other logical grouping of data sources.
The rest of the parameters are defined in subsections that are described in separate chapters of this documentation:
An example of a fully filled job configuration can be found in the Job Configuration Example chapter of this documentation.
"},{"location":"03-job-configuration/01-Connections/","title":"Connections Configuration","text":"The Checkita framework allows creating data sources based on data from external systems such as RDBMS or message queues like Kafka. In order to read data from external systems, a connection must be established first.
Connections are described in the connections
section of the job configuration. Currently, connections to the following systems are supported:
All connections are defined with the following common parameters:
id
- Connection ID that uniquely identifies its configuration;description
- Optional connection description;parameters
- Optional list of additional Spark parameters that can be specified to provide some extra configuration required by Spark to read data from a particular system.metadata
- Optional list of arbitrary user-defined metadata parameters.
An example of the connections
section of the job configuration is shown in the Connections Configuration Example below.
Configuring a connection to an SQLite database is quite easy. In addition to the common parameters, only the path to the database file needs to be supplied:
id
- Required. Connection ID;description
- Optional. Connection description;url
- Required. Path to SQLite database file.parameters
- Optional. List of Spark parameters if required where each parameter is a string in format: spark.param.name=spark.param.value
.metadata
- Optional. List of user-defined metadata parameters specific to this connection where each parameter is a string in format: param.name=param.value
.
A connection to PostgreSQL can be configured using the following parameters:
id
- Required. Connection ID;description
- Optional. Connection description;url
- Required. Connection URL. Should contain host, port and name of database. In addition, extra parameters can be supplied in connection URL if required. Connection protocol must not be specified.username
- Optional. Username used to connect to PostgreSQL database if required.password
- Optional. Password used to connect to PostgreSQL database if required.parameters
- Optional. List of Spark parameters if required where each parameter is a string in format: spark.param.name=spark.param.value
.metadata
- Optional. List of user-defined metadata parameters specific to this connection where each parameter is a string in format: param.name=param.value
.
A connection to Oracle can be configured in the same way as to PostgreSQL, using the following parameters:
id
- Required. Connection ID;description
- Optional. Connection description;url
- Required. Connection URL. Should contain host, port and name of database. In addition, extra parameters can be supplied in connection URL if required. Connection protocol must not be specified.username
- Optional. Username used to connect to Oracle database if required.password
- Optional. Password used to connect to Oracle database if required.parameters
- Optional. List of Spark parameters if required where each parameter is a string in format: spark.param.name=spark.param.value
.metadata
- Optional. List of user-defined metadata parameters specific to this connection where each parameter is a string in format: param.name=param.value
.
A connection to MySQL can be configured in the same way as to PostgreSQL and Oracle, using the following parameters:
id
- Required. Connection ID;description
- Optional. Connection description;url
- Required. Connection URL. Should contain host, port and name of database. In addition, extra parameters can be supplied in connection URL if required. Connection protocol must not be specified.username
- Optional. Username used to connect to MySQL database if required.password
- Optional. Password used to connect to MySQL database if required.parameters
- Optional. List of Spark parameters if required where each parameter is a string in format: spark.param.name=spark.param.value
.metadata
- Optional. List of user-defined metadata parameters specific to this connection where each parameter is a string in format: param.name=param.value
.
A connection to MS SQL can be configured similarly to the three previous ones, using the following parameters:
id
- Required. Connection ID;description
- Optional. Connection description;url
- Required. Connection URL. Should contain host, port and name of database. In addition, extra parameters can be supplied in connection URL if required. Connection protocol must not be specified.username
- Optional. Username used to connect to MS SQL database if required.password
- Optional. Password used to connect to MS SQL database if required.parameters
- Optional. List of Spark parameters if required where each parameter is a string in format: spark.param.name=spark.param.value
.metadata
- Optional. List of user-defined metadata parameters specific to this connection where each parameter is a string in format: param.name=param.value
.
Configuring a connection to an H2 database is similar to SQLite. Only two parameters are required:
id
- Required. Connection ID;description
- Optional. Connection description;url
- Required. Connection URL. Should contain host, port and name of database. In addition, extra parameters can be supplied in connection URL if required. Connection protocol must not be specified.parameters
- Optional. List of Spark parameters if required where each parameter is a string in format: spark.param.name=spark.param.value
.metadata
- Optional. List of user-defined metadata parameters specific to this connection where each parameter is a string in format: param.name=param.value
.
A connection to ClickHouse can be configured in the same way as to MS SQL, using the following parameters:
id
- Required. Connection ID;description
- Optional. Connection description;url
- Required. Connection URL. Should contain host, port and name of database. In addition, extra parameters can be supplied in connection URL if required. Connection protocol must not be specified.username
- Optional. Username used to connect to ClickHouse database if required.password
- Optional. Password used to connect to ClickHouse database if required.parameters
- Optional. List of Spark parameters if required where each parameter is a string in format: spark.param.name=spark.param.value
.metadata
- Optional. List of user-defined metadata parameters specific to this connection where each parameter is a string in format: param.name=param.value
.
To set up a connection to Kafka brokers, the following parameters must be supplied:
id
- Required. Connection ID;description
- Optional. Connection description;servers
- Required. List of broker servers to connect to.parameters
- Optional. List of Spark parameters if required where each parameter is a string in format: spark.param.name=spark.param.value
. Usually, Kafka authorisation settings are provided by means of Spark parameters.metadata
- Optional. List of user-defined metadata parameters specific to this connection where each parameter is a string in format: param.name=param.value
.
If the connection to a Kafka cluster requires a JAAS configuration file, it should be provided via Java system properties. Note that these properties must be declared before the JVM starts; therefore, they must be set in the spark-submit
command as follows:
cluster
mode: --deploy-mode cluster \\\n--conf 'spark.driver.extraJavaOptions=\"-Djava.security.auth.login.config=./jaas.conf\"' \\\n--conf 'spark.executor.extraJavaOptions=\"-Djava.security.auth.login.config=./jaas.conf\"' \\\n--files /path/to/your/jaas.conf,<other files required for DQ>\n
client
mode, the driver JVM starts on the client before the Spark configuration is read; therefore, the Java options for the driver must be set in advance using the --driver-java-options
argument: --deploy-mode client \\\n--driver-java-options \"-Djava.security.auth.login.config=./jaas.conf\" \\\n--conf 'spark.executor.extraJavaOptions=\"-Djava.security.auth.login.config=./jaas.conf\"' \\\n--files file.keytab,jaas.conf,<other files required for DQ>\n
To configure a connection to Greenplum, specify the following parameters:
id
- Required. Connection ID;description
- Optional. Connection description;url
- Required. Connection URL. Should contain host, port and name of database. In addition, extra parameters can be supplied in connection URL if required. Connection protocol must not be specified.username
- Optional. Username used to connect to Greenplum database if required.password
- Optional. Password used to connect to Greenplum database if required.schema
- Optional. Schema to look up tables in. If omitted, the default schema is used.parameters
- Optional. List of Spark parameters if required where each parameter is a string in format: spark.param.name=spark.param.value
.metadata
- Optional. List of user-defined metadata parameters specific to this connection where each parameter is a string in format: param.name=param.value
.
The Pivotal connector is not published in public repositories such as Maven Central. Therefore, this dependency is unmanaged and should be added manually to the Spark application on submit (using the spark.jars configuration parameter). The connector jar file can be downloaded from the official Pivotal releases.
"},{"location":"03-job-configuration/01-Connections/#connections-configuration-example","title":"Connections Configuration Example","text":"As shown in the example below, connections of the same type are grouped within subsections named after the connection type. These subsections should contain a list of connection configurations of the corresponding type.
jobConfig: {\n connections: {\n postgres: [\n {\n id: \"postgre_db1\",\n description: \"Connection to production instance of DB\"\n url: \"postgre1.db.com:5432/public\", \n username: \"dq-user\", \n password: \"dq-password\",\n metadata: [\n \"db.owner=some.user@some.domain\",\n \"environment=prod\"\n ]\n }\n {\n id: \"postgre_db2\",\n description: \"Connection to test instance of DB\"\n url: \"postgre2.db.com:5432/public\",\n username: \"dq-user\",\n password: \"dq-password\",\n schema: \"dataquality\",\n metadata: [\n \"db.owner=some.user@some.domain\",\n \"environment=test\"\n ]\n }\n ]\n oracle: [\n {id: \"oracle_db1\", url: \"oracle.db.com:1521/public\", username: \"db-user\", password: \"dq-password\"}\n ]\n sqlite: [\n {id: \"sqlite_db\", url: \"some/path/to/db.sqlite\"}\n ],\n mysql: [\n {id: \"mysql_db1\", url: \"mysql.db.com:8306/public\", username: \"user\", password: \"pass\"}\n ],\n mssql: [\n {id: \"mssql_db1\", url: \"mssql.db.com:8433\", username: \"user\", password: \"pass\"}\n ],\n h2: [\n {id: \"h2_db1\", url: \"h2.db.com:9092/default\", username: \"user\", password: \"pass\"}\n ],\n clickhouse: [\n {id: \"clickhouse_db1\", url: \"clickhouse.db.com:8123\", username: \"user\", password: \"pass\"}\n ],\n kafka: [\n {id: \"kafka_cluster_1\", servers: [\"server1:9092\", \"server2:9092\"]}\n {\n id: \"kafka_cluster_2\",\n servers: [\"kafka-broker1:9092\", \"kafka-broker2:9092\", \"kafka-broker3:9092\"]\n parameters: [\n \"security.protocol=SASL_PLAINTEXT\",\n \"sasl.mechanism=GSSAPI\",\n \"sasl.kerberos.service.name=kafka-service\"\n ]\n }\n ],\n greenplum: [\n {\n id: \"greenplum_db1\", \n url: \"greenplum.db.com:5432/postgres\", \n username: \"user\", \n password: \"pass\",\n schema: \"public\"\n }\n ]\n }\n}\n
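As the connection parameters require, the url values in the example above omit the connection protocol. Presumably the framework prepends a JDBC prefix itself, and the Python sketch below illustrates only that idea. build_jdbc_url and the JDBC_PREFIXES table are hypothetical, although the prefixes follow the standard URL formats of the PostgreSQL, MySQL, Oracle thin and ClickHouse JDBC drivers.

```python
# Hypothetical mapping from connection type to a standard JDBC URL prefix.
JDBC_PREFIXES = {
    'postgres': 'jdbc:postgresql://',
    'mysql': 'jdbc:mysql://',
    'oracle': 'jdbc:oracle:thin:@',
    'clickhouse': 'jdbc:clickhouse://',
}


def build_jdbc_url(kind, url):
    # The configured url must not contain a protocol of its own.
    if '://' in url or url.startswith('jdbc:'):
        raise ValueError('connection protocol must not be specified: ' + url)
    return JDBC_PREFIXES[kind] + url


print(build_jdbc_url('postgres', 'postgre1.db.com:5432/public'))
# jdbc:postgresql://postgre1.db.com:5432/public
```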
"},{"location":"03-job-configuration/02-Schemas/","title":"Schemas Configuration","text":"Schemas are used in Data Quality jobs for two purposes:
schemaMatch
load checks. See Schema Match Check.
Schemas are set in the schemas
section of the job configuration and can be defined in different formats as described below. The format in which a schema is defined is set in the kind
field and determines which other fields need to be provided.
Apart from the kind
field, all kinds of schema configuration contain the following common parameters:
id
- Schema ID that uniquely identifies its configuration;description
- Optional schema description;metadata
- Optional list of arbitrary user-defined metadata parameters.This kind of schema definition is primarily used to provide schemas for delimited text files such as CSV or TSV. Nevertheless, these schemas can be used for schemaMatch
load checks as well. Using this type of configuration, only flat schemas can be defined (nested columns are not allowed).
The delimited definition contains the following parameters:
kind: \"delimited\"
- Required. Sets delimited schema definition format.id
- Required. Schema ID;description
- Optional. Schema description;schema
- Required. List of schema columns where each column is an object with following fields:name
- Required. Name of the column;type
- Required. Type of the column. See Supported Type Literals for allowed types.metadata
- Optional. List of user-defined metadata parameters specific to this schema where each parameter is a string in format: param.name=param.value
.
The fixed-full kind of schema definition is used to provide schemas for reading fixed-width text files. The key difference from other schema definitions is that column widths are also provided, which is crucial information for parsing fixed-width files. This kind of schema may also be used for reading delimited files and as a reference in schemaMatch
load checks. Using this type of configuration, only flat schemas can be defined (nested columns are not allowed).
The fixed-full schema definition contains the following parameters:
kind: \"fixedFull\"
- Required. Sets fixed-full schema definition format.id
- Required. Schema ID;description
- Optional. Schema description;schema
- Required. List of schema columns where each column is an object with following fields:name
- Required. Name of the column;type
- Required. Type of the column. See Supported Type Literals for allowed types.width
- Required. Integer width of column (number of symbols).metadata
- Optional. List of user-defined metadata parameters specific to this schema where each parameter is a string in format: param.name=param.value
.
The fixed-short kind of schema definition provides a more compact syntax for defining schemas used to read fixed-width files. Columns are defined by their name and width only; consequently, all columns have StringType. This kind of schema may also be used for reading delimited files and as a reference in schemaMatch
load checks. Using this type of configuration, only flat schemas can be defined (nested columns are not allowed).
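Fixed-short columns carry only a name and a width, so a list such as colOne:5, colTwo:7, colThree:9 is enough to slice a fixed-width line. The Python sketch below illustrates this with the hypothetical helpers parse_fixed_short and parse_line. This section does not say whether Checkita trims padding from parsed values, so the sketch keeps values as they are.

```python
# Hypothetical helpers illustrating fixed-width parsing with a short schema.
def parse_fixed_short(columns):
    # Each column is given as 'name:width'; all columns are strings.
    parsed = []
    for spec in columns:
        name, width = spec.rsplit(':', 1)
        parsed.append((name, int(width)))
    return parsed


def parse_line(line, schema):
    # Slice consecutive fields of the given widths; padding is kept as-is.
    row, pos = {}, 0
    for name, width in schema:
        row[name] = line[pos:pos + width]
        pos += width
    return row


schema = parse_fixed_short(['colOne:5', 'colTwo:7', 'colThree:9'])
print(parse_line('12345abcdefgXYZ      ', schema))
```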
The fixed-short schema definition contains the following parameters:
kind: \"fixedShort\"
- Required. Sets fixed-short schema definition format.id
- Required. Schema ID;description
- Optional. Schema description;schema
- Required. List of schema columns where each column is a string in format columnName:columnWidth
. The column type is always StringType.metadata
- Optional. List of user-defined metadata parameters specific to this schema where each parameter is a string in format: param.name=param.value
.
The avro kind of schema configuration is used to read a schema from an Avro schema file .avsc
. A schema read from an Avro schema file can be used to read both Avro files and delimited text files, as well as serve as a reference in schemaMatch
load checks. In addition, the Avro schema format supports complex schemas with nested columns.
To read a schema from an Avro schema file, the following parameters must be supplied:
kind: \"avro\"
- Required. Sets avro schema definition format.id
- Required. Schema ID;description
- Optional. Schema description;schema
- Required. Path to the Avro schema file .avsc
to read the schema from.
The Hive catalogue can be used as a source of schemas. The hive kind of schema definition is intended to retrieve schemas from Hive tables. These schemas can be used to read both Avro files and delimited text files, as well as serve as a reference in schemaMatch
load checks.
To retrieve a schema from a Hive table, the following parameters must be set:
kind: \"hive\"
- Required. Sets hive schema definition format.id
- Required. Schema ID;description
- Optional. Schema description;schema
- Required. Hive schema to search for a table.table
- Required. Hive table to retrieve schema from.excludeColumns
- Optional. List of column names to exclude from the schema. This is sometimes required, e.g. to exclude partition columns from the schema.metadata
- Optional. List of user-defined metadata parameters specific to this schema where each parameter is a string in format: param.name=param.value
.
The following type literals are supported when defining schema columns in the job configuration file:
string
boolean
date
timestamp
integer (32-bit integer)
long (64-bit integer)
short (16-bit integer)
byte (signed integer in a single byte)
double
float
decimal(precision, scale)
(precision <= 38; scale <= precision)
As shown in the example below, the schemas
section represents a list of schema definitions of various kinds.
jobConfig: {\n schemas: [\n {\n id: \"schema1\"\n kind: \"delimited\"\n description: \"Schema describing content of CSV file\"\n schema: [\n {name: \"colA\", type: \"string\"},\n {name: \"colB\", type: \"timestamp\"},\n {name: \"colC\", type: \"decimal(10, 3)\"}\n ]\n }\n {\n id: \"schema2\"\n kind: \"fixedFull\",\n schema: [\n {name: \"col1\", type: \"integer\", width: 5},\n {name: \"col2\", type: \"double\", width: 6},\n {name: \"col3\", type: \"boolean\", width: 4}\n ]\n }\n {id: \"schema3\", kind: \"fixedShort\", schema: [\"colOne:5\", \"colTwo:7\", \"colThree:9\"]}\n {id: \"hive_schema\", kind: \"hive\", schema: \"some_schema\", table: \"some_table\"}\n {\n id: \"avro_schema\", \n kind: \"avro\", \n schema: \"path/to/avro_schema.avsc\"\n metadata: [\n \"schema.origin=http://some-schema-registry-location\"\n ]\n }\n ]\n}\n
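The constraints on the type literals listed above can be checked mechanically. The Python sketch below is a hypothetical validator, not Checkita's parser. It accepts the literals exactly as listed and also assumes that precision and scale are non-negative integers.

```python
# Hypothetical validator for the supported type literals.
SIMPLE_TYPES = {
    'string', 'boolean', 'date', 'timestamp', 'integer',
    'long', 'short', 'byte', 'double', 'float',
}


def is_valid_type(literal):
    literal = literal.strip()
    if literal in SIMPLE_TYPES:
        return True
    if literal.startswith('decimal(') and literal.endswith(')'):
        parts = literal[len('decimal('):-1].split(',')
        if len(parts) != 2:
            return False
        try:
            precision, scale = int(parts[0]), int(parts[1])
        except ValueError:
            return False
        # precision <= 38 and scale <= precision; non-negativity is assumed.
        return 0 <= scale <= precision <= 38
    return False


for literal in ['timestamp', 'decimal(10, 3)', 'decimal(40, 2)', 'decimal(5, 6)', 'varchar']:
    print(literal, is_valid_type(literal))
```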
"},{"location":"03-job-configuration/03-Sources/","title":"Sources Configuration","text":"Reading sources is one of the major part of Data Quality job. During job execution, Checkita will read all sources into a Spark DataFrames, that will be later processed to calculate metrics and perform quality checks. In addition, dataframes' metadata is used to perform all types of load checks in order to ensure that source has the structure as expected.
Generally, sources can be read from file systems or object storage that Spark is connected to, such as HDFS or S3. In addition, table-like sources from the Hive catalogue can be read. Apart from integrations natively supported by Spark, Checkita can read sources from external systems such as RDBMS or Kafka. For this purpose, connections to these systems must be defined first. See the Connections Configuration chapter for more details on connection configuration.
Currently, Checkita supports four general types of sources:
All sources must be defined in the sources
section of the job configuration. More details on how to configure sources of each of these types are shown below. An example of the sources
section of the job configuration is shown in the Sources Configuration Example below.
Currently, there are five file types that Checkita can read as a source. These are:
When configuring a file source, its type must be specified. Configuration parameters vary for files of different types.
Common parameters for sources of any file type are:
id
- Required. Source ID;description
- Optional. Source description;kind
- Required. File type. Can be one of the following: fixed
, delimited
, orc
, parquet
, avro
;path
- Required. File path. Can be a path to a directory or an S3 bucket, in which case all files from this directory/bucket are read (assuming they all have the same schema). Note that when reading from a file system which is not the Spark default file system, an FS prefix must be added to the path, e.g. file://
to read from local FS, or s3a://
to read from S3.keyFields
- Optional. List of columns that form a Primary Key or are used to identify row within a dataset. Key fields are primarily used in error collection reports. For more details on error collection, see Metric Error Collection chapter.metadata
- Optional. List of user-defined metadata parameters specific to this source where each parameter is a string in format: param.name=param.value
.
To read a fixed-width file, the ID of the schema used to parse the file content must also be provided. The schema itself should be defined in the schemas
section of the job configuration as described in the Schemas Configuration chapter.
schema
- Required. Schema ID used to parse fixed-width file. The schema definition type should be either fixedFull
or fixedShort.
When reading a delimited text file, its schema may be inferred from the file header, if present, or explicitly defined in the schemas
section of the job configuration file as described in the Schemas Configuration chapter.
The additional parameters for configuring a delimited file source are:
schema
- Optional. Schema ID used to parse the delimited text file. A schema of any definition type may be used as long as it has a flat structure (nested columns are not supported for delimited text files).header
- Optional, default is false
. Boolean parameter indicating whether schema should be inferred from file header.delimiter
- Optional, default is ,
. Column delimiter.quote
- Optional, default is \"
. Column enclosing character.escape
- Optional, default is \\
. Escape character.
IMPORTANT: If the header
parameter is absent or set to false
, then schema
parameter must be set. And vice versa, if header
parameter is set to true
, then schema
parameter must not be set. In other words, the schema may be either inferred from the file header or explicitly defined, but not both.
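The header/schema rule lends itself to a simple configuration check. In the Python sketch below, a source definition is modelled as a plain dictionary. validate_delimited_source is a hypothetical helper that applies only this rule and the documented default delimiter.

```python
# Hypothetical check of the header/schema rule for delimited file sources.
def validate_delimited_source(source):
    header = source.get('header', False)
    has_schema = 'schema' in source
    if header and has_schema:
        raise ValueError('schema must not be set when header is true')
    if not header and not has_schema:
        raise ValueError('schema is required when header is absent or false')
    # Fill in the documented default header flag and delimiter.
    return {**source, 'header': header, 'delimiter': source.get('delimiter', ',')}


ok = validate_delimited_source({'id': 'csv_source', 'kind': 'delimited', 'schema': 'schema1'})
print(ok['header'], ok['delimiter'])  # False ,
```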
Avro files can contain a schema in their header. Therefore, there are two options for reading Avro files: either infer the schema from the file or provide it explicitly. In the second case, the schema must be defined in the schemas
section of the job configuration file as described in the Schemas Configuration chapter. Thus, there is only one additional parameter for Avro file source configuration:
schema
- Optional. Schema ID used to read avro file. It is possible to use schema of any definition type.As ORC format contains schema within itself, then there are no additional parameters required to read ORC files.
"},{"location":"03-job-configuration/03-Sources/#parquet-file-sources","title":"Parquet File Sources","text":"As Parquet format contains schema within itself, then there are no additional parameters required to read Parquet files.
"},{"location":"03-job-configuration/03-Sources/#hive-sources-configuration","title":"Hive Sources Configuration","text":"In order to read data from Hive table it is required to provide following:
id
- Required. Source ID;
description
- Optional. Source description;
schema
- Required. Hive schema.
table
- Required. Hive table.
partitions
- Optional. List of partitions to read, where each element is an object with the following fields. If partitions are not set, then the entire table is read.
name
- Required. Partition column name.
expr
- Optional. SQL expression used to filter partitions to read. This SQL expression must contain only a reference to the partition column being filtered (the one defined in the name field). References to other columns are not allowed, and neither are SQL sub-queries. All types of SQL functions and literals are allowed. IMPORTANT: If a parameterless function is used, it should be called with empty parentheses, e.g.: current_date().
values
- Optional. List of partition column values to read. IMPORTANT: When defining partitions to read, it is required to specify either an SQL expression to filter partitions or an explicit list of partition values, but not both.
keyFields
- Optional. List of columns that form a Primary Key or are used to identify a row within a dataset. Key fields are primarily used in error collection reports. For more details on error collection, see the Metric Error Collection chapter.
metadata
- Optional. List of user-defined metadata parameters specific to this source, where each parameter is a string in the format param.name=param.value
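The partition rules above (exactly one of expr or values per partition) can be sketched in Scala as follows. HivePartition, validatePartition, sqlLiteral and toPredicate are hypothetical names; rendering a list of values as an IN predicate with quoted string literals is an assumption made for illustration, not a description of how Checkita builds its partition filter.

```scala
// Hypothetical model of one element of the `partitions` list (illustration only).
final case class HivePartition(name: String, expr: Option[String] = None, values: Seq[String] = Nil)

// Exactly one of `expr` or `values` must be provided.
def validatePartition(p: HivePartition): Either[String, HivePartition] =
  if (p.expr.isDefined == p.values.nonEmpty)
    Left(s"Partition '${p.name}': specify either 'expr' or 'values', but not both")
  else Right(p)

// Escape single quotes so that a value can be used as an SQL string literal.
def sqlLiteral(v: String): String = "'" + v.replace("'", "''") + "'"

// Assumption: an explicit list of values is turned into an IN predicate on the partition column.
def toPredicate(p: HivePartition): String =
  p.expr.getOrElse(p.values.map(sqlLiteral).mkString(s"${p.name} IN (", ", ", ")"))
```

With the hive_source_1 partitions from the configuration example, toPredicate yields load_date IN ('2023-06-30', '2023-07-01').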
Table sources are read from supported RDBMS via a JDBC connection. There are two options to read data from RDBMS: read an entire table, or execute a query and read its result.
To set up a table source, it is required to supply the following parameters:
id
- Required. Source ID;
description
- Optional. Source description;
connection
- Required. Connection ID to use for the table source. The connection ID must refer to a connection configuration for one of the supported RDBMS. See the Connections Configuration chapter for more information.
table
- Optional. Table to read.
query
- Optional. Query to execute. The query result is read as the table source.
keyFields
- Optional. List of columns that form a Primary Key or are used to identify a row within a dataset. Key fields are primarily used in error collection reports. For more details on error collection, see the Metric Error Collection chapter.
metadata
- Optional. List of user-defined metadata parameters specific to this source, where each parameter is a string in the format param.name=param.value.
IMPORTANT: Either a table to read from or a query to execute must be specified, but not both. In addition, queries are only allowed when allowSqlQueries is set to true; otherwise, any usage of arbitrary SQL queries is not permitted. See the Enablers chapter for more information.
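The table/query rules can be expressed as a short Scala sketch. TableSourceConfig and validateTableSource are hypothetical names, and the allowSqlQueries argument merely stands in for the enabler of the same name; this is not Checkita's validation code.

```scala
// Hypothetical model of a table source (illustration only, not Checkita's API).
final case class TableSourceConfig(
    id: String,
    connection: String,
    table: Option[String] = None,
    query: Option[String] = None
)

// `allowSqlQueries` stands in for the enabler setting of the same name.
def validateTableSource(cfg: TableSourceConfig, allowSqlQueries: Boolean): Either[String, TableSourceConfig] =
  (cfg.table, cfg.query) match {
    case (Some(_), Some(_)) => Left(s"${cfg.id}: set either 'table' or 'query', but not both")
    case (None, None)       => Left(s"${cfg.id}: either 'table' or 'query' must be set")
    case (None, Some(_)) if !allowSqlQueries =>
      Left(s"${cfg.id}: arbitrary SQL queries are not permitted (allowSqlQueries = false)")
    case _ => Right(cfg)
  }
```

Checking the query case last keeps the "not both" error independent of the enabler setting.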
TIP: The HOCON format supports multiline string values. To define such a value, enclose the string in triple quotes, e.g.:
multilineString: \"\"\"\n SELECT * from schema.table\n WHERE load_date = '2023-08-23';\n\"\"\"\n
"},{"location":"03-job-configuration/03-Sources/#kafka-sources-configuration","title":"Kafka Sources Configuration","text":"Although reading messages from Kafka topics in batch mode is not a common scenario, the Checkita framework supports it. To set up a source that reads from one or more Kafka topics, it is required to provide the following parameters:
id
- Required. Source ID;
description
- Optional. Source description;
connection
- Required. Connection ID to use for the Kafka source. The connection ID must refer to a Kafka connection configuration. See the Connections Configuration chapter for more information.
topics
- Optional. List of topics to read. Topics can be specified in either of two formats: as plain topic names, e.g. [\"topic1\", \"topic2\"]; or as topic names with the partitions to read, e.g. [\"topic1@[0, 1]\", \"topic2@[2, 4]\"].
topicPattern
- Optional. Topic name pattern: all topics that match the pattern are read.
startingOffsets
- Optional, default is earliest. JSON string setting the starting offsets to read from the topic. By default, the topic is read from the beginning.
endingOffsets
- Optional, default is latest. JSON string setting the ending offsets up to which the topic is read. By default, the topic is read until the end.
keyFormat
- Optional, default is string. Format used to decode the message key.
valueFormat
- Optional, default is string. Format used to decode the message value.
keySchema
- Schema ID used to parse the message key. If the key format is other than string, then the schema must be provided.
valueSchema
- Schema ID used to parse the message value. If the value format is other than string, then the schema must be provided.
options
- Optional. Additional Spark parameters related to reading messages from Kafka topics, such as: failOnDataLoss, kafkaConsumer.pollTimeoutMs, fetchOffset.numRetries, fetchOffset.retryIntervalMs, maxOffsetsPerTrigger. Parameters are provided as strings in the format parameterName=parameterValue. For more information, see the Spark Kafka Integration Guide.
keyFields
- Optional. List of columns that form a Primary Key or are used to identify a row within a dataset. Key fields are primarily used in error collection reports. For more details on error collection, see the Metric Error Collection chapter.
metadata
- Optional. List of user-defined metadata parameters specific to this source, where each parameter is a string in the format param.name=param.value.
Currently, the string, xml, json and avro formats are supported to decode the message key and value.
TIP: To define JSON strings, enclose them in triple quotes: \"\"\"{\"name1\": {\"name2\": \"value2\", \"name3\": \"value3\"}}\"\"\".
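The two topic formats and the key/value schema rule above can be illustrated with a short Scala sketch. parseTopic and missingSchema are hypothetical helpers written for this example; reading the numbers after @ as partition numbers follows the format shown above, while the exact parsing Checkita performs may differ.

```scala
// Parses a topic specification in either of the two documented formats:
// "topic" (all partitions) or "topic@[p1, p2, ...]" (selected partitions only).
// parseTopic and missingSchema are hypothetical helpers, not Checkita's API.
val TopicWithPartitions = """(.+)@\[\s*(\d+(?:\s*,\s*\d+)*)\s*\]""".r

def parseTopic(spec: String): (String, Option[Seq[Int]]) = spec.trim match {
  case TopicWithPartitions(topic, parts) => (topic, Some(parts.split(",").toSeq.map(_.trim.toInt)))
  case topic                             => (topic, None)
}

// Any key or value format other than "string" requires a schema ID.
def missingSchema(format: String, schemaId: Option[String]): Boolean =
  format != "string" && schemaId.isEmpty
```

For instance, "topic3.pub@[1,3]" from the configuration example parses to topic3.pub with partitions 1 and 3.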
To read data from a Greenplum table using the Pivotal connector, it is required to provide the following:
id
- Required. Source ID;
description
- Optional. Source description;
connection
- Required. Connection ID to use for the table source. The connection ID must refer to a Greenplum Pivotal connection. See the Connections Configuration chapter for more information.
table
- Optional. Table to read.
keyFields
- Optional. List of columns that form a Primary Key or are used to identify a row within a dataset. Key fields are primarily used in error collection reports. For more details on error collection, see the Metric Error Collection chapter.
metadata
- Optional. List of user-defined metadata parameters specific to this source, where each parameter is a string in the format param.name=param.value.
Custom sources can be used when it is required to read data from a source type that is not explicitly supported by any of the configurations described above. To configure a custom source, it is required to provide the following parameters:
id
- Required. Source ID;
description
- Optional. Source description;
format
- Required. Spark DataFrame reader format used to read from the given source.
path
- Optional. Path to read data from (if required).
schema
- Optional. Explicit schema to be applied to data from the given source (if required).
options
- Optional. Additional Spark parameters used to read data from the given source.
keyFields
- Optional. List of columns that form a Primary Key or are used to identify a row within a dataset. Key fields are primarily used in error collection reports. For more details on error collection, see the Metric Error Collection chapter.
metadata
- Optional. List of user-defined metadata parameters specific to this source, where each parameter is a string in the format param.name=param.value.
Once the parameters above are defined, the Spark DataFrame reader is set up to read data from the source as follows:
val df = spark.read.format(format).schema(schema).options(options).load(path)\n
If any of the optional parameters is missing, the corresponding Spark reader configuration is not set.
"},{"location":"03-job-configuration/03-Sources/#sources-configuration-example","title":"Sources Configuration Example","text":"As shown in the example below, sources of the same type are grouped within subsections named after the source type. These subsections should contain a list of source configurations of the corresponding type.
sources: {\n file: [\n {id: \"hdfs_fixed_file\", kind: \"fixed\", path: \"path/to/fixed/file.txt\", schema: \"schema2\"}\n {\n id: \"hdfs_delimited_source\",\n description: \"Reading static data from CSV file\"\n kind: \"delimited\",\n path: \"path/to/csv/file.csv\"\n schema: \"schema1\"\n metadata: [\n \"data.owner=some.person@some.domain\"\n \"file.version=1.1\"\n ]\n }\n {id: \"hdfs_avro_source\", kind: \"avro\", path: \"path/to/avro/file.avro\", schema: \"avro_schema\"}\n {id: \"hdfs_orc_source\", kind: \"orc\", path: \"path/to/orc/file.orc\"}\n ]\n hive: [\n {\n id: \"hive_source_1\", schema: \"some_schema\", table: \"some_table\",\n partitions: [{name: \"load_date\", values: [\"2023-06-30\", \"2023-07-01\"]}],\n keyFields: [\"id\", \"name\"]\n }\n ]\n table: [\n {id: \"table_source_1\", connection: \"oracle_db1\", table: \"some_table\", keyFields: [\"id\", \"name\"]}\n {id: \"table_source_2\", connection: \"sqlite_db\", table: \"other_table\"}\n ]\n kafka: [\n {\n id: \"kafka_source_1\",\n connection: \"kafka_broker\",\n topics: [\"topic1.pub\", \"topic2.pub\"]\n format: \"json\"\n }\n {\n id: \"kafka_source_2\",\n connection: \"kafka_broker\",\n topics: [\"topic3.pub@[1,3]\"]\n startingOffsets: \"\"\"{\"topic3.pub\":{\"1\":1234,\"3\":2314}}\"\"\"\n options: [\"kafkaConsumer.pollTimeoutMs=300000\"]\n format: \"json\"\n }\n ]\n greenplum: [\n {id: \"greenplum_source_1\", connection: \"greenplum_db\", table: \"some_table\"}\n ]\n }\n
"},{"location":"03-job-configuration/04-Streams/","title":"Streaming Sources Configurations","text":"When running Data Quality checks over streaming data sources, these sources must be defined in the streams section of the job configuration. Sources defined in this section are read as streaming DataFrames using the Spark Structured Streaming API. More details on running data quality checks over streaming sources are given in the Data Quality Checks over Streaming Sources chapter.
The configuration of streaming sources is the same as for static ones. See the Sources Configuration chapter for more details.
It is important to note that not all supported sources can be read in streaming mode. Currently, only the sources below can be read as streams:
For streaming Kafka sources, note the startingOffsets parameter: when a Kafka source is defined as a stream, its default value is latest. Also, the endingOffsets parameter is ignored for streaming Kafka sources (all new records are processed until the application is stopped).
The only additional parameter that can be defined for all streaming sources is the following:
windowBy
- Optional, default is processingTime. Source of the timestamp used to assign records to particular streaming windows and also to skip \"late\" records. Applicable only to streaming jobs! The following options are supported:
processingTime
- Uses the current timestamp at the moment when Spark processes the record.
eventTime
- Mostly applicable to Kafka sources. Uses the column named timestamp to retrieve the time value. This column must be of Timestamp type.
custom(columnName)
- Uses an arbitrary user-defined column to retrieve the time value. The specified column must be of Timestamp type. SQL expressions are supported as well; an expression must also evaluate to a value of Timestamp type. For example, with custom(value.createdAt), the time value for a record is retrieved from the createdAt field of the message value.
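The three windowBy options can be modeled as a small algebraic data type. The Scala sketch below is a hypothetical illustration (WindowBy, CustomTime and parseWindowBy are not Checkita names); it assumes exact, case-sensitive spelling of the option names and treats a missing value as the processingTime default.

```scala
// Hypothetical ADT mirroring the three windowBy options (illustration only).
sealed trait WindowBy
case object ProcessingTime extends WindowBy                // default
case object EventTime extends WindowBy                     // uses the `timestamp` column
final case class CustomTime(expr: String) extends WindowBy // column name or SQL expression

val CustomPattern = """custom\((.+)\)""".r

// Assumes exact, case-sensitive spelling; None falls back to the default.
def parseWindowBy(value: Option[String]): Either[String, WindowBy] =
  value.map(_.trim) match {
    case None                      => Right(ProcessingTime)
    case Some("processingTime")    => Right(ProcessingTime)
    case Some("eventTime")         => Right(EventTime)
    case Some(CustomPattern(expr)) => Right(CustomTime(expr.trim))
    case Some(other)               => Left(s"Unsupported windowBy value: $other")
  }
```

The greedy (.+) group keeps nested parentheses, so an expression such as custom(to_timestamp(ts)) is captured whole.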
The Checkita framework supports the creation of virtual (temporary) sources based on regular ones (defined in the sources section of the job configuration, as described in the Sources Configuration chapter). Virtual sources are created by applying transformations to existing sources using the Spark SQL API. Metrics and checks can then be applied to virtual sources as well.
It is also important to note that virtual sources are created recursively: once a virtual source is created, it can be used to create another one in the same way as regular sources.
The following types of virtual sources are supported:
SQL
: creates a virtual source from existing ones using an arbitrary SQL query.
Join
: creates a virtual source by joining two (and only two) existing sources.
Filter
: creates a virtual source from an existing one by applying filter expressions.
Select
: creates a virtual source from an existing one by applying select expressions.
Aggregate
: creates a virtual source by applying groupBy and aggregate operations to an existing one.
All types of virtual sources have common features:
Virtual sources are defined in the virtualSources section of the job configuration and have the following common parameters:
id
- Required. Virtual source ID;
description
- Optional. Virtual source description;
parentSources
- Required. List of parent sources used to create the virtual source. Depending on the virtual source type, there may be limitations on the number of parent sources.
persist
- Optional. One of the allowed Spark StorageLevels used to cache the virtual source. By default, virtual sources are not cached. Supported Spark StorageLevels are: NONE, DISK_ONLY, DISK_ONLY_2, MEMORY_ONLY, MEMORY_ONLY_2, MEMORY_ONLY_SER, MEMORY_ONLY_SER_2, MEMORY_AND_DISK, MEMORY_AND_DISK_2, MEMORY_AND_DISK_SER, MEMORY_AND_DISK_SER_2, OFF_HEAP.
save
- Optional. File output configuration used to save the virtual source. By default, virtual sources are not saved. For more information on configuring file outputs, see the File Output Configuration chapter.
keyFields
- Optional. List of columns that form a Primary Key or are used to identify a row within a dataset. Key fields are primarily used in error collection reports. For more details on error collection, see the Metric Error Collection chapter.
metadata
- Optional. List of user-defined metadata parameters specific to this virtual source, where each parameter is a string in the format param.name=param.value.
The SQL type of virtual sources is allowed only when allowSqlQueries is set to true. Otherwise, any usage of arbitrary SQL queries is not permitted. See the Enablers chapter for more information. At the same time, there is no limitation on the number of parent sources used to create an SQL virtual source.
To define an SQL virtual source, it is required to provide an SQL query:
kind: \"sql\"
- Required. Sets the SQL virtual source type.
query
- Required. SQL query used to build the virtual source. Existing sources are referred to in the SQL query by their IDs.
To define a Join virtual source, it is required to provide two (and only two) parent sources to be joined, as well as the join type and the list of columns to join by. Note that to perform the join, parent sources must have matching column names to join by. Joining by a condition is not currently supported:
kind: \"join\"
- Required. Sets the Join virtual source type.
joinBy
- Required. List of columns to join by. Thus, parent sources must have the same column names used for the join.
joinType
- Required. Type of Spark join to apply. The following join types are supported: inner, outer, cross, full, right, left, semi, anti, fullOuter, rightOuter, leftOuter, leftSemi, leftAnti.
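The Join requirements above (exactly two parents, a supported join type, and joinBy columns present in both parents) can be checked as in the following Scala sketch. validateJoin, SupportedJoinTypes and the columnsOf map (source ID to column names) are hypothetical; matching join types case-insensitively is an assumption of this sketch.

```scala
// Join types listed above, lower-cased; case-insensitive matching is an assumption.
val SupportedJoinTypes: Set[String] = Set(
  "inner", "outer", "cross", "full", "right", "left", "semi", "anti",
  "fullouter", "rightouter", "leftouter", "leftsemi", "leftanti"
)

// Hypothetical validation of a Join virtual source (not Checkita's code).
// `columnsOf` maps a source ID to its column names.
def validateJoin(
    parents: Seq[String],
    joinBy: Seq[String],
    joinType: String,
    columnsOf: Map[String, Set[String]]
): Either[String, Unit] =
  if (parents.size != 2) Left(s"Join requires exactly 2 parent sources, got ${parents.size}")
  else if (!SupportedJoinTypes.contains(joinType.toLowerCase)) Left(s"Unsupported join type: $joinType")
  else {
    // Every joinBy column must exist in both parent sources.
    val missing = joinBy.filterNot(c => parents.forall(p => columnsOf.getOrElse(p, Set.empty[String]).contains(c)))
    if (missing.isEmpty) Right(()) else Left(s"joinBy columns missing in a parent source: ${missing.mkString(", ")}")
  }
```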
A Filter virtual source is defined by applying a sequence of filter expressions to the parent source. Thus, exactly one parent source must be supplied for this type of virtual source:
kind: \"filter\"
- Required. Sets the Filter virtual source type.
expr
- Required. Sequence of filter SQL expressions applied to the parent source.
A Select virtual source is defined by applying a sequence of select expressions to the parent source. Each select expression should yield a new column; thus, the number of columns in the virtual source corresponds to the number of provided select expressions. Exactly one parent source must be supplied for this type of virtual source:
kind: \"select\"
- Required. Sets the Select virtual source type.
expr
- Required. Sequence of select SQL expressions applied to the parent source.
An Aggregate virtual source is defined by applying groupBy and aggregate operations to the parent source. Thus, it is required to provide a list of columns used to group rows, as well as a list of aggregate operations in the form of SQL expressions used to create columns with aggregated results. The resulting virtual source contains the grouping columns plus one column per provided aggregate expression. Exactly one parent source must be supplied for this type of virtual source:
kind: \"aggregate\"
- Required. Sets the Aggregate virtual source type.
groupBy
- Required. Sequence of columns used to group rows from the parent source.
expr
- Required. Sequence of SQL expressions used to get columns with aggregated results.
As shown in the example below, the virtualSources section represents a list of virtual source definitions of various kinds.
jobConfig: {\n virtualSources: [\n {\n id: \"sqlVS\"\n kind: \"sql\"\n description: \"Filter data for specific date only\"\n parentSources: [\"hive_source_1\"]\n persist: \"disk_only\"\n save: {\n kind: \"orc\"\n path: \"some/path/to/vs/location\"\n }\n query: \"select id, name, entity, description from hive_source_1 where load_date == '2023-06-30'\"\n metadata: [\n \"source.owner=some.person@some.domain\"\n \"critical.source=false\"\n ]\n }\n {\n id: \"joinVS\"\n kind: \"join\"\n parentSources: [\"hdfs_avro_source\", \"hdfs_orc_source\"]\n joinBy: [\"id\"]\n joinType: \"leftouter\"\n persist: \"memory_only\"\n keyFields: [\"id\", \"order_id\"]\n }\n {\n id: \"filterVS\"\n kind: \"filter\"\n parentSources: [\"kafka_source\"]\n expr: [\"key is not null\"]\n keyFields: [\"orderId\", \"dttm\"]\n }\n {\n id: \"selectVS\"\n kind: \"select\"\n parentSources: [\"table_source_1\"]\n expr: [\n \"count(id) as id_cnt\",\n \"sum(amount) as total_amount\"\n ]\n }\n {\n id: \"aggVS\"\n kind: \"aggregate\"\n parentSources: [\"hdfs_fixed_file\"]\n groupBy: [\"col1\"]\n expr: [\n \"avg(col2) as avg_col2\",\n \"sum(col3) as sum_col3\"\n ],\n keyFields: [\"col1\", \"avg_col2\", \"sum_col3\"]\n }\n ]\n}\n
"},{"location":"03-job-configuration/06-VirtualStreams/","title":"Virtual Streaming Sources Configuration","text":"When running Data Quality checks over streaming data sources, it may be necessary to apply transformations to them, thus creating virtual streaming sources. Such sources have to be defined in the virtualStreams section of the job configuration. Transformations defined in this section are applied only to streaming sources, using the Spark Structured Streaming API. More details on running data quality checks over streaming sources are given in the Data Quality Checks over Streaming Sources chapter.
The configuration of virtual streaming sources is the same as for static ones. See the Virtual Sources Configuration chapter for more details. In addition, the column used as the source of timestamps for windowing can be redefined and derived from the resulting virtual stream schema. See the Streaming Sources Configurations chapter for more details on how to define the column used as the source of timestamps.
It is important to note that not all supported virtual source types can be built from streaming sources. Currently, only the filter and select types of virtual sources are supported in streaming applications.
"},{"location":"03-job-configuration/07-LoadChecks/","title":"Load Checks Configuration","text":"Load checks are a special type of checks that differ from other checks in that they are applied not to the results of metric computation but to source metadata. Another key feature of load checks is that they run prior to the actual data loading from the sources, which is possible due to Spark's lazy evaluation: sources are, essentially, Spark DataFrames, and load checks verify their metadata.
Load checks are defined in the loadChecks section of the job configuration and have the following common parameters:
id
- Required. Load check ID;
description
- Optional. Load check description;
source
- Required. Reference to the ID of the source whose metadata is being checked;
metadata
- Optional. List of user-defined metadata parameters specific to this load check, where each parameter is a string in the format param.name=param.value.
The currently supported load checks, as well as the configuration parameters specific to them, are described below.
"},{"location":"03-job-configuration/07-LoadChecks/#minimum-column-number-check","title":"Minimum Column Number Check","text":"This check is used to verify that the number of columns in the source is equal to or greater than the specified number. Load checks of this type are configured in the minColumnNum subsection of the loadChecks section. In addition to the common parameters, the following parameters should be specified:
option
- Required. Minimum number of columns that the checked source must contain.
This check is used to verify that the number of columns in the source is exactly equal to the specified number. Load checks of this type are configured in the exactColumnNum subsection of the loadChecks section. In addition to the common parameters, the following parameters should be specified:
option
- Required. Exact number of columns that the checked source must contain.
This check is used to verify that the source contains columns with the required names. Load checks of this type are configured in the columnsExist subsection of the loadChecks section. In addition to the common parameters, the following parameters should be specified:
columns
- Required. List of column names that must exist in the checked source.
This check is used to verify that the source schema matches a predefined reference schema. The reference schema must be defined in the schemas section of the job configuration, as described in the Schemas Configuration chapter. Load checks of this type are configured in the schemaMatch subsection of the loadChecks section. In addition to the common parameters, the following parameters should be specified:
schema
- Required. ID of the reference schema to compare with the source schema.
ignoreOrder
- Optional, default is false. Boolean parameter indicating whether column order should be ignored when comparing the schemas.
As shown in the example below, load checks of the same type are grouped within subsections named after the load check type. These subsections should contain a list of load check configurations of the corresponding type.
jobConfig: {\n loadChecks: {\n minColumnNum: [\n {id: \"load_check_1\", source: \"kafka_source\", option: 2}\n ]\n exactColumnNum: [\n {\n id: \"load_check_2\", \n description: \"Checking that source has exactly required number of columns\", \n source: \"hdfs_delimited_source\", option: 3\n metadata: [\n \"critical.loadcheck=true\"\n ]\n }\n ]\n columnsExist: [\n {id: \"loadCheck3\", source: \"sqlVS\", columns: [\"id\", \"name\", \"entity\", \"description\"]},\n {id: \"load_check_4\", source: \"hdfs_delimited_source\", columns: [\"id\", \"name\", \"value\"]}\n ]\n schemaMatch: [\n {id: \"load_check_5\", source: \"kafka_source\", schema: \"hive_schema\"}\n ]\n }\n}\n
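To make the semantics of the four load check types concrete, here is a minimal Scala sketch that evaluates them against a source's columns. Columns and the check functions are hypothetical names; modeling a schema as a flat, ordered list of (name, type) pairs is a simplification, and comparing name-sorted column lists for ignoreOrder is an assumed interpretation rather than Checkita's implementation.

```scala
// Illustrative evaluation of the four load check types (not Checkita's implementation).
// Assumption: a schema is modeled as an ordered list of flat (columnName, dataType) pairs.
type Columns = Seq[(String, String)]

def minColumnNum(src: Columns, option: Int): Boolean = src.size >= option

def exactColumnNum(src: Columns, option: Int): Boolean = src.size == option

def columnsExist(src: Columns, columns: Seq[String]): Boolean = {
  val names = src.map(_._1).toSet
  columns.forall(names.contains)
}

// With ignoreOrder = true, both schemas are sorted by column name before comparison.
def schemaMatch(src: Columns, reference: Columns, ignoreOrder: Boolean = false): Boolean =
  if (ignoreOrder) src.sortBy(_._1) == reference.sortBy(_._1)
  else src == reference
```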
"},{"location":"03-job-configuration/08-Metrics/","title":"Metrics Configuration","text":"Calculation of various metrics over the data is the main part of Data Quality job. Metrics allow evaluation of various indicators that describe data from both technical and business points of view. Indicators in their turn can signal about problems in the data.
Metrics that are calculated over the source they are linked to are called regular. Apart from regular metrics, there is a special kind of metric that is calculated from the results of other metrics, thus allowing metric composition. Accordingly, these metrics are called composed.
Metrics are defined in the metrics section of the job configuration. Regular metrics are grouped by their type in the regular subsection, while composed metrics are listed in the composed subsection.
All regular metrics are defined using the following common parameters:
id
- Required. Metric ID;
description
- Optional. Metric description;
source
- Required. Reference to the ID of the source over which the metric is calculated;
columns
- Required. List of columns over which the metric is calculated. Regular metrics can be calculated for multiple columns, which means that the metric result is calculated over the row values in these columns. There could be a limitation on the number of columns a metric can process. The only exception is the Row Count Metric, which does not need columns to be specified.
params
- Some metrics may require additional parameters to be set; they should be specified within this object. The parameters to configure are described below for each metric individually. Some metrics that require additional parameters also have default values for them. In this case, the params object can be omitted to use the default options for all parameters.
metadata
- Optional. List of user-defined metadata parameters specific to this metric, where each parameter is a string in the format param.name=param.value.
Additionally, some regular metrics have a logical condition that needs to be met when calculating the metric increment for each individual row. If the metric condition is not met, then Failure status is returned for this particular row of data. Scenarios in which a metric can yield Failure status are explicitly described for each metric below. See the Status Model used in Results chapter for more information on the status model.
Calculates the number of rows in the source. This is the only metric for which the columns list should not be specified, as it is not required to compute the number of rows. The metric definition does not require additional parameters: params should not be set.
All row count metrics are defined in the rowCount subsection.
Counts the number of unique values in the provided columns. When applied to multiple columns, the total number of unique values in these columns is returned. The metric definition does not require additional parameters: params should not be set.
All distinct values metrics are defined in the distinctValues subsection.
IMPORTANT: Calculating the exact number of unique values requires O(N) memory. Therefore, to prevent OOM errors when working with extremely large datasets and high-cardinality columns, it is recommended to use the Approximate Distinct Values Metric, which uses the HyperLogLog (HLL) probabilistic algorithm to estimate the number of unique values.
"},{"location":"03-job-configuration/08-Metrics/#approximate-distinct-values-metric","title":"Approximate Distinct Values Metric","text":"Calculates the number of unique values approximately, using the HyperLogLog algorithm.
This metric works with only one column.
All approximate distinct values metrics are defined in the approximateDistinctValues subsection. Additional parameters can be supplied:
accuracyError
- Optional, default is 0.01. Accuracy error for estimating the number of unique values.
Counts the number of null values in the specified columns. When applied to multiple columns, the total number of null values in these columns is returned. The metric definition does not require additional parameters: params should not be set.
All null values metrics are defined in the nullValues subsection.
The metric increment returns Failure status for rows where some values in the specified columns are null.
Counts the number of empty values (i.e. empty string values) in the specified columns. When applied to multiple columns, the total number of empty values in these columns is returned. The metric definition does not require additional parameters: params should not be set.
All empty values metrics are defined in the emptyValues subsection.
The metric increment returns Failure status for rows where some values in the specified columns are empty.
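The per-row behavior of the null and empty values metrics can be sketched as follows. This is an assumption-laden illustration, not Checkita's code: a row holds the values of the metric's columns, None stands for null, and Increment, nullValuesIncrement, emptyValuesIncrement and aggregate are hypothetical names.

```scala
// Illustrative per-row increments for the nullValues and emptyValues metrics.
// Assumption: a row holds the values of the metric's columns, with None standing for null.
final case class Increment(count: Int, failed: Boolean)

def nullValuesIncrement(row: Seq[Option[String]]): Increment = {
  val n = row.count(_.isEmpty)
  Increment(n, failed = n > 0) // Failure if some value in the row is null
}

def emptyValuesIncrement(row: Seq[Option[String]]): Increment = {
  val n = row.count(_.exists(_.isEmpty))
  Increment(n, failed = n > 0) // Failure if some value in the row is an empty string
}

// Metric result: total count over all rows, plus the number of rows with Failure status.
def aggregate(rows: Seq[Seq[Option[String]]], f: Seq[Option[String]] => Increment): (Int, Int) = {
  val increments = rows.map(f)
  (increments.map(_.count).sum, increments.count(_.failed))
}
```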
Calculates the measure of completeness in the specified columns: (values_count - null_count) / values_count. When applied to multiple columns, the total number of values and the total number of nulls are used in the equation above.
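The completeness formula, including the effect of this metric's includeEmptyStrings option, can be sketched in Scala. Modeling values as Option[String] (None for null) and returning NaN when there are no values are assumptions of this illustration.

```scala
// Completeness over all values of the metric's columns:
//   (values_count - null_count) / values_count
// Values are modeled as Option[String] (None = null). With includeEmptyStrings = true,
// empty strings are counted as nulls. Returning NaN for no values is an assumption.
def completeness(values: Seq[Option[String]], includeEmptyStrings: Boolean = false): Double = {
  val valuesCount = values.size
  val nullCount = values.count {
    case None    => true
    case Some(s) => includeEmptyStrings && s.isEmpty
  }
  if (valuesCount == 0) Double.NaN
  else (valuesCount - nullCount).toDouble / valuesCount
}
```

For the values "a", null, "" and "b", completeness is 3 / 4 = 0.75 by default and 2 / 4 = 0.5 when empty strings are counted as nulls.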
All completeness metrics are defined in the completeness subsection. Additional parameters can be supplied:
includeEmptyStrings
- Optional, default is false. Boolean parameter indicating whether empty string values should be considered as nulls.
Calculates the measure of completeness of an incremental sequence of integers. In other words, it looks for missing elements in the sequence and returns the ratio: actual number of elements / required number of elements.
This metric works with only one column.
The actual number of elements is the number of unique values in the sequence. This metric computes it exactly and therefore requires O(N) memory to store these values. To prevent OOM errors for extremely large sequences, it is recommended to use the Approximate Sequence Completeness Metric, which uses the HLL probabilistic algorithm to estimate the number of unique values.
The required number of elements is determined by the formula: (max_value - min_value) / increment + 1, where:
min_value
- the minimum value in the sequence;
max_value
- the maximum value in the sequence;
increment
- the sequence step, default is 1.
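As a worked example of the formula, for the values 1, 2, 3, 5 with the default increment: required = (5 - 1) / 1 + 1 = 5 and actual = 4, so the metric equals 4 / 5 = 0.8. The Scala sketch below computes the exact variant; the function name and the input checks are illustrative assumptions.

```scala
// Exact sequence completeness: actual / required, where
//   actual   = number of distinct values (kept in memory, hence O(N)),
//   required = (max_value - min_value) / increment + 1.
def sequenceCompleteness(values: Seq[Long], increment: Long = 1L): Double = {
  require(values.nonEmpty, "the sequence must not be empty")
  require(increment > 0, "increment must be positive")
  val actual = values.distinct.size
  val required = (values.max - values.min) / increment + 1
  actual.toDouble / required
}
```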
All sequence completeness metrics are defined in the sequenceCompleteness subsection. Additional parameters can be supplied:
increment
- Optional, default is 1. Sequence increment step.
Calculates the measure of completeness of an incremental sequence of integers approximately, using the HyperLogLog algorithm. It works in the same way as the Sequence Completeness Metric, with the only difference that the actual number of elements in the sequence is determined approximately using the HLL algorithm.
This metric works with only one column.
All approximate sequence completeness metrics are defined in the approximateSequenceCompleteness subsection. Additional parameters can be supplied:
increment
- Optional, default is 1. Sequence increment step.
accuracyError
- Optional, default is 0.01. Accuracy error for estimating the number of unique values.
Calculates the minimum string length in the values of the specified columns. The metric definition does not require additional parameters: params should not be set.
All minimum string metrics are defined in the minString subsection.
The metric increment returns Failure status for rows where none of the values in the specified columns can be cast to string and, therefore, the minimum string length cannot be computed.
Calculates the maximum string length in the values of the specified columns. The metric definition does not require additional parameters: params should not be set.
All maximum string metrics are defined in the maxString subsection.
The metric increment returns Failure status for rows where none of the values in the specified columns can be cast to string and, therefore, the maximum string length cannot be computed.
Calculates the average string length in the values of the specified columns. The metric definition does not require additional parameters: params should not be set.
All average string metrics are defined in the avgString subsection.
The metric increment returns Failure status for rows where none of the values in the specified columns can be cast to string and, therefore, the average string length cannot be computed.
Calculates the number of values that meet the defined string length criteria.
All string length metrics are defined in the stringLength subsection. Additional parameters should be supplied:
length
- Required. String length threshold.
compareRule
- Required. Comparison rule used to compare the actual string length of a value with the threshold. Supported rules are: eq (==), lt (<), lte (<=), gt (>), gte (>=).
The metric increment returns Failure status for rows where some values in the specified columns do not meet the defined string length criteria.
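The mapping from compareRule to a comparison can be sketched in Scala. comparator and stringLengthCount are hypothetical names, and skipping None (null) values entirely is an assumption of this sketch rather than documented behavior.

```scala
// Illustrative stringLength metric: counts values whose length satisfies
// `actualLength <compareRule> length`. Not Checkita's code; None (null) values
// are simply not counted here, which is an assumption.
def comparator(compareRule: String): (Int, Int) => Boolean = compareRule match {
  case "eq"  => _ == _
  case "lt"  => _ < _
  case "lte" => _ <= _
  case "gt"  => _ > _
  case "gte" => _ >= _
  case other => throw new IllegalArgumentException(s"Unsupported compareRule: $other")
}

def stringLengthCount(values: Seq[Option[String]], length: Int, compareRule: String): Int = {
  val cmp = comparator(compareRule)
  values.count(_.exists(v => cmp(v.length, length)))
}
```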
Counts the number of values that fall into the specified set of allowed values.
All string in domain metrics are defined in the stringInDomain subsection. Additional parameters should be supplied:
domain - Required. List of allowed values.
Metric increment returns Failure status for rows where some values in the specified columns do not fall into the set of allowed values.
Counts the number of values that do not fall into the specified set of avoided values.
All string out domain metrics are defined in the stringOutDomain subsection. Additional parameters should be supplied:
domain - Required. List of avoided values.
Metric increment returns Failure status for rows where some values in the specified columns fall into the set of avoided values.
Counts the number of values that are equal to the value given in the metric definition.
All string values metrics are defined in the stringValues subsection. Additional parameters should be supplied:
compareValue - Required. String value to compare with.
Metric increment returns Failure status for rows where some values in the specified columns do not match the defined compare value.
Calculates the number of values that match the defined regular expression.
All regex match metrics are defined in the regexMatch subsection. Additional parameters should be supplied:
regex - Required. Regular expression to match.
Metric increment returns Failure status for rows where some values in the specified columns do not match the defined regular expression.
Calculates the number of values that do not match the defined regular expression.
All regex mismatch metrics are defined in the regexMismatch subsection. Additional parameters should be supplied:
regex - Required. Regular expression that values should not match.
Metric increment returns Failure status for rows where some values in the specified columns match the defined regular expression.
Counts the number of values that have the specified datetime format.
All formatted date metrics are defined in the formattedDate subsection. Additional parameters can be supplied:
dateFormat - Optional, default is yyyy-MM-dd'T'HH:mm:ss.SSSZ. Target datetime format. The datetime format must be specified as a Java DateTimeFormatter pattern.
NOTE. If the specified columns are of type Timestamp, it is assumed that they fit any datetime format and, therefore, the metric will return the total number of non-empty cells. Accordingly, the datetime format does not need to be specified.
Metric increment returns Failure status for rows where some values in the specified columns do not conform to the defined datetime format.
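A sketch of a formattedDate metric that checks plain dates such as 2024-03-13. The metric ID and the load_date column are hypothetical; the source ID is taken from the metrics example in this chapter.

```hocon
jobConfig: {
  metrics: {
    regular: {
      formattedDate: [
        {
          // hypothetical metric ID and column name
          id: load_date_format
          source: hdfs_delimited_source
          columns: [load_date]
          // Java DateTimeFormatter pattern for dates like 2024-03-13
          params: {dateFormat: yyyy-MM-dd}
        }
      ]
    }
  }
}
```

If load_date were already of Timestamp type, the params block could be omitted, as explained in the note above.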
Counts the number of values that are numeric and whose number format satisfies the defined criteria.
All formatted number metrics are defined in the formattedNumber subsection. Additional parameters should be supplied:
precision - Required. The total number of digits in the value (excluding the decimal separator).
scale - Required. Number of decimal digits in the value.
compareRule - Optional, default is inbound. Number format comparison rule:
  inbound - the value must fit into the specified number format: actual precision and scale of the value are less than or equal to the given ones.
  outbound - the value must be outside the specified number format: actual precision and scale of the value are strictly greater than the given ones.
Metric increment returns Failure status for rows where some values in the specified columns do not satisfy the defined number format criteria.
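A sketch of a formattedNumber metric with a hypothetical metric ID (source and column follow the metrics example in this chapter). Under the definitions above, 123.45 has precision 5 and scale 2, so it fits the format below, while 12345.678 (precision 8, scale 3) does not.

```hocon
jobConfig: {
  metrics: {
    regular: {
      formattedNumber: [
        {
          // hypothetical metric ID
          id: balance_format
          source: hdfs_avro_source
          columns: [balance]
          // at most 6 digits in total, at most 2 of them after the decimal separator
          params: {precision: 6, scale: 2, compareRule: inbound}
        }
      ]
    }
  }
}
```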
Finds the minimum number among the values in the specified columns. Metric definition does not require additional parameters: params should not be set.
All minimum number metrics are defined in the minNumber subsection.
Metric increment returns Failure status for rows where all values in the specified columns are not castable to number and, therefore, the minimum number cannot be computed.
Finds the maximum number among the values in the specified columns. Metric definition does not require additional parameters: params should not be set.
All maximum number metrics are defined in the maxNumber subsection.
Metric increment returns Failure status for rows where all values in the specified columns are not castable to number and, therefore, the maximum number cannot be computed.
Finds the sum of the values in the specified columns. Metric definition does not require additional parameters: params should not be set.
All sum number metrics are defined in the sumNumber subsection.
Metric increment returns Failure status for rows where some values in the specified columns are not castable to number.
Finds the average of the values in the specified column. Metric definition does not require additional parameters: params should not be set.
This metric works with only one column.
All average number metrics are defined in the avgNumber subsection.
Metric increment returns Failure status for rows where the value in the specified column is not castable to number.
Finds the standard deviation of the values in the specified column. Metric definition does not require additional parameters: params should not be set.
This metric works with only one column.
All standard deviation metrics are defined in the stdNumber subsection.
Metric increment returns Failure status for rows where the value in the specified column is not castable to number.
Counts the number of values whose string representation can be converted to a number (double). Metric definition does not require additional parameters: params should not be set.
All casted number metrics are defined in the castedNumber subsection.
Metric increment returns Failure status for rows where some values in the specified columns are not castable to number.
Counts the number of values that, when cast to number (double), fall into the specified set of allowed numbers.
All number in domain metrics are defined in the numberInDomain subsection. Additional parameters should be supplied:
domain - Required. List of allowed numbers.
Metric increment returns Failure status for rows where some values in the specified columns do not fall into the set of allowed numbers.
Counts the number of values that, when cast to number (double), do not fall into the specified set of avoided numbers.
All number out domain metrics are defined in the numberOutDomain subsection. Additional parameters should be supplied:
domain - Required. List of avoided numbers.
Metric increment returns Failure status for rows where some values in the specified columns fall into the set of avoided numbers.
Counts the number of values that, when cast to number (double), are less than (or equal to) the specified value.
All number less than metrics are defined in the numberLessThan subsection. Additional parameters should be supplied:
compareValue - Required. Number to compare with.
includeBound - Optional, default is false. Specifies whether to include compareValue in the range for comparison.
Metric increment returns Failure status for rows where some values in the specified columns do not satisfy the comparison criteria.
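A sketch of a numberLessThan metric with hypothetical metric ID and discount column. With includeBound set to true the comparison becomes value <= 0.5; with the default false, a value of exactly 0.5 would not be counted.

```hocon
jobConfig: {
  metrics: {
    regular: {
      numberLessThan: [
        {
          // hypothetical metric ID and column name
          id: discount_below_limit
          source: table_source_1
          columns: [discount]
          // includeBound: true means value <= 0.5 is counted
          params: {compareValue: 0.5, includeBound: true}
        }
      ]
    }
  }
}
```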
Counts the number of values that, when cast to number (double), are greater than (or equal to) the specified value.
All number greater than metrics are defined in the numberGreaterThan subsection. Additional parameters should be supplied:
compareValue - Required. Number to compare with.
includeBound - Optional, default is false. Specifies whether to include compareValue in the range for comparison.
Metric increment returns Failure status for rows where some values in the specified columns do not satisfy the comparison criteria.
Counts the number of values that, when cast to number (double), are within the given interval.
All number between metrics are defined in the numberBetween subsection. Additional parameters should be supplied:
lowerCompareValue - Required. The lower bound of the interval.
upperCompareValue - Required. The upper bound of the interval.
includeBound - Optional, default is false. Specifies whether to include interval bounds in the range for comparison.
Metric increment returns Failure status for rows where some values in the specified columns do not satisfy the comparison criteria.
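A sketch of a numberBetween metric counting values in the closed interval [18, 65]. Because includeBound is true, both 18 and 65 are counted; with the default false, only values strictly between them would be. The metric ID and the age column are hypothetical.

```hocon
jobConfig: {
  metrics: {
    regular: {
      numberBetween: [
        {
          // hypothetical metric ID and column name
          id: age_in_range
          source: hive_source_1
          columns: [age]
          // counts 18 <= age <= 65 since includeBound is true
          params: {lowerCompareValue: 18, upperCompareValue: 65, includeBound: true}
        }
      ]
    }
  }
}
```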
Counts the number of values that, when cast to number (double), are outside the given interval.
All number not between metrics are defined in the numberNotBetween subsection. Additional parameters should be supplied:
lowerCompareValue - Required. The lower bound of the interval.
upperCompareValue - Required. The upper bound of the interval.
includeBound - Optional, default is false. Specifies whether to include interval bounds in the range for comparison.
Metric increment returns Failure status for rows where some values in the specified columns do not satisfy the comparison criteria.
Counts the number of values that, when cast to number (double), are equal to the number given in the metric definition.
All number values metrics are defined in the numberValues subsection. Additional parameters should be supplied:
compareValue - Required. Number value to compare with.
Metric increment returns Failure status for rows where some values in the specified columns do not match the defined compare value.
Calculates the median of the values in the specified column. Metric calculator uses the TDigest library to compute the median value.
This metric works with only one column.
All median value metrics are defined in the medianValue subsection. Additional parameters can be supplied:
accuracyError - Optional, default is 0.01. Accuracy error for calculation of the median value.
Metric increment returns Failure status for rows where the value in the specified column is not castable to number.
Calculates the first quantile of the values in the specified column. Metric calculator uses the TDigest library to compute the first quantile.
This metric works with only one column.
All first quantile metrics are defined in the firstQuantile subsection. Additional parameters can be supplied:
accuracyError - Optional, default is 0.01. Accuracy error for calculation of the first quantile value.
Metric increment returns Failure status for rows where the value in the specified column is not castable to number.
Calculates the third quantile of the values in the specified column. Metric calculator uses the TDigest library to compute the third quantile.
This metric works with only one column.
All third quantile metrics are defined in the thirdQuantile subsection. Additional parameters can be supplied:
accuracyError - Optional, default is 0.01. Accuracy error for calculation of the third quantile value.
Metric increment returns Failure status for rows where the value in the specified column is not castable to number.
Calculates an arbitrary quantile of the values in the specified column. Metric calculator uses the TDigest library to compute the quantile.
This metric works with only one column.
All get quantile metrics are defined in the getQuantile subsection. Additional parameters should be supplied:
accuracyError - Optional, default is 0.01. Accuracy error for calculation of the quantile value.
target - Required. A number in the interval [0, 1] corresponding to the quantile that needs to be calculated.
Metric increment returns Failure status for rows where the value in the specified column is not castable to number.
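A sketch of a getQuantile metric computing the 0.95 quantile of a balance column with a tighter accuracy error than the default. The metric ID is hypothetical; the source and column follow the metrics example in this chapter. Setting target to 0.5 would correspond to the median, the same quantity medianValue computes.

```hocon
jobConfig: {
  metrics: {
    regular: {
      getQuantile: [
        {
          // hypothetical metric ID
          id: balance_p95
          source: hdfs_avro_source
          // this metric accepts exactly one column
          columns: [balance]
          params: {target: 0.95, accuracyError: 0.001}
        }
      ]
    }
  }
}
```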
This metric is the inverse of the Get Quantile metric. It calculates the percentile value (quantile in %) that corresponds to the specified number from the set of values in the column. Metric calculator uses the TDigest library to compute the percentile value.
This metric works with only one column.
All get percentile metrics are defined in the getPercentile subsection. Additional parameters should be supplied:
accuracyError - Optional, default is 0.01. Accuracy error for calculation of the percentile.
target - Required. The number from the set of values in the column for which the percentile is determined.
Metric increment returns Failure status for rows where the value in the specified column is not castable to number.
Calculates the number of rows where values in the specified columns are equal to each other. Metric definition does not require additional parameters: params should not be set.
This metric works with at least two columns.
All column equality metrics are defined in the columnEq subsection.
Metric increment returns Failure status for rows where some values in the specified columns are not castable to string or are not equal to each other.
Calculates the number of rows where the difference between the dates in two columns, expressed in days, is less (strictly less) than the specified threshold value.
This metric works with exactly two columns.
All day distance metrics are defined in the dayDistance subsection. Additional parameters should be supplied:
threshold - Required. Maximum allowed difference between two dates in days (not included in the range for comparison).
dateFormat - Optional, default is yyyy-MM-dd'T'HH:mm:ss.SSSZ. Target datetime format. The datetime format must be specified as a Java DateTimeFormatter pattern.
NOTE. If the specified columns are of type Timestamp, it is assumed that they fit any datetime format and, therefore, the datetime format does not need to be specified.
Metric increment returns Failure status for rows where some values in the specified columns do not conform to the specified datetime format or where the date difference in days is greater than or equal to the specified threshold.
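A sketch of a dayDistance metric that counts rows where two date columns are fewer than 30 days apart. The metric ID and the open_date and close_date columns are hypothetical.

```hocon
jobConfig: {
  metrics: {
    regular: {
      dayDistance: [
        {
          // hypothetical metric ID and column names
          id: open_close_within_30d
          source: table_source_2
          // exactly two columns are required
          columns: [open_date, close_date]
          // a row is counted when the dates differ by fewer than 30 days
          params: {threshold: 30, dateFormat: yyyy-MM-dd}
        }
      ]
    }
  }
}
```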
Calculates the number of rows where the Levenshtein distance between string values in the provided columns is less (strictly less) than the specified threshold.
This metric works with exactly two columns.
All Levenshtein distance metrics are defined in the levenshteinDistance subsection. Additional parameters should be supplied:
threshold - Required. Maximum allowed Levenshtein distance.
normalize - Optional, default is false. Boolean parameter indicating whether the Levenshtein distance should be normalized with respect to the maximum of the two string lengths.
IMPORTANT. If Levenshtein distance is normalized, then the threshold value must be in the range [0, 1].
Metric increment returns Failure status for rows where some values in the specified columns are not castable to string or where the Levenshtein distance is greater than or equal to the specified threshold.
Calculates the covariance moment (co-moment) of the values in two columns. Metric definition does not require additional parameters: params should not be set.
This metric works with exactly two columns.
IMPORTANT. For the metric to be calculated, values in the specified columns must be non-empty, non-null and castable to number (double). If at least one corrupt value is found, the metric calculator returns NaN.
Metric increment returns Failure status for rows where some values in the specified columns cannot be cast to number.
Calculates the covariance of the values in two columns. Metric definition does not require additional parameters: params should not be set.
This metric works with exactly two columns.
IMPORTANT. For the metric to be calculated, values in the specified columns must be non-empty, non-null and castable to number (double). If at least one corrupt value is found, the metric calculator returns NaN.
Metric increment returns Failure status for rows where some values in the specified columns cannot be cast to number.
Calculates the covariance of the values in two columns with the Bessel correction. Metric definition does not require additional parameters: params should not be set.
This metric works with exactly two columns.
IMPORTANT. For the metric to be calculated, values in the specified columns must be non-empty, non-null and castable to number (double). If at least one corrupt value is found, the metric calculator returns NaN.
Metric increment returns Failure status for rows where some values in the specified columns cannot be cast to number.
This is a specific metric that approximately calculates the N most frequently occurring values in a column. The metric calculator uses the Twitter Algebird library, which implements abstract algebra methods for Scala.
This metric works with only one column.
All top N metrics are defined in the topN subsection. Additional parameters can be supplied:
targetNumber - Optional, default is 10. Number N of values to search for.
maxCapacity - Optional, default is 100. Maximum container size for storing top values.
Composed metrics are defined using a formula (specified in the formula field) for their calculation. Since composed metrics are intended to compute derived results from other metric results, those metrics are referenced in the formula by their IDs.
Formula must be written using Mustache Template notation, e.g.: {{ metric_1 }} + {{ metric_2 }}.
Basic (+-*/) and exponentiation (^) math operations are supported, as well as grouping using parentheses.
Thus, composed metrics are defined in the composed subsection using the following parameters:
id - Required. Composed metric ID.
description - Optional. Composed metric description.
formula - Required. Formula to calculate the composed metric.
As shown in the example below, regular metrics of the same type are grouped within subsections named after the metric type. These subsections should contain a list of metric configurations of the corresponding type. Composed metrics are listed in a separate subsection.
jobConfig: {\n metrics: {\n regular: {\n rowCount: [\n {id: \"hive_table_row_cnt\", description: \"Row count in hive_source_1\", source: \"hive_source_1\"},\n {id: \"csv_file_row_cnt\", description: \"Row count in hdfs_delimited_source\", source: \"hdfs_delimited_source\"}\n ]\n distinctValues: [\n {\n id: \"fixed_file_dist_name\", description: \"Distinct values in hdfs_fixed_file\",\n source: \"hdfs_fixed_file\", columns: [\"colA\"],\n metadata: [\n \"requestor=some.person@some.domain\"\n \"critical.metric=true\"\n ]\n }\n ]\n nullValues: [\n {id: \"hive_table_nulls\", description: \"Null values in columns id and name\", source: \"hive_source_1\", columns: [\"id\", \"name\"]}\n ]\n completeness: [\n {id: \"orc_data_compl\", description: \"Completeness of column id\", source: \"hdfs_orc_source\", columns: [\"id\"]}\n {\n id: \"hive_table_compl\", \n description: \"Completeness of columns id and name\", \n source: \"hive_source_1\", \n columns: [\"id\", \"name\"]\n }\n ]\n avgNumber: [\n {id: \"avro_file1_avg_bal\", description: \"Avg number of column balance\", source: \"hdfs_avro_source\", columns: [\"balance\"]}\n ]\n regexMatch: [\n {\n id: \"table_source1_inn_regex\", description: \"Regex match for inn column\", source: \"table_source_1\",\n columns: [\"inn\"], params: {regex: \"\"\"^\\d{10}$\"\"\"}\n }\n ]\n stringInDomain: [\n {\n id: \"orc_data_segment_domain\", source: \"hdfs_orc_source\",\n columns: [\"segment\"], params: {domain: [\"FI\", \"MID\", \"SME\", \"INTL\", \"CIB\"]}\n }\n ]\n topN: [\n {\n id: \"filterVS_top3_currency\", description: \"Top 3 currency in filterVS\", source: \"filterVS\",\n columns: [\"id\"], params: {targetNumber: 3, maxCapacity: 10}\n }\n ],\n levenshteinDistance: [\n {\n id: \"lvnstDist\", source: \"table_source_2\", columns: [\"col1\", \"col2\"],\n params: {normalize: true, threshold: 0.3}\n }\n ]\n }\n composed: [\n {\n id: \"pct_of_null\", description: \"Percent of null values in hive_table1\",\n formula: \"100 * {{ hive_table_nulls }} ^ 2 / ( {{ hive_table_row_cnt }} + 1)\"\n }\n ]\n }\n}\n
"},{"location":"03-job-configuration/09-Checks/","title":"Checks Configurations","text":"Performing checks over the metric results is an important step in the Checkita framework. Once metric results are calculated, checks can be configured to identify any problems with data quality.
In Checkita there are two main groups of checks:
Snapshot checks - allow comparison of metric results with static thresholds or with other metric results in the same Data Quality job.
Trend checks - allow evaluation of how a metric result changes over a certain period of time. Checks of this type are used to detect anomalies in data. For trend checks to work, Data Quality storage must be set up, since the check calculator needs to fetch historical results for the metric of interest.
After evaluation, a check will have a status as described in the Status Model used in Results chapter.
"},{"location":"03-job-configuration/09-Checks/#snapshot-checks","title":"Snapshot Checks","text":"Snapshot checks represent a simple comparison of metric results with a static threshold or with another metric result.
The following snapshot checks are supported:
equalTo - checks if the metric result is equal to a given threshold value or to another metric result.
lessThan - checks if the metric result is less than a given threshold value or another metric result.
greaterThan - checks if the metric result is greater than a given threshold value or another metric result.
differByLT - checks if the relative difference between two metric results is less than a given threshold. This check succeeds when the following expression is true: | metric - compareMetric | / compareMetric < threshold.
Snapshot checks are configured using a common set of parameters:
id - Required. Check ID.
description - Optional. Description of the check.
metric - Required. ID of the metric whose result is checked.
compareMetric - Optional. ID of the metric whose result is used as a threshold.
threshold - Optional. Explicit threshold value.
metadata - Optional. List of user-defined metadata parameters specific to this check, where each parameter is a string in the format: param.name=param.value.
IMPORTANT. When configuring a check, specify either an explicit threshold value in the threshold field or another metric ID in the compareMetric field, whose result will then be used as the threshold value. The only exception to this rule is the differByLT check, for which both a threshold value and a metric ID to compare with must be specified.
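A sketch of a snapshot check that uses compareMetric instead of an explicit threshold. The metric IDs are taken from the metrics example in this chapter; the check ID is hypothetical.

```hocon
jobConfig: {
  checks: {
    snapshot: {
      greaterThan: [
        {
          // hypothetical check ID
          id: hive_rows_exceed_csv_rows
          metric: hive_table_row_cnt
          // the result of this metric serves as the threshold
          compareMetric: csv_file_row_cnt
        }
      ]
    }
  }
}
```

For differByLT with threshold 0.05, metric results of 1030 and 1000 give | 1030 - 1000 | / 1000 = 0.03, so the check succeeds, whereas 1060 and 1000 give 0.06 and the check fails.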
Trend checks are used to detect anomalies in data. This type of check verifies that the value of the metric corresponds to its average value, within a given deviation, over a certain period of time. The maximum allowed deviation is configured by providing a threshold value.
The following trend checks are supported:
averageBoundFull - sets the same upper and lower deviation from the metric average result. Check succeeds when the following expression is true: (1 - threshold) * avgResult <= currentResult <= (1 + threshold) * avgResult.
averageBoundUpper - verifies only the upper deviation from the metric average result. Check succeeds when the following expression is true: currentResult <= (1 + threshold) * avgResult.
averageBoundLower - verifies only the lower deviation from the metric average result. Check succeeds when the following expression is true: (1 - threshold) * avgResult <= currentResult.
averageBoundRange - sets different thresholds for upper and lower deviations from the metric average result. Check succeeds when the following expression is true: (1 - thresholdLower) * avgResult <= currentResult <= (1 + thresholdUpper) * avgResult.
Trend checks are configured using the following set of parameters:
id - Required. Check ID.
description - Optional. Description of the check.
metric - Required. ID of the metric whose result is checked.
rule - Required. The rule for calculating the historical average value of the metric. Two rules are supported:
  record - calculates the average value of the metric for the configured number of historical records.
  datetime - calculates the average value of the metric for the configured datetime window.
windowSize - Required. Size of the window for average metric value calculation:
  If rule is set to record, then the window size is the number of records to retrieve.
  If rule is set to datetime, then the window size is a duration string which should conform to Scala Duration.
windowOffset - Optional, default is 0 or 0s. Sets the window offset back from the current reference date (see Working with Date and Time chapter for more details on reference date). By default, the offset is absent and the window starts from the current reference date.
  If rule is set to record, then the window offset is the number of records to skip from the reference date.
  If rule is set to datetime, then the window offset is a duration string which should conform to Scala Duration.
threshold - Required. Sets the maximum allowed deviation from the historical average metric result. Not used with the averageBoundRange check.
thresholdLower - Required. Sets the maximum allowed lower deviation from the historical average metric result. Used only for the averageBoundRange check.
thresholdUpper - Required. Sets the maximum allowed upper deviation from the historical average metric result. Used only for the averageBoundRange check.
metadata - Optional. List of user-defined metadata parameters specific to this check, where each parameter is a string in the format: param.name=param.value.
NOTE. A Scala Duration string has the format <length><unit>, where the following units are allowed: d, day, h, hr, hour, m, min, minute, s, sec, second, ms, milli, millisecond, µs, micro, microsecond, ns, nano, nanosecond.
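A sketch of an averageBoundRange trend check that averages the metric over historical records rather than a datetime window. The check ID is hypothetical; the metric ID is taken from the metrics example in this chapter.

```hocon
jobConfig: {
  checks: {
    trend: {
      averageBoundRange: [
        {
          // hypothetical check ID
          id: row_cnt_trend
          metric: hive_table_row_cnt
          // average over 7 historical records, offset back by 1 record from the reference date
          rule: record
          windowSize: 7
          windowOffset: 1
          thresholdLower: 0.2
          thresholdUpper: 0.4
        }
      ]
    }
  }
}
```

If the average of the retrieved results is 1000, this check succeeds for current results from 800 to 1400, since (1 - 0.2) * 1000 = 800 and (1 + 0.4) * 1000 = 1400.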
This is a special check designed specifically for the Top N metric, and it works only with it. The Top N rank check calculates the Jaccard distance between the current and previous sets of top N metric results and checks that it does not exceed the threshold value.
IMPORTANT. Calculation of this check is currently supported only between the current and previous topN metric sets.
Top N rank check is configured using the following parameters:
id - Required. Check ID.
description - Optional. Description of the check.
metric - Required. ID of the metric whose result is checked.
targetNumber - Required. Number of records from the set of top N metric results that are considered. This number should be less than or equal to the number of top values collected by the top N metric.
threshold - Required. Maximum allowed Jaccard distance between the current and previous sets of records from the top N metric result. Should be a number in the interval [0, 1].
metadata - Optional. List of user-defined metadata parameters specific to this check, where each parameter is a string in the format: param.name=param.value.
As shown in the example below, checks are grouped into two subsections: trend and snapshot. Then, checks of the same type are grouped within subsections named after the check type. These subsections should contain a list of check configurations of the corresponding type.
jobConfig: {\n checks: {\n trend: {\n averageBoundFull: [\n {\n id: \"avg_bal_check\",\n description: \"Check that average balance stays within +/-25% of the week average\"\n metric: \"avro_file1_avg_bal\",\n rule: \"datetime\"\n windowSize: \"8d\"\n threshold: 0.25\n metadata: [\n \"requestor=some.person@some.domain\",\n \"critical.check=true\"\n ]\n }\n ]\n averageBoundUpper: [\n {id: \"avg_pct_null\", metric: \"pct_of_null\", rule: \"datetime\", windowSize: \"15d\", threshold: 0.5}\n ]\n averageBoundLower: [\n {id: \"avg_distinct\", metric: \"fixed_file_dist_name\", rule: \"record\", windowSize: 31, threshold: 0.3}\n ]\n averageBoundRange: [\n {\n id: \"avg_inn_match\",\n metric: \"table_source1_inn_regex\",\n rule: \"datetime\",\n windowSize: \"8d\",\n thresholdLower: 0.2\n thresholdUpper: 0.4\n }\n ]\n topNRank: [\n {id: \"top2_curr_match\", metric: \"filterVS_top3_currency\", targetNumber: 2, threshold: 0.1}\n ]\n }\n snapshot: {\n differByLT: [\n {\n id: \"row_cnt_diff\",\n description: \"Number of rows in two tables should not differ on more than 5%.\",\n metric: \"hive_table_row_cnt\"\n compareMetric: \"csv_file_row_cnt\"\n threshold: 0.05\n }\n ]\n equalTo: [\n {id: \"zero_nulls\", description: \"Hive Table1 mustn't contain nulls\", metric: \"hive_table_nulls\", threshold: 0}\n ]\n greaterThan: [\n {id: \"completeness_check\", metric: \"orc_data_compl\", threshold: 0.99}\n ]\n lessThan: [\n {id: \"null_threshold\", metric: \"pct_of_null\", threshold: 0.01}\n ]\n }\n }\n}\n
"},{"location":"03-job-configuration/10-Targets/","title":"Targets Configuration","text":"Targets are designed to provide alternative channels for sending results. First of all, targets can be used to notify users about problems in their data or simply to send a summary of the Data Quality job. In addition, targets provide different ways of saving results, e.g. writing them to a file in HDFS or sending them to a Kafka topic.
All targets are configured in the targets section of the job configuration. There are four general types of targets that can be configured depending on what information is being sent or saved:
Results targets are configured in the results subsection and can be of one of the following types, depending on where results are sent or saved:
file - Save results as a file in a local or remote (HDFS, S3, etc.) file system.
hive - Save results in HDFS as a Hive table. Note that a Hive table with the required schema must be created prior to saving results.
kafka - Send results to a Kafka topic in JSON format.
For a result target of any type, the list of results to be saved or sent must be configured:
resultTypes - Required. List of result types to save or send. May include the following: regularMetrics, composedMetrics, loadChecks, checks, jobState. Note that all result types are reduced to the Unified Targets Schema and saved together.
In order to save results to a file, a result target of file type must be configured. In addition to the list of saved results, a file output must be configured:
save - Required. File output configuration used to save results. For more information on configuring file outputs, see File Output Configuration chapter.
The file with results will have the Unified Targets Schema.
"},{"location":"03-job-configuration/10-Targets/#save-results-to-hive","title":"Save Results to Hive","text":"In order to save results to a Hive table, a result target of hive type must be configured. The Hive table to which results will be saved must be created in advance with the Unified Targets Schema.
Thus, in addition to the list of saved results, the Hive schema and table must be specified:
schema - Required. Hive schema.
table - Required. Hive table.
Note that results will be appended to the Hive table.
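A sketch of a Hive results target. It assumes that each result target is given as a single object keyed by its type name inside the results subsection, by analogy with how metrics and checks are grouped; the schema and table names are hypothetical.

```hocon
jobConfig: {
  targets: {
    results: {
      // assumption: target object keyed by its type (hive)
      hive: {
        // hypothetical Hive schema and table created in advance
        schema: dq_schema
        table: dq_results
        resultTypes: [regularMetrics, composedMetrics, checks, jobState]
      }
    }
  }
}
```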
"},{"location":"03-job-configuration/10-Targets/#send-results-to-kafka","title":"Send Results to Kafka","text":"In order to send results to a Kafka topic, a result target of kafka type must be configured. Connection to the Kafka cluster must be configured in the connections section of the job configuration as described in Kafka Connection Configuration.
Thus, in addition to the list of saved results, the following parameters must be provided:
connection - Required. Kafka connection ID.
topic - Required. Kafka topic to send results to.
options - Optional. Additional list of Kafka parameters for sending messages to the topic. Parameters are provided as strings in the format parameterName=parameterValue.
Results will be saved as JSON messages. In addition, the aggregatedKafkaOutput parameter configured in application settings controls how results will be sent (see Enablers chapter):
Error collection targets are configured in the errorCollection subsection and can be of one of the following types, depending on where metric errors are sent or saved:
file - Save metric errors as a file in a local or remote (HDFS, S3, etc.) file system.
hive - Save metric errors in HDFS as a Hive table. Note that a Hive table with the required schema must be created prior to saving metric errors.
kafka - Send metric errors to a Kafka topic in JSON format.
Note that metric errors are transformed to the Unified Targets Schema when sent or saved.
For an error collection target of any type, the following parameters can be supplied:
metrics - Optional. List of metrics for which errors will be saved. If omitted, errors are saved for all metrics defined in the Data Quality job.
dumpSize - Optional, default is 100. Additionally limits the number of errors saved per metric in order to make reports more compact. Cannot be larger than the application-level limit described in Enablers chapter.
In order to save metric errors to a file, an error collection target of file type must be configured. In addition to the common error collection target parameters, a file output must be configured:
save - Required. File output configuration used to save metric errors. For more information on configuring file outputs, see File Output Configuration chapter.
The file with metric errors will have the Unified Targets Schema.
"},{"location":"03-job-configuration/10-Targets/#save-metric-errors-to-hive","title":"Save Metric Errors to Hive","text":"In order to save metric errors to a Hive table, it is required to configure an error collection target of hive type. The Hive table to which metric errors will be saved must be created in advance with the Unified Targets Schema.
Thus, in addition to the common error collection target parameters, the Hive schema and table must be specified:
schema
- Required. Hive schema.
table
- Required. Hive table.
Note that metric errors are appended to the Hive table.
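A sketch of a Hive error collection target, assuming that a table DQ_SCHEMA.DQ_ERRORS (a placeholder name) has already been created with the Unified Targets Schema:
errorCollection: {\n hive: {\n metrics: [\"hive_table_nulls\"]\n schema: \"DQ_SCHEMA\"\n table: \"DQ_ERRORS\"\n }\n}\n
Omitting metrics here would collect errors for every metric defined in the job.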
"},{"location":"03-job-configuration/10-Targets/#send-metric-errors-to-kafka","title":"Send Metric Errors to Kafka","text":"In order to send metric errors to a Kafka topic, it is required to configure an error collection target of kafka type. Connection to the Kafka cluster must be configured in the connections section of the job configuration, as described in Kafka Connection Configuration.
Thus, in addition to the common error collection target parameters, the following ones must be provided:
connection
- Required. Kafka connection ID.
topic
- Required. Kafka topic to send metric errors to.
options
- Optional. Additional list of Kafka parameters for sending messages to the topic. Parameters are provided as strings in the format parameterName=parameterValue.
Metric errors are sent as JSON messages. In addition, the aggregatedKafkaOutput parameter configured in application settings controls how metric errors are sent (see Enablers chapter).
IMPORTANT. Be careful when using this option for saving metric errors, as there could be a significant number of them. In order to fit into Kafka message size limits, it is recommended to limit the number of errors sent per metric by setting the dumpSize parameter to a reasonably low number.
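Following the recommendation above, a Kafka error collection target might keep dumpSize small. In this sketch the connection ID, topic and metric ID are placeholders:
errorCollection: {\n kafka: {\n metrics: [\"table_source1_inn_regex\"]\n dumpSize: 10\n connection: \"kafka_broker\"\n topic: \"dq.errors.topic\"\n }\n}\n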
The Checkita framework collects a summary upon completion of each Data Quality job. Summary targets are designed to send these summary reports to users. They are configured in the summary subsection and can be of one of the following types, depending on where summary reports are sent or saved:
email
- Send summary report to user(s) via email.
mattermost
- Send summary report to Mattermost, either to a channel or to a user's direct messages.
kafka
- Send summary report to a Kafka topic in JSON format. When a summary report is sent to Kafka, it is transformed to the Unified Targets Schema.
For summary targets of email or mattermost type, the following parameters can be supplied:
attachMetricErrors
- Optional, default is false. Boolean parameter indicating whether a report with collected metric errors should be attached to the email or message with the summary report.
attachFailedChecks
- Optional, default is false. Boolean parameter indicating whether a report with failed checks should be attached to the email or message with the summary report.
metrics
- Optional. If attachMetricErrors is set to true, this parameter can be used to specify the list of metrics for which errors will be saved. If omitted, errors are saved for all metrics defined in the Data Quality job.
dumpSize
- Optional, default is 100. If attachMetricErrors is set to true, this parameter additionally limits the number of errors saved per metric in order to make the report more compact. Cannot be larger than the application-level limit described in the Enablers chapter.
In order to send a summary report via email, it is required to configure a summary target of email type. In addition to the common summary target parameters, the following ones must be configured:
recipients
- Required. List of recipient email addresses to which the summary report will be sent.
template
- Optional. HTML template to build the email body.
templateFile
- Optional. Location of the file with the HTML template to build the email body.
The HTML template is optional. If it is not provided, the default summary report body is compiled. Note that the template parameter has higher priority than templateFile. Therefore, if both are set, the HTML template explicitly defined in the template parameter is used.
In addition, HTML templates support parameter substitution using Mustache Template notation, e.g.: This {{ parameterName }} has a value of {{ parameterValue }}. The list of parameters available for substitution in HTML templates is given in the Job Summary Parameters Available for Templates chapter below.
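A sketch of an email summary target that attaches both metric errors and failed checks and reads its body from an HTML template file. The recipient address and template path are placeholders:
summary: {\n email: {\n attachMetricErrors: true\n attachFailedChecks: true\n dumpSize: 20\n recipients: [\"dq.team@some.domain\"]\n templateFile: \"/path/to/summary_template.html\"\n }\n}\n
Because template takes priority over templateFile, adding a template entry to this block would cause the file to be ignored.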
In order to send a summary report to Mattermost, it is required to configure a summary target of mattermost type. In addition to the common summary target parameters, the following ones must be configured:
recipients
- Required. List of recipients to which the summary report will be sent. A message can be sent either to a channel, by prefixing the channel name with the # sign (e.g. #someChannel), or to a user's direct messages, by prefixing the user name with the @ sign (e.g. @someUser).
template
- Optional. Markdown template to build the message body.
templateFile
- Optional. Location of the file with the Markdown template to build the message body.
The Markdown template is optional. If it is not provided, the default summary report body is compiled. Note that the template parameter has higher priority than templateFile. Therefore, if both are set, the Markdown template explicitly defined in the template parameter is used.
In addition, Markdown templates support parameter substitution using Mustache Template notation, e.g.: This {{ parameterName }} has a value of {{ parameterValue }}. The list of parameters available for substitution in Markdown templates is given in the Job Summary Parameters Available for Templates chapter below.
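A sketch of a Mattermost summary target with an inline Markdown template. The recipients are placeholders, and the template only uses jobId and jobStatus from the Job Summary Parameters Available for Templates chapter:
summary: {\n mattermost: {\n attachFailedChecks: true\n recipients: [\"#someChannel\", \"@someUser\"]\n template: \"Data Quality job `{{ jobId }}` finished with status **{{ jobStatus }}**\"\n }\n}\n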
In order to send a summary report to a Kafka topic, it is required to configure a summary target of kafka type. Connection to the Kafka cluster must be configured in the connections section of the job configuration, as described in Kafka Connection Configuration.
Kafka messages do not support any form of attachments; therefore, only the summary report itself can be sent to a Kafka topic. The summary report is sent as a JSON string that contains all the parameters defined in the Job Summary Parameters Available for Templates chapter below. The JSON string format conforms to the Unified Targets Schema.
Thus, in order to configure a kafka summary target, the following parameters must be specified:
connection
- Required. Kafka connection ID.
topic
- Required. Kafka topic to send summary reports to.
options
- Optional. Additional list of Kafka parameters for sending messages to the topic. Parameters are provided as strings in the format parameterName=parameterValue.
Check alert targets are designed specifically to send notifications when some of the watched checks have failed. These targets are configured in the checkAlerts subsection and can be of one of the following types, depending on where alerts are sent:
email
- Send check alert to user(s) via email.
mattermost
- Send check alert to Mattermost, either to a channel or to a user's direct messages.
For check alert targets of any type, the following parameters can be supplied:
id
- Required. ID of the check alert. There could be different check alert configurations for different sets of checks; therefore, each check alert must have an ID to distinguish it.
checks
- Optional. List of watched checks. If any of the watched checks fails, an alert notification is sent. If omitted, all checks defined in the Data Quality job are watched.
In order to send a check alert via email, it is required to configure a check alert target of email type. In addition to the common check alert target parameters, the following ones must be configured:
recipients
- Required. List of recipient email addresses to which the check alert will be sent.
template
- Optional. HTML template to build the email body.
templateFile
- Optional. Location of the file with the HTML template to build the email body.
The HTML template is optional. If it is not provided, the default check alert body is compiled. Note that the template parameter has higher priority than templateFile. Therefore, if both are set, the HTML template explicitly defined in the template parameter is used.
In addition, HTML templates support parameter substitution using Mustache Template notation, e.g.: This {{ parameterName }} has a value of {{ parameterValue }}. The list of parameters available for substitution in HTML templates is given in the Job Summary Parameters Available for Templates chapter below.
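A sketch of an email check alert target. Since several alerts may be configured per channel, the email subsection is a list; the alert ID, check IDs and address below are placeholders:
checkAlerts: {\n email: [\n {\n id: \"critical_checks_alert\"\n checks: [\"zero_nulls\", \"completeness_check\"]\n recipients: [\"dq.team@some.domain\"]\n }\n ]\n}\n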
In order to send a check alert to Mattermost, it is required to configure a check alert target of mattermost type. In addition to the common check alert target parameters, the following ones must be configured:
recipients
- Required. List of recipients to which the check alert will be sent. A message can be sent either to a channel, by prefixing the channel name with the # sign (e.g. #someChannel), or to a user's direct messages, by prefixing the user name with the @ sign (e.g. @someUser).
template
- Optional. Markdown template to build the message body.
templateFile
- Optional. Location of the file with the Markdown template to build the message body.
The Markdown template is optional. If it is not provided, the default check alert body is compiled. Note that the template parameter has higher priority than templateFile. Therefore, if both are set, the Markdown template explicitly defined in the template parameter is used.
In addition, Markdown templates support parameter substitution using Mustache Template notation, e.g.: This {{ parameterName }} has a value of {{ parameterValue }}. The list of parameters available for substitution in Markdown templates is given in the Job Summary Parameters Available for Templates chapter below.
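A sketch of a Mattermost check alert target with an inline Markdown template. The alert ID, check ID and recipient are placeholders; omitting checks would make the alert watch every check in the job:
checkAlerts: {\n mattermost: [\n {\n id: \"row_count_alert\"\n checks: [\"row_cnt_diff\"]\n recipients: [\"@someUser\"]\n template: \"Some watched checks have failed in job `{{ jobId }}` for reference date `{{ referenceDate }}`.\"\n }\n ]\n}\n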
All targets that are saved to a file or sent to Kafka are reduced to a unified schema. This approach has several advantages.
The unified schema is as follows:
Column Name | Column Type | Comment
jobId | STRING | ID of Data Quality Job
referenceDate | STRING | Reference datetime for which job is run
executionDate | STRING | Datetime of actual job start
entityType | STRING | Type of result
data | STRING | JSON string. Content varies depending on entityType
From the schema above, it can be seen that all data specific to results of each type is stored as a JSON string. When sending results to Kafka, the schema is the same, but data becomes a nested JSON object.
As already noted, HTML or Markdown templates used to build the body of notifications support parameter substitution using Mustache Template notation. The list of parameters available for substitution is shown below.
For example, a Markdown template for a check alert notification could look like this:
# Checkita Data Quality Notification - Failed Check Alert\n\nYou requested notifications on failed checks in Data Quality Job: `{{ jobId }}`.\n\nSome watched checks have failed for the job started for:\n\n* Reference date: `{{ referenceDate }}`\n* Execution date: `{{ executionDate }}`\n\nAttached files contain information about failed checks. Please review them.\n
jobId
- ID of the current Data Quality job.
jobStatus
- Job status: Success if all checks have passed, Failure otherwise.
referenceDate
- Reference datetime for which the job is run.
executionDate
- Datetime of the actual job start.
numSources
- Total number of sources in the job.
numMetrics
- Total number of metrics in the job.
numChecks
- Total number of checks in the job.
numLoadChecks
- Total number of load checks in the job.
numMetricsWithErrors
- Number of metrics that yielded errors during their computation.
numFailedChecks
- Number of failed checks.
numFailedLoadChecks
- Number of failed load checks.
listMetricsWithErrors
- List of all metrics that yielded errors during their computation.
listFailedChecks
- List of failed checks.
listFailedLoadChecks
- List of failed load checks.
As shown in the example below, targets are grouped into subsections named after their type. These subsections may contain various target configurations depending on the channel where targets are saved or sent. Since multiple check alert configurations are allowed, they are grouped as a list of check alerts sent to a specific channel (email or mattermost).
jobConfig: {\n targets: {\n results: {\n file: {\n resultTypes: [\"checks\", \"loadChecks\"]\n save: {\n kind: \"delimited\"\n path: \"/tmp/dataquality/results\"\n header: true\n }\n }\n hive: {\n resultTypes: [\"regularMetrics\", \"composedMetrics\", \"loadChecks\", \"checks\", \"jobState\"],\n schema: \"WORKSPACE_CIBAA\",\n table: \"DQ_TARGETS\"\n }\n kafka: {\n resultTypes: [\"regularMetrics\", \"composedMetrics\", \"loadChecks\", \"checks\"],\n connection: \"kafka_broker\"\n topic: \"some.topic\"\n }\n }\n errorCollection: {\n file: {\n metrics: [\"pct_of_null\", \"hive_table_row_cnt\", \"hive_table_nulls\"]\n dumpSize: 50\n save: {\n kind: \"orc\"\n path: \"tmp/DQ/ERRORS\"\n }\n }\n kafka: {\n metrics: [\"hive_table_nulls\", \"fixed_file_dist_name\", \"table_source1_inn_regex\"]\n dumpSize: 25\n connection: \"kafka_broker\"\n topic: \"some.topic\"\n options: [\"addParam=true\"]\n }\n }\n summary: {\n email: {\n attachMetricErrors: true\n metrics: [\"hive_table_nulls\", \"fixed_file_dist_name\", \"table_source1_inn_regex\"]\n dumpSize: 10\n recipients: [\"some.person@some.domain\"]\n }\n mattermost: {\n attachMetricErrors: true\n metrics: [\"hive_table_nulls\", \"fixed_file_dist_name\", \"table_source1_inn_regex\"]\n dumpSize: 10\n recipients: [\"@someUser\", \"#someChannel\"]\n }\n kafka: {\n connection: \"kafka_broker\"\n topic: \"dev.dq_results.topic\"\n }\n }\n checkAlerts: {\n email: [\n {\n id: \"alert1\"\n checks: [\"avg_bal_check\", \"zero_nulls\"]\n recipients: [\"some.person@some.domain\"]\n }\n {\n id: \"alert2\"\n checks: [\"top2_curr_match\", \"completeness_check\"]\n recipients: [\"another.person@some.domain\"]\n }\n ]\n mattermost: [\n {\n id: \"alert3\"\n checks: [\"avg_bal_check\", \"zero_nulls\"]\n recipients: [\"@someUser\"]\n }\n {\n id: \"alert4\"\n checks: [\"top2_curr_match\", \"completeness_check\"]\n recipients: [\"#someChannel\"]\n }\n ]\n }\n }\n}\n
"},{"location":"03-job-configuration/11-FileOutputs/","title":"File Output Configuration","text":"The Checkita framework has a mechanism designed to save some of its results to a file, either in a local or remote (HDFS, S3, etc.) file system. Thus, it is possible to save virtual sources that are built during Data Quality job execution. Saved virtual sources can later be used for various purposes, such as investigating data quality problems. Apart from that, Checkita supports saving various Data Quality job results as files. In order to do that, it is required to configure targets of the desired type. See Targets Configuration for more information.
The Checkita framework supports saving file outputs in the following formats: delimited text, ORC, Parquet and Avro.
Thus, in order to configure a file output, the following parameters must be supplied:
kind
- Required. File format. Should be one of the following: delimited, orc, parquet, avro.
path
- Required. File path to save to. The Spark DataFrame writer is used under the hood to save outputs; therefore, the provided path should point to a directory. If the directory is non-empty, its content is overwritten.
Additional parameters can be defined for delimited text file output. These are:
delimiter
- Optional, default is , (comma). Column delimiter.
quote
- Optional, default is \" (double quote). Column enclosing character.
escape
- Optional, default is \\ (backslash). Escape character.
header
- Optional, default is false. Boolean parameter indicating whether the file should be written with a column header or without it.
parquet file
{\n kind: \"parquet\"\n path: \"/tmp/parquet_file_output\"\n}\n
delimited file
{\n kind: \"delimited\"\n path: \"/tmp/dataquality/results\"\n header: true\n}\n
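As a further sketch, a delimited output with a custom column delimiter might look like this; the path is a placeholder:
{\n kind: \"delimited\"\n path: \"/tmp/dataquality/errors\"\n delimiter: \";\"\n header: true\n}\n
Because the path must point to a directory whose existing content is overwritten, giving each output its own directory avoids one output replacing another.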
The example below represents an abstract but fully filled Data Quality job configuration with most of the features of the Checkita framework configured.
jobConfig: {\n jobId: \"job_id_for_this_configuration\"\n\n connections: {\n oracle: [\n {id: \"oracle_db1\", url: \"oracle.db.com:1521/public\", username: \"db-user\", password: \"dq-password\"}\n ]\n sqlite: [\n {id: \"sqlite_db\", url: \"some/path/to/db.sqlite\"}\n ],\n kafka: [\n {id: \"kafka_broker\", servers: [\"server1:9092\", \"server2:9092\"]}\n ]\n }\n\n schemas: [\n {\n id: \"schema1\"\n kind: \"delimited\"\n schema: [\n {name: \"colA\", type: \"string\"},\n {name: \"colB\", type: \"timestamp\"},\n {name: \"colC\", type: \"decimal(10, 3)\"}\n ]\n },\n {\n id: \"schema2\"\n kind: \"fixedFull\",\n schema: [\n {name: \"col1\", type: \"integer\", width: 5},\n {name: \"col2\", type: \"double\", width: 6},\n {name: \"col3\", type: \"boolean\", width: 4}\n ]\n },\n {id: \"schema3\", kind: \"fixedShort\", schema: [\"colOne:5\", \"colTwo:7\", \"colThree:9\"]}\n {id: \"hive_schema\", kind: \"hive\", schema: \"some_schema\", table: \"some_table\"}\n {id: \"avro_schema\", kind: \"avro\", schema: \"some/path/to/avro_schema.avsc\"}\n\n ]\n\n sources: {\n table: [\n {id: \"table_source_1\", connection: \"oracle_db1\", table: \"some_table\", keyFields: [\"id\", \"name\"]}\n {id: \"table_source_2\", connection: \"sqlite_db\", table: \"other_table\"}\n ]\n hive: [\n {\n id: \"hive_source_1\", schema: \"some_schema\", table: \"some_table\",\n partitions: [{name: \"dlk_cob_date\", values: [\"2023-06-30\", \"2023-07-01\"]}],\n keyFields: [\"id\", \"name\"]\n }\n ]\n file: [\n {id: \"hdfs_avro_source\", kind: \"avro\", path: \"path/to/avro/file.avro\", schema: \"avro_schema\"},\n {id: \"hdfs_orc_source\", kind: \"orc\", path: \"path/to/orc/file.orc\"},\n {\n id: \"hdfs_delimited_source\",\n kind: \"delimited\",\n path: \"path/to/csv/file.csv\"\n schema: \"schema1\"\n },\n {id: \"hdfs_fixed_file\", kind: \"fixed\", path: \"path/to/fixed/file.txt\", schema: \"schema2\"},\n ],\n kafka: [\n {\n id: \"kafka_source\",\n connection: \"kafka_broker\",\n topics: [\"topic1.pub\", 
\"topic2.pub\"]\n format: \"json\"\n }\n ]\n }\n\n virtualSources: [\n {\n id: \"sqlVS\"\n kind: \"sql\"\n parentSources: [\"hive_source_1\"]\n persist: \"disk_only\"\n save: {\n kind: \"orc\"\n path: ${basePath}\"/sqlVs\"\n }\n query: \"select id, name, entity, description from hive_source_1 where dlk_cob_date == '2023-06-30'\"\n }\n {\n id: \"joinVS\"\n kind: \"join\"\n parentSources: [\"hdfs_avro_source\", \"hdfs_orc_source\"]\n joinBy: [\"id\"]\n joinType: \"leftouter\"\n persist: \"memory_only\"\n keyFields: [\"id\", \"order_id\"]\n }\n {\n id: \"filterVS\"\n kind: \"filter\"\n parentSources: [\"kafka_source\"]\n expr: [\"key is not null\"]\n keyFields: [\"batchId\", \"dttm\"]\n }\n {\n id: \"selectVS\"\n kind: \"select\"\n parentSources: [\"table_source_1\"]\n expr: [\n \"count(id) as id_cnt\",\n \"count(name) as name_cnt\"\n ]\n }\n {\n id: \"aggVS\"\n kind: \"aggregate\"\n parentSources: [\"hdfs_fixed_file\"]\n groupBy: [\"col1\"]\n expr: [\n \"avg(col2) as avg_col2\",\n \"sum(col3) as sum_col3\"\n ],\n keyFields: [\"col1\", \"avg_col2\", \"sum_col3\"]\n }\n ]\n\n loadChecks: {\n exactColumnNum: [\n {id: \"loadCheck1\", source: \"hdfs_delimited_source\", option: 3}\n ]\n minColumnNum: [\n {id: \"loadCheck2\", source: \"kafka_source\", option: 2}\n ]\n columnsExist: [\n {id: \"loadCheck3\", source: \"sqlVS\", columns: [\"id\", \"name\", \"entity\", \"description\"]},\n {id: \"load_check_4\", source: \"hdfs_delimited_source\", columns: [\"id\", \"name\", \"value\"]}\n ]\n schemaMatch: [\n {id: \"load_check_5\", source: \"kafka_source\", schema: \"hive_schema\"}\n ]\n }\n\n metrics: {\n regular: {\n rowCount: [\n {id: \"hive_table_row_cnt\", description: \"Row count in hive_source_1\", source: \"hive_source_1\"},\n {id: \"csv_file_row_cnt\", description: \"Row count in hdfs_delimited_source\", source: \"hdfs_delimited_source\"}\n ]\n distinctValues: [\n {\n id: \"fixed_file_dist_name\", description: \"Distinct values in hdfs_fixed_file\",\n source: 
\"hdfs_fixed_file\", columns: [\"colA\"]\n }\n ]\n nullValues: [\n {id: \"hive_table_nulls\", description: \"Null values in columns id and name\", source: \"hive_source_1\", columns: [\"id\", \"name\"]}\n ]\n completeness: [\n {id: \"orc_data_compl\", description: \"Completeness of column id\", source: \"hdfs_orc_source\", columns: [\"id\"]}\n ]\n avgNumber: [\n {id: \"avro_file1_avg_bal\", description: \"Avg number of column balance\", source: \"hdfs_avro_source\", columns: [\"balance\"]}\n ]\n regexMatch: [\n {\n id: \"table_source1_inn_regex\", description: \"Regex match for inn column\", source: \"table_source_1\",\n columns: [\"inn\"], params: {regex: \"\"\"^\\d{10}$\"\"\"}\n }\n ]\n stringInDomain: [\n {\n id: \"orc_data_segment_domain\", source: \"hdfs_orc_source\",\n columns: [\"segment\"], params: {domain: [\"FI\", \"MID\", \"SME\", \"INTL\", \"CIB\"]}\n }\n ]\n topN: [\n {\n id: \"filterVS_top3_currency\", description: \"Top 3 currency in filterVS\", source: \"filterVS\",\n columns: [\"id\"], params: {targetNumber: 3, maxCapacity: 10}\n }\n ],\n levenshteinDistance: [\n {\n id: \"lvnstDist\", source: \"table_source_2\", columns: [\"col1\", \"col2\"],\n params: {normalize: true, threshold: 0.3}\n }\n ]\n }\n composed: [\n {\n id: \"pct_of_null\", description: \"Percent of null values in hive_table1\",\n formula: \"100 * {{ hive_table_nulls }} ^ 2 / ( {{ hive_table_row_cnt }} + 1)\"\n }\n ]\n }\n\n checks: {\n trend: {\n averageBoundFull: [\n {\n id: \"avg_bal_check\",\n description: \"Check that average balance stays within +/-25% of the week average\"\n metric: \"avro_file1_avg_bal\",\n rule: \"datetime\"\n windowSize: \"8d\"\n threshold: 0.25\n }\n ]\n averageBoundUpper: [\n {id: \"avg_pct_null\", metric: \"pct_of_null\", rule: \"datetime\", windowSize: \"15d\", threshold: 0.5}\n ]\n averageBoundLower: [\n {id: \"avg_distinct\", metric: \"fixed_file_dist_name\", rule: \"record\", windowSize: 31, threshold: 0.3}\n ]\n averageBoundRange: [\n {\n id: 
\"avg_inn_match\",\n metric: \"table_source1_inn_regex\",\n rule: \"datetime\",\n windowSize: \"8d\",\n thresholdLower: 0.2\n thresholdUpper: 0.4\n }\n ]\n topNRank: [\n {id: \"top2_curr_match\", metric: \"filterVS_top3_currency\", targetNumber: 2, threshold: 0.1}\n ]\n }\n snapshot: {\n differByLT: [\n {\n id: \"row_cnt_diff\",\n description: \"Number of rows in two tables should not differ by more than 5%.\",\n metric: \"hive_table_row_cnt\"\n compareMetric: \"csv_file_row_cnt\"\n threshold: 0.05\n }\n ]\n equalTo: [\n {id: \"zero_nulls\", description: \"Hive Table1 mustn't contain nulls\", metric: \"hive_table_nulls\", threshold: 0}\n ]\n greaterThan: [\n {id: \"completeness_check\", metric: \"orc_data_compl\", threshold: 0.99}\n ]\n lessThan: [\n {id: \"null_threshold\", metric: \"pct_of_null\", threshold: 0.01}\n ]\n }\n }\n\n targets: {\n results: {\n file: {\n resultTypes: [\"checks\", \"loadChecks\"]\n save: {\n kind: \"delimited\"\n path: ${basePath}\"/results/\"${referenceDate}\n header: true\n }\n }\n hive: {\n resultTypes: [\"regularMetrics\", \"composedMetrics\", \"loadChecks\", \"checks\"],\n schema: \"DQ_SCHEMA\",\n table: \"DQ_TARGETS\"\n }\n kafka: {\n resultTypes: [\"regularMetrics\", \"composedMetrics\", \"loadChecks\", \"checks\"],\n connection: \"kafka_broker\"\n topic: \"some.topic\"\n }\n }\n errorCollection: {\n file: {\n metrics: [\"pct_of_null\", \"hive_table_row_cnt\", \"hive_table_nulls\"]\n dumpSize: 50\n save: {\n kind: \"orc\"\n path: ${basePath}\"/errors/\"${referenceDate}\n }\n }\n kafka: {\n metrics: [\"hive_table_nulls\", \"fixed_file_dist_name\", \"table_source1_inn_regex\"]\n dumpSize: 25\n connection: \"kafka_broker\"\n topic: \"some.topic\"\n options: [\"addParam=true\"]\n }\n }\n summary: {\n email: {\n attachMetricErrors: true\n metrics: [\"hive_table_nulls\", \"fixed_file_dist_name\", \"table_source1_inn_regex\"]\n dumpSize: 10\n recipients: [\"some.person@some.domain\"]\n }\n mattermost: {\n attachMetricErrors: true\n 
metrics: [\"hive_table_nulls\", \"fixed_file_dist_name\", \"table_source1_inn_regex\"]\n dumpSize: 10\n recipients: [\"@someUser\", \"#someChannel\"]\n }\n kafka: {\n connection: \"kafka_broker\"\n topic: \"dev.dq_results.topic\"\n }\n }\n checkAlerts: {\n email: [\n {\n id: \"alert1\"\n checks: [\"avg_bal_check\", \"zero_nulls\"]\n recipients: [\"some.person@some.domain\"]\n }\n {\n id: \"alert2\"\n checks: [\"top2_curr_match\", \"completeness_check\"]\n recipients: [\"another.person@some.domain\"]\n }\n ]\n mattermost: [\n {\n id: \"alert3\"\n checks: [\"avg_bal_check\", \"zero_nulls\"]\n recipients: [\"@someUser\"]\n }\n {\n id: \"alert4\"\n checks: [\"top2_curr_match\", \"completeness_check\"]\n recipients: [\"#someChannel\"]\n }\n ]\n }\n }\n}\n
"},{"location":"ru/","title":"Home","text":"Current version: 1.4.0
Documentation in Russian is under development. Please use the English documentation.
To ensure the quality of big data, it is necessary to compute a large number of metrics and checks over huge datasets, which in turn is a difficult task.
Checkita is a Data Quality framework that solves this problem by formalizing and simplifying the process of connecting to and reading data from various sources, describing metrics and checks over the data in these sources, and sending results and notifications via various channels.
Thus, Checkita allows computing various metrics and checks over data (both structured and unstructured). The framework is able to perform distributed computations over data in a \"single pass\", using Spark as its computation core.
Hocon configuration files are used both to describe application settings and to describe the pipeline for computing metrics and checks. Computation results are stored in the framework's dedicated database and can also be sent to users via various channels: file (local FS, HDFS, S3), Email, Mattermost, Kafka.
Using Spark as the computation core allows metrics and checks to be computed at the level of \"raw\" data, without requiring any SQL abstractions over the data (such as Hive or Impala), which in turn may hide some errors in the data (for example, bad formatting or schema mismatches).
Checkita allows you to do the following:
Checkita is developed with a focus on integration into ETL pipelines and data catalog systems:
Another key feature of the Checkita framework is its ability to process both static (batch) and streaming data sources. Accordingly, two types of applications are supported: one for batch and one for streaming quality checks of data sources.
The framework's streaming mode is currently experimental, so this part of the framework may change.
The framework is written in Scala 2.12 and uses Spark 2.4+ as its computational engine. The project has a parameterized build that allows you to build the framework for a specific Spark version, publish the project to a given repository, and build an Uber-jar either with or without Spark dependencies.
License
The Checkita framework is distributed under the GNU LGPL license.
This project is a reimagining of the Data Quality framework developed by Agile Lab, Italy.
"},{"location":"ru/01-application-setup/","title":"General Information","text":"Checkita runs as a Spark application, so it can be launched in the same way as any other Spark application:
Both application deploy modes are also supported: client and cluster.
The framework is developed primarily for batch data processing, which is currently its only fully supported mode of operation. A typical architecture for working with the framework is shown in the diagram below:
The Data Quality Framework can also be used for streaming data processing, but this functionality is experimental. For more details on checking streaming data sources, see the chapter Quality Checks of Streaming Data Sources.
"},{"location":"ru/01-application-setup/01-ApplicationSettings/","title":"Application Settings","text":"General settings of the Checkita Data Quality application are configured in the Hocon file application.conf, which is passed to the application at startup. All settings are defined within the appConfig section.
The only parameter defined at the top level is applicationName: the name of the Spark application. This parameter is optional. If it is not set, the application is launched with the default name Checkita Data Quality.
The remaining application settings are defined in the corresponding subsections described below:
"},{"location":"ru/01-application-setup/01-ApplicationSettings/#_2","title":"Date and Time Settings","text":"Date and time settings are described in the dateTimeOptions section. For more details on working with dates in the Checkita Data Quality framework, see the chapter Working with Dates.
The date-related parameters are as follows:
timeZone - the time zone in which string representations of dates are given. Optional, default: \"UTC\".
referenceDateFormat - the date/time format used for the string representation of the reference date. Optional, default: \"yyyy-MM-dd'T'HH:mm:ss.SSS\".
executionDateFormat - the date/time format used for the string representation of the application start date. Optional, default: \"yyyy-MM-dd'T'HH:mm:ss.SSS\".
If the dateTimeOptions section is not defined, default values are used for all of the parameters above.
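The following minimal Python sketch shows how this fallback to defaults might work. It is an illustration only, not Checkita's Scala code. It merges a partially specified dateTimeOptions section with the documented defaults and renders a timestamp in the default pattern. The helper names resolve_date_time_options and format_default_pattern are hypothetical. The strftime pattern only approximates the Java-style pattern yyyy-MM-dd'T'HH:mm:ss.SSS, and time zone conversion is not modelled.

```python
from datetime import datetime, timezone

# Documented defaults of the dateTimeOptions section (Java-style patterns).
JAVA_DEFAULT_FORMAT = '''yyyy-MM-dd'T'HH:mm:ss.SSS'''
DATE_TIME_DEFAULTS = {
    'timeZone': 'UTC',
    'referenceDateFormat': JAVA_DEFAULT_FORMAT,
    'executionDateFormat': JAVA_DEFAULT_FORMAT,
}


def resolve_date_time_options(user_options=None):
    # Hypothetical helper: every missing parameter (or the whole section,
    # when user_options is None) falls back to its documented default.
    resolved = dict(DATE_TIME_DEFAULTS)
    resolved.update(user_options or {})
    return resolved


def format_default_pattern(dt):
    # Python approximation of yyyy-MM-dd'T'HH:mm:ss.SSS:
    # %f yields microseconds, so the last three digits are cut off
    # to keep millisecond precision.
    return dt.strftime('%Y-%m-%dT%H:%M:%S.%f')[:-3]


if __name__ == '__main__':
    print(resolve_date_time_options())
    print(resolve_date_time_options({'timeZone': 'Europe/Moscow'}))
    ts = datetime(2024, 3, 13, 9, 5, 7, 123456, tzinfo=timezone.utc)
    print(format_default_pattern(ts))  # 2024-03-13T09:05:07.123
```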
This set of settings applies only to streaming applications and defines various aspects of running quality checks over streaming sources. For more details on running the framework in streaming mode, see the chapter Quality Checks of Streaming Data Sources.
trigger - Trigger interval: the time interval over which micro-batches are collected from the streaming data source. Optional, default: 10s.
window - Window interval: the time interval corresponding to the window size within which metric results are accumulated. All metrics and checks are computed for each window as soon as it is fully formed (i.e., once it falls below the \"watermark\" level). Optional, default: 10m.
watermark - \"Watermark\" level: the time interval after which \"late\" records are no longer processed. Optional, default: 5m.
allowEmptyWindows - Boolean flag that determines whether \"empty\" windows (windows that received no rows of data) are allowed. If a window is fully formed but one or more streaming sources contributed no rows to it, the corresponding checks are skipped only when this parameter is set to true. Otherwise, the checks are run and return an error with a message of the form ... metric results were not found .... Optional, default: false.
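The simplified Python model below approximates how window, watermark and allowEmptyWindows interact. It is a hypothetical sketch, not the framework's or Spark's implementation, and it makes several simplifications:
- It supports only a small subset of Scala Duration units.
- It uses epoch-aligned tumbling windows and ignores the trigger interval.
- It treats a window as complete once the watermark (latest event time seen minus the watermark delay) reaches the window end.
- It drops events older than the current watermark.

All function names are assumptions.

```python
# Simplified subset of Scala Duration unit labels (seconds per unit).
UNIT_SECONDS = {
    's': 1, 'sec': 1, 'second': 1, 'seconds': 1,
    'm': 60, 'min': 60, 'minute': 60, 'minutes': 60,
    'h': 3600, 'hour': 3600, 'hours': 3600,
    'd': 86400, 'day': 86400, 'days': 86400,
}


def parse_duration(text):
    # Accepts strings such as '10s', '10m' or '5 minutes'; returns seconds.
    text = text.strip()
    i = 0
    while i < len(text) and text[i].isdigit():
        i += 1
    unit = text[i:].strip()
    if i == 0 or unit not in UNIT_SECONDS:
        raise ValueError('unsupported duration: ' + repr(text))
    return int(text[:i]) * UNIT_SECONDS[unit]


def simulate(events, window='10m', watermark='5m'):
    # events: event-time timestamps in seconds, in arrival order.
    window_s = parse_duration(window)
    delay_s = parse_duration(watermark)
    max_seen = None
    open_windows = {}  # window start -> number of accepted rows
    completed = []     # (window start, row count) in completion order
    dropped = []       # late events that arrived below the watermark
    for ts in events:
        if max_seen is not None and ts < max_seen - delay_s:
            dropped.append(ts)
            continue
        start = ts - ts % window_s  # epoch-aligned tumbling window
        open_windows[start] = open_windows.get(start, 0) + 1
        max_seen = ts if max_seen is None else max(max_seen, ts)
        current_wm = max_seen - delay_s
        # Metrics and checks would be computed for each completed window.
        for s in sorted(open_windows):
            if s + window_s <= current_wm:
                completed.append((s, open_windows.pop(s)))
    return completed, dropped


def check_outcomes(rows_by_source, allow_empty_windows=False):
    # For one completed window: sources with no rows are skipped only when
    # allowEmptyWindows is true; otherwise their checks report an error.
    outcomes = {}
    for source, rows in rows_by_source.items():
        if rows > 0:
            outcomes[source] = 'run'
        elif allow_empty_windows:
            outcomes[source] = 'skipped'
        else:
            outcomes[source] = 'error: metric results were not found'
    return outcomes


if __name__ == '__main__':
    print(simulate([0, 120, 610, 900, 1250, 100]))  # ([(0, 2)], [100])
    print(check_outcomes({'orders': 3, 'payments': 0}))
```

In the sample run, the first window [0, 600) completes only once the watermark reaches 600 (an event at 900 with the 5m delay). The event at 100 arrives after the watermark has moved to 950, so it is dropped as late.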
IMPORTANT: all time intervals must be specified as strings in a format compatible with Scala Duration.
"},{"location":"ru/01-application-setup/01-ApplicationSettings/#_4","title":"Enablers","text":"The enablers section of the application settings file defines various boolean switches and numeric parameters that control different aspects of data quality pipeline execution:
allowSqlQueries - Boolean parameter that controls whether arbitrary SQL queries may be used when configuring the pipeline. Optional, default: false.
allowNotifications - Boolean parameter that controls whether notifications may be sent to users. Optional, default: false.
aggregatedKafkaOutput - Boolean parameter that enables sending aggregated messages to Kafka targets (one message per target type). By default, a separate Kafka message is sent for each entity. Optional, default: false.
enableCaseSensitivity - Boolean parameter that enables case sensitivity for column names. It controls how column names are compared with each other and how they are looked up in the source. Optional, default: false.
errorDumpSize - Maximum number of errors that can be collected for a single metric. The framework can collect data from the source rows for which metric computation failed. However, to prevent OOM, the number of collected errors must be kept within reasonable limits. The maximum allowed number of errors per metric is therefore capped at 10000, and this parameter can lower it further. Optional, default: 10000.
outputRepartition - Sets the number of partitions used when writing results to files. By default, a single file is written. Optional, default: 1.
If the enablers section is not defined, default values are used for all of the parameters above.
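The following minimal Python sketch shows how these defaults could be resolved, assuming user overrides are applied on top of the documented values. The helper names are hypothetical. The text above only states that errorDumpSize cannot exceed 10000, so clamping larger values (rather than rejecting them) is an assumption of this sketch. The same goes for the simple lower-case comparison used to illustrate enableCaseSensitivity.

```python
# Documented defaults of the enablers section.
ENABLERS_DEFAULTS = {
    'allowSqlQueries': False,
    'allowNotifications': False,
    'aggregatedKafkaOutput': False,
    'enableCaseSensitivity': False,
    'errorDumpSize': 10000,
    'outputRepartition': 1,
}

# Upper bound on the number of errors collected per metric.
MAX_ERROR_DUMP_SIZE = 10000


def resolve_enablers(user_options=None):
    # Start from the defaults and apply any user overrides; a missing
    # section (user_options is None) yields the defaults unchanged.
    resolved = dict(ENABLERS_DEFAULTS)
    resolved.update(user_options or {})
    # Assumption: values above the limit are clamped, not rejected.
    resolved['errorDumpSize'] = min(resolved['errorDumpSize'], MAX_ERROR_DUMP_SIZE)
    return resolved


def column_names_match(a, b, case_sensitive):
    # Illustrates enableCaseSensitivity: exact comparison when enabled,
    # lower-case comparison otherwise.
    return a == b if case_sensitive else a.lower() == b.lower()


if __name__ == '__main__':
    print(resolve_enablers({'allowSqlQueries': True, 'errorDumpSize': 50000}))
    print(column_names_match('OrderId', 'orderid', case_sensitive=False))  # True
```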
Parameters for connecting to the database that stores Data Quality pipeline results are defined in the storage section of the application settings file.
For more details on the results storage, see the chapter Results Storage.
Thus, to configure the connection to the results database, the following parameters must be set:
dbType
- Type of the database used to store results. Required.
url
- URL for connecting to the database (without a protocol prefix). Required.
username
- Username for connecting to the database (if required). Optional.
password
- Password for connecting to the database (if required). Optional.
schema
- Schema containing the Data Quality results tables (if required). Optional.
IMPORTANT: If the storage section is not defined, the application runs without a connection to the results database:
To send Email notifications, a connection to an SMTP server must be configured in the email section of the application settings file, using the following parameters:
host
- SMTP server address. Required.
port
- Port for connecting to the SMTP server. Required.
address
- Sender address. Required.
name
- Sender name. Required.
sslOnConnect
- Boolean flag indicating whether SSL is used for the connection. Optional, default is false.
tlsEnabled
- Boolean flag indicating whether TLS is supported on connection. Optional, default is false.
username
- Username for connecting to the SMTP server (if required). Optional.
password
- Password for connecting to the SMTP server (if required). Optional.
If the email section is not defined, Email notifications cannot be sent. Moreover, if such notifications are configured in the pipeline settings, a corresponding error is thrown at runtime.
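The two boolean flags above appear to correspond to the two usual ways of securing SMTP: sslOnConnect to implicit TLS, where the connection is encrypted from the start, and tlsEnabled to a STARTTLS upgrade of a plain connection. The Python smtplib sketch below illustrates that interpretation only; it is an assumption rather than Checkita's own code, and smtp_mode and open_smtp are hypothetical helpers.

```python
import smtplib
import ssl


def smtp_mode(ssl_on_connect=False, tls_enabled=False):
    # Assumed mapping of the settings:
    # sslOnConnect -> TLS from the first byte (implicit TLS, commonly port 465);
    # tlsEnabled   -> plain connection upgraded with STARTTLS (commonly port 587).
    if ssl_on_connect:
        return 'implicit-tls'
    if tls_enabled:
        return 'starttls'
    return 'plain'


def open_smtp(host, port, ssl_on_connect=False, tls_enabled=False,
              username=None, password=None, timeout=10):
    # Hypothetical helper: opens an SMTP connection shaped by the email settings.
    mode = smtp_mode(ssl_on_connect, tls_enabled)
    context = ssl.create_default_context()
    if mode == 'implicit-tls':
        server = smtplib.SMTP_SSL(host, port, timeout=timeout, context=context)
    else:
        server = smtplib.SMTP(host, port, timeout=timeout)
        if mode == 'starttls':
            server.starttls(context=context)
    if username:
        # Log in only when credentials are configured (both are optional).
        server.login(username, password or '')
    return server
```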
To send notifications to Mattermost, a connection to the Mattermost API must be configured in the mattermost section of the application settings file, using the following parameters:
host
- Address at which the Mattermost API is available.
token
- Token for accessing the Mattermost API (using a bot to send notifications is preferable).
If the mattermost section is not defined, notifications to Mattermost cannot be sent. Moreover, if such notifications are configured in the pipeline settings, a corresponding error is thrown at runtime.
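For orientation, the sketch below shows how a host and token are typically used against the Mattermost REST API v4, which creates a message with POST /api/v4/posts and authenticates with a Bearer token. It only builds the request with Python's urllib and does not send it; the helper name and the channel id are hypothetical, and Checkita's actual client may work differently.

```python
import json
import urllib.request


def build_mattermost_post(host, token, channel_id, message):
    # Mattermost REST API v4: POST /api/v4/posts creates a post in a channel.
    # The token (preferably a bot token) is sent as a Bearer Authorization header.
    body = json.dumps({'channel_id': channel_id, 'message': message}).encode('utf-8')
    return urllib.request.Request(
        host.rstrip('/') + '/api/v4/posts',
        data=body,
        headers={'Authorization': 'Bearer ' + token, 'Content-Type': 'application/json'},
        method='POST',
    )


request = build_mattermost_post('https://some-team.mattermost.com', 'bot-token',
                                'hypothetical-channel-id', 'DQ job finished')
# urllib.request.urlopen(request) would actually send it; it is not called here.
print(request.get_method(), request.full_url)
```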
The application settings may also specify a set of Spark parameters that are common to most of the Data Quality pipelines being run. Such parameters must be listed in defaultSparkOptions as strings in the format spark.param.name=spark.param.value.
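The following Python sketch shows one way to read the spark.param.name=spark.param.value format: each entry is split at its first = sign into a Spark property name and a value. parse_spark_options is a hypothetical helper, not part of Checkita.

```python
def parse_spark_options(options):
    # Hypothetical helper (not part of Checkita): turn entries such as
    # 'spark.sql.autoBroadcastJoinThreshold=-1' into a name -> value mapping.
    parsed = {}
    for entry in options:
        # Split at the first '=' only, so values that contain '=' stay intact.
        name, sep, value = entry.partition('=')
        if not sep or not name.strip():
            raise ValueError('expected spark.param.name=spark.param.value, got: ' + entry)
        parsed[name.strip()] = value.strip()
    return parsed


default_spark_options = [
    'spark.sql.parquet.compression.codec=snappy',
    'spark.sql.autoBroadcastJoinThreshold=-1',
]
for name, value in parse_spark_options(default_spark_options).items():
    # The same pair expressed as a spark-submit --conf argument.
    print('--conf', name + '=' + value)
```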
The HOCON format supports variable substitution, and the Checkita Data Quality framework, in turn, provides a mechanism for adding extra variables to configuration files at runtime. For more information, see the Using Environment Variables and Extra Variables chapter.
appConfig: {\n\n applicationName: \"Custom Data Quality Application Name\"\n\n dateTimeOptions: {\n timeZone: \"GMT+3\"\n referenceDateFormat: \"yyyy-MM-dd\"\n executionDateFormat: \"yyyy-MM-dd-HH-mm-ss\"\n }\n\n enablers: {\n allowSqlQueries: false\n allowNotifications: true\n aggregatedKafkaOutput: true\n }\n\n defaultSparkOptions: [\n \"spark.sql.orc.enabled=true\"\n \"spark.sql.parquet.compression.codec=snappy\"\n \"spark.sql.autoBroadcastJoinThreshold=-1\"\n ]\n\n storage: {\n dbType: \"postgres\"\n url: \"localhost:5432/public\"\n username: \"postgres\"\n password: \"postgres\"\n schema: \"dqdb\"\n }\n\n email: {\n host: \"smtp.some-company.domain\"\n port: \"25\"\n username: \"emailUser\"\n password: \"emailPassword\"\n address: \"some.service@some-company.domain\"\n name: \"Data Quality Service\"\n sslOnConnect: true\n }\n\n mattermost: {\n host: \"https://some-team.mattermost.com\"\n token: ${dqMattermostToken}\n }\n}\n
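The mattermost section of the example above takes its token from ${dqMattermostToken}. Python's string.Template happens to use the same ${name} syntax, so it can serve as a rough model of the substitution; real HOCON resolution is richer (for instance, it also supports optional ${?name} references and references to other configuration paths). The resolution order in this sketch, extra variables first and environment variables second, is an assumption.

```python
import os
from string import Template


def substitute(text, extra_vars):
    # Assumed resolution order for this sketch: extra variables (as passed
    # with -e) override environment variables of the same name.
    scope = dict(os.environ)
    scope.update(extra_vars)
    # safe_substitute leaves unknown ${...} references as they are instead of raising.
    return Template(text).safe_substitute(scope)


# Line taken from the mattermost section of the example above.
fragment = 'token: ${dqMattermostToken}'
print(substitute(fragment, {'dqMattermostToken': 'some-token'}))  # token: some-token
```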
"},{"location":"ru/01-application-setup/02-ApplicationSubmit/","title":"Submitting Data Quality Applications","text":"Since the Checkita framework is built on Spark, it is launched as a regular Spark application using the spark-submit command. Like any other Spark application, a Checkita application can be run either locally or on a cluster (in client or cluster mode).
However, Checkita applications require certain arguments to be passed at startup, namely:
-a
- Required. Path to the HOCON file with the application settings: application.conf. Note that the file may have a different name, but this name is the one commonly used.
-j
- Required. Comma-separated list of paths to Data Quality pipeline configuration files. Since HOCON supports merging configurations, different parts of the pipeline configuration can be described in separate files and reused.
-d
- Optional. Reference date for which the Data Quality pipeline is run. The date must be given in the format set in the referenceDateFormat field of the application settings. If no date is provided, the actual start date of the pipeline is used.
-l
- Optional. Flag indicating that the application must be run in local mode.
-s
- Optional. Flag indicating that the application is run with a Shared Spark Context. In this case, the application obtains the existing Spark Context instead of creating a new one. It is also important that, in this case, the application does not stop the Spark Context upon completion.
-m
- Optional. Flag indicating that a database migration must be performed before saving results (this makes sure the results storage is up to date, or runs the scripts that bring it up to date).
-e
- Optional. Argument used to pass a set of extra variables at application startup. Extra variables are added to the configuration files and become available for use. Variables are passed as key-value pairs: \"k1=v1,k2=v2,k3=v3,...\".
-v
- Optional. Argument used to set the application logging level. Default is INFO.
Two applications are available for submission:
ru.raiffeisen.checkita.apps.batch.DataQualityBatchApp
ru.raiffeisen.checkita.apps.stream.DataQualityStreamApp
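The value of the -e argument described above is a comma-separated list of key=value pairs. Below is a minimal Python sketch of parsing it, assuming that neither keys nor values contain commas; parse_extra_vars is a hypothetical helper, not Checkita's own parser.

```python
def parse_extra_vars(arg):
    # Hypothetical parser for the -e value format k1=v1,k2=v2,...
    # This naive split assumes that neither keys nor values contain commas.
    result = {}
    for pair in arg.split(','):
        if not pair.strip():
            continue  # tolerate empty items, e.g. a trailing comma
        key, sep, value = pair.partition('=')
        if not sep:
            raise ValueError('missing = in extra variable: ' + pair)
        result[key.strip()] = value
    return result


extra = parse_extra_vars('storage_db_user=some_db_user,storage_db_password=some_db_password')
print(extra['storage_db_user'])  # some_db_user
```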
Below is an example of submitting a Checkita application to YARN in cluster mode. The connection parameters for the results storage are set in the application.conf file, while the login credentials can be passed either via environment variables or as extra variables at startup. For more information, see the Using Environment Variables and Extra Variables chapter.
export DQ_APPLICATION=\"<local or remote (HDFS, S3) path to the application jar>\"\nexport DQ_DEPENDENCIES=\"<local or remote (HDFS, S3) path to the uber-jar with application dependencies>\"\nexport DQ_APP_CONFIG=\"<local or remote (HDFS, S3) path to the application settings file>\"\nexport DQ_JOB_CONFIGS=\"<local or remote (HDFS, S3) paths to pipeline configuration files (comma-separated)>\"\n\n# Since the files above are first distributed to the driver and executors,\n# they will be located in the working directory.\n# Therefore, only the file names need to be passed in the application arguments:\nexport DQ_APP_CONFIG_FILE=$(basename $DQ_APP_CONFIG)\nexport DQ_JOB_CONFIG_FILES=\"<job configuration files separated by commas (only file names)>\"\nexport REFERENCE_DATE=\"2023-08-01\"\n\n# Application entry point (executable class): ru.raiffeisen.checkita.apps.batch.DataQualityBatchApp\n# The --name parameter of the spark-submit command takes precedence over\n# the application name set in `application.conf`.\n\nspark-submit \\\n --class ru.raiffeisen.checkita.apps.batch.DataQualityBatchApp \\\n --name \"Checkita Data Quality\" \\\n --master yarn \\\n --deploy-mode cluster \\\n --num-executors 1 \\\n --executor-memory 2g \\\n --executor-cores 4 \\\n --driver-memory 2g \\\n --jars $DQ_DEPENDENCIES \\\n --files \"$DQ_APP_CONFIG,$DQ_JOB_CONFIGS\" \\\n --conf \"spark.executor.memoryOverhead=2g\" \\\n --conf \"spark.driver.memoryOverhead=2g\" \\\n --conf \"spark.driver.maxResultSize=4g\" \\\n $DQ_APPLICATION \\\n -a $DQ_APP_CONFIG_FILE \\\n -j $DQ_JOB_CONFIG_FILES \\\n -d $REFERENCE_DATE \\\n -e \"storage_db_user=some_db_user,storage_db_password=some_db_password\"\n
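In the example above, DQ_JOB_CONFIG_FILES is meant to hold only the file names of the configurations listed in DQ_JOB_CONFIGS. The Python sketch below derives those names and checks that REFERENCE_DATE matches the yyyy-MM-dd pattern, assuming referenceDateFormat is set to that pattern. Both helpers and the HDFS paths are hypothetical.

```python
import posixpath
from datetime import datetime


def job_config_file_names(job_configs):
    # Files passed via --files end up in the working directory, so -j only
    # needs their base names (hypothetical helper; works for local and URI paths).
    paths = [p.strip() for p in job_configs.split(',') if p.strip()]
    return ','.join(posixpath.basename(p) for p in paths)


def check_reference_date(value, python_format='%Y-%m-%d'):
    # '%Y-%m-%d' is the Python spelling of the Java-style yyyy-MM-dd pattern;
    # strptime raises ValueError when the date does not match.
    return datetime.strptime(value, python_format).date()


job_configs = 'hdfs:///dq/jobs/sources.conf,hdfs:///dq/jobs/checks.conf'  # hypothetical paths
print(job_config_file_names(job_configs))  # sources.conf,checks.conf
print(check_reference_date('2023-08-01'))  # 2023-08-01
```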
"},{"location":"ru/01-application-setup/03-ResultsStorage/","title":"Results Storage","text":"To use all of the framework's features, a results storage must be created and configured. The Checkita framework can work with various RDBMS to store results. In addition, Hive can be used as a results storage, as can a regular file storage.
\u041f\u043e\u043b\u043d\u044b\u0439 \u0441\u043f\u0438\u0441\u043e\u043a \u0440\u0430\u0437\u043b\u0438\u0447\u043d\u044b\u0445 \u0442\u0438\u043f\u043e\u0432 \u0445\u0440\u0430\u043d\u0438\u043b\u0438\u0449\u0430 \u0440\u0435\u0437\u0443\u043b\u044c\u0442\u0430\u0442\u043e\u0432 \u0434\u0430\u043d \u043d\u0438\u0436\u0435:
PostgreSQL (v.9.3 and above) - recommended for use as the results storage.
Oracle
MySQL
Microsoft SQL Server
SQLite
H2
Hive
File (a directory in a local or remote file system such as HDFS or S3)
The Checkita framework supports schema evolution of the results storage. Flyway is used to run the migrations. Thus, if one of the supported RDBMS is chosen as the results storage, its schema can be set up during the first run of a Data Quality pipeline by passing the -m argument at startup. For more details on how to run Checkita applications, see the Running Data Quality Applications chapter.
IMPORTANT: Flyway migrations are normally run either against an empty database/schema or against one that has already been initialized by Flyway. The Checkita framework also allows running migrations against a non-empty database/schema. In that case, the user must make sure that the database/schema contains no conflicting table names.
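One way to follow this advice is to query the catalog of the target database before the first run with -m. The sketch below is written for PostgreSQL (information_schema is also available in MySQL and SQL Server). It makes two assumptions. First, the target schema has the hypothetical name dq. Second, the RDBMS results tables use the same names as the Hive tables in this chapter. Check the list against the migrations shipped with your Checkita version.

```sql
-- Sketch: find existing tables whose names could clash with the results storage tables.
-- Assumptions: 'dq' is a hypothetical target schema; the table names are taken
-- from the Hive DDL scripts of this chapter and may differ in the RDBMS migrations.
SELECT table_schema, table_name
FROM information_schema.tables
WHERE table_schema = 'dq'
  AND table_name IN (
    'results_metric_regular',
    'results_metric_composed',
    'results_check_load',
    'results_check',
    'job_state'
  );
```

An empty result means that none of the listed names are already taken in that schema.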
If the File
results storage type is chosen, it is enough to provide the path to the directory/bucket where the results will be stored. Results are saved as .parquet
files with the same schema as when they are stored in an RDBMS. No schema evolution mechanisms are provided for the file results storage. Therefore, if the structure of the results changes in the future, the user will have to update the schemas of the existing results manually.
IMPORTANT: When the file storage is used, results are not partitioned by any field. As a result, every Data Quality pipeline reads the files in full and overwrites them when saving its results. Because of this, using this storage type in a production environment is not recommended.
For the Hive
results storage type, schema evolution mechanisms are not available either. Therefore, the user needs to create the required Hive tables manually, using the DDL scripts from the Scripts for Setting Up the Results Storage in Hive chapter.
IMPORTANT: Results must be partitioned by job_id
. The pipeline identifier is chosen as the partition column to speed up fetching results when computing trend checks (which are used to detect anomalies in data). The Hive
results storage is faster than the File
storage, since only the partition that corresponds to the identifier of the current pipeline is read and overwritten. Nevertheless, this storage type is also not recommended for production environments where a large number of pipelines will be run.
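Because the Hive tables are partitioned by job_id, a query that filters on job_id only has to read the matching partition. The HiveQL sketch below assumes that the session has set the same schema_name hivevar as the setup scripts in this chapter. The value my_dq_job is a hypothetical job identifier.

```sql
-- List the job_id partitions written to the checks results table so far.
SHOW PARTITIONS ${schema_name}.results_check;

-- Read a single partition: the job_id filter lets Hive prune all other partitions.
-- 'my_dq_job' is a hypothetical job_id used for illustration.
SELECT check_id, status, message, reference_date
FROM ${schema_name}.results_check
WHERE job_id = 'my_dq_job'
ORDER BY reference_date DESC
LIMIT 20;
```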
The Checkita framework writes four types of results:
The schemas for all result types are given below:
"},{"location":"ru/01-application-setup/03-ResultsStorage/#regular-metrics-results-schema","title":"Regular Metrics Results Schema","text":"(job_id, metric_id, reference_date)
source_id
& column_names
contain string representations of lists in the format: '[val1,val2,val3]'
.params
is a JSON string.(job_id, metric_id, reference_date)
source_id
contains a string representation of a list in the format: '[val1,val2,val3]'
.(job_id, check_id, reference_date)
source_id
contains a string representation of a list in the format: '[val1,val2,val3]'
.(job_id, check_id, reference_date)
source_id
contains a string representation of a list in the format: '[val1,val2,val3]'
.(job_id, reference_date)
;version_info
- a string in JSON format;config
- a string in JSON format. Below are HiveQL scripts that can be used to initialize the results storage in Hive:
-- Replace <schema_name> and <schema_dir> with the actual schema name and its path.\n-- job_id is declared only in PARTITIONED BY: Hive rejects a partition column that is also listed among the table columns.\nset hivevar:schema_name=<schema_name>;\nset hivevar:schema_dir=<schema_dir>;\n\nCREATE SCHEMA IF NOT EXISTS ${schema_name};\n\nDROP TABLE IF EXISTS ${schema_name}.results_metric_regular;\nCREATE EXTERNAL TABLE ${schema_name}.results_metric_regular\n(\n metric_id STRING COMMENT '',\n metric_name STRING COMMENT '',\n description STRING COMMENT '',\n metadata STRING COMMENT '',\n source_id STRING COMMENT '',\n column_names STRING COMMENT '',\n params STRING COMMENT '',\n result DOUBLE COMMENT '',\n additional_result STRING COMMENT '',\n reference_date TIMESTAMP COMMENT '',\n execution_date TIMESTAMP COMMENT ''\n)\nCOMMENT 'Data Quality Regular Metrics Results'\nPARTITIONED BY (job_id STRING)\nSTORED AS PARQUET\nLOCATION '${schema_dir}/results_metric_regular';\n\nDROP TABLE IF EXISTS ${schema_name}.results_metric_composed;\nCREATE EXTERNAL TABLE ${schema_name}.results_metric_composed\n(\n metric_id STRING COMMENT '',\n metric_name STRING COMMENT '',\n description STRING COMMENT '',\n metadata STRING COMMENT '',\n source_id STRING COMMENT '',\n formula STRING COMMENT '',\n result DOUBLE COMMENT '',\n additional_result STRING COMMENT '',\n reference_date TIMESTAMP COMMENT '',\n execution_date TIMESTAMP COMMENT ''\n)\nCOMMENT 'Data Quality Composed Metrics Results'\nPARTITIONED BY (job_id STRING)\nSTORED AS PARQUET\nLOCATION '${schema_dir}/results_metric_composed';\n\nDROP TABLE IF EXISTS ${schema_name}.results_check_load;\nCREATE EXTERNAL TABLE ${schema_name}.results_check_load\n(\n check_id STRING COMMENT '',\n check_name STRING COMMENT '',\n 
description STRING COMMENT '',\n metadata STRING COMMENT '',\n source_id STRING COMMENT '',\n expected STRING COMMENT '',\n status STRING COMMENT '',\n message STRING COMMENT '',\n reference_date TIMESTAMP COMMENT '',\n execution_date TIMESTAMP COMMENT ''\n)\nCOMMENT 'Data Quality Load Checks Results'\nPARTITIONED BY (job_id STRING)\nSTORED AS PARQUET\nLOCATION '${schema_dir}/results_check_load';\n\nDROP TABLE IF EXISTS ${schema_name}.results_check;\nCREATE EXTERNAL TABLE ${schema_name}.results_check\n(\n check_id STRING COMMENT '',\n check_name STRING COMMENT '',\n description STRING COMMENT '',\n metadata STRING COMMENT '',\n source_id STRING COMMENT '',\n base_metric STRING COMMENT '',\n compared_metric STRING COMMENT '',\n compared_threshold DOUBLE COMMENT '',\n lower_bound DOUBLE COMMENT '',\n upper_bound DOUBLE COMMENT '',\n status STRING COMMENT '',\n message STRING COMMENT '',\n reference_date TIMESTAMP COMMENT '',\n execution_date TIMESTAMP COMMENT ''\n)\nCOMMENT 'Data Quality Checks Results'\nPARTITIONED BY (job_id STRING)\nSTORED AS PARQUET\nLOCATION '${schema_dir}/results_check';\n\nDROP TABLE IF EXISTS ${schema_name}.job_state;\nCREATE EXTERNAL TABLE ${schema_name}.job_state\n(\n config STRING COMMENT '',\n version_info STRING COMMENT '',\n reference_date TIMESTAMP COMMENT '',\n execution_date TIMESTAMP COMMENT ''\n)\nCOMMENT 'Data Quality Job State'\nPARTITIONED BY (job_id STRING)\nSTORED AS PARQUET\nLOCATION '${schema_dir}/job_state';\n
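The source_id and column_names columns hold lists serialized as '[val1,val2,val3]', and params holds a JSON string. Both can be unpacked with standard Hive string functions. The sketch below assumes the tables created by the script above. The job_id my_dq_job and the JSON key threshold are hypothetical placeholders.

```sql
-- Turn '[val1,val2,val3]' into an array: drop the first and last characters (the brackets),
-- then split on commas. Note that an empty list '[]' yields an array with one empty string.
SELECT
  metric_id,
  split(substr(column_names, 2, length(column_names) - 2), ',') AS column_list,
  -- params is a JSON string; 'threshold' is a hypothetical key used for illustration.
  get_json_object(params, '$.threshold') AS threshold_param,
  result
FROM ${schema_name}.results_metric_regular
WHERE job_id = 'my_dq_job';
```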
"},{"location":"ru/02-general-concepts/","title":"General Concepts","text":"This section explains various aspects of working with the Checkita Data Quality framework.
"},{"location":"ru/02-general-concepts/01-WorkingWithDateTime/","title":"Working with Dates","text":"Here and below, a date means a DateTime object that stores both the date and the time.
The Checkita framework has two main date representations, which are used to identify different runs of Data Quality pipelines:
referenceDate
- the date that indicates the period for which the given Data Quality pipeline is run.
executionDate
- the date that holds the actual start time of the Data Quality pipeline.
A typical example: we run some ETL pipeline (which also includes a task that computes data quality metrics) after the business day closes, for example at midnight. Thus, referenceDate
will point to the previous day, for which the ETL is run, while executionDate
will hold the current date, i.e. the date when the Data Quality pipeline was actually started. We will likely want the string representations of these two dates to differ, and the Checkita framework allows this by letting a separate format be configured for each of them in the application settings.
Since referenceDate
can point to past dates, it can be set explicitly at application startup. If it is not set explicitly, it will also correspond to the actual application start date. See the Running Data Quality Applications chapter for more details on the arguments used when starting Data Quality applications.
Both dates are widely used within the framework. Therefore, whenever their string representation is required, it is produced according to the format specified in the application settings.
Note also that the string representation is produced in the time zone in which the application runs. This time zone is also set in the application settings. By default, it is UTC
.
Last but not least: we deliberately avoid string representations of dates when saving results to the database. There, dates are converted to the Timestamp type and normalized to UTC
. This approach makes it possible to query the results storage reliably, independently of the date format settings. See the Results Storage chapter for more details on the results storage.
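Because reference_date and execution_date are stored as UTC timestamps rather than formatted strings, filters can be written against timestamp values no matter which date formats the application is configured with. The HiveQL sketch below targets the job_state table from the Hive setup scripts. The date 2023-08-01 and the Europe/Moscow time zone are arbitrary examples, not values required by Checkita.

```sql
-- Jobs whose reference date falls on 2023-08-01 in UTC (stored timestamps are normalized to UTC).
SELECT
  job_id,
  reference_date,
  -- Render the UTC execution timestamp in a local time zone, for display only.
  from_utc_timestamp(execution_date, 'Europe/Moscow') AS execution_date_local
FROM ${schema_name}.job_state
WHERE reference_date >= CAST('2023-08-01 00:00:00' AS TIMESTAMP)
  AND reference_date <  CAST('2023-08-02 00:00:00' AS TIMESTAMP);
```

If referenceDate was supplied in a non-UTC time zone, shift the bounds by the zone offset, since the stored values are in UTC.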
IMPORTANT: The actual string representations of referenceDate
and executionDate
are always added to the configuration files as extra variables. For more details on using extra variables, see the Using Environment Variables and Extra Variables chapter.
The HOCON format supports variable substitution. This mechanism gives more flexible control over both the application settings and the Data Quality pipeline configurations.
To this end, extra variables are added to the configuration files. These variables are either read from environment variables or passed explicitly at application startup.
For more details on passing variables explicitly at application startup, see the chapter Running Data Quality Applications.
To use environment variables (JVM variables can be used as well), their names must match the following regular expression: ^(?i)(DQ)[a-z0-9_-]+$
, for example DQ_STORAGE_PASSWORD
or dqMattermostToken
. All environment variables that match this regular expression are read and added to the configuration files, so they can later be substituted into the relevant sections. The variables are added both to the application settings file and to the Data Quality pipeline configuration file(s).
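The name filter described above can be sketched in Python. This is a hypothetical illustration, not Checkita's implementation (the framework runs on the JVM), and `collect_dq_variables` is a made-up helper. Python 3.11+ rejects an inline `(?i)` placed after `^`, so the sketch passes `re.IGNORECASE` instead. This should be equivalent to the pattern quoted in the text.

```python
import re
from typing import Dict

# Same pattern as in the text. Case-insensitivity is passed as a flag because
# Python (unlike Java) requires inline global flags at the very start.
DQ_VAR_PATTERN = re.compile('^(DQ)[a-z0-9_-]+$', re.IGNORECASE)


def collect_dq_variables(env: Dict[str, str]) -> Dict[str, str]:
    # Hypothetical helper: keep only the variables whose names match the pattern.
    return {name: value for name, value in env.items()
            if DQ_VAR_PATTERN.match(name)}


if __name__ == '__main__':
    import os
    # Print names only, so that secret values never reach the console.
    print(sorted(collect_dq_variables(dict(os.environ))))
```

With this filter, `DQ_STORAGE_PASSWORD` and `dqMattermostToken` are kept. Names such as `PATH` or a bare `DQ` are skipped, because the pattern requires at least one more character after the prefix.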
A typical use of environment variables is substituting secrets needed to connect to external systems. Storing such information in configuration files is not a good idea, so the Checkita framework provides a mechanism for supplying such data at runtime.
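In Checkita, the runtime substitution happens when the HOCON files are parsed. The Python sketch below only mimics the idea with `string.Template`, whose `${NAME}` placeholders look similar to HOCON substitutions. This is an analogy under stated assumptions, not Checkita's code. `render_config` is hypothetical and, unlike HOCON, `string.Template` does not accept hyphens in names. The sketch shows one thing: the secret is merged into the configuration text in memory only.

```python
from string import Template
from typing import Dict


def render_config(template_text: str, variables: Dict[str, str]) -> str:
    # Resolve ${NAME} placeholders in memory. substitute() raises KeyError for
    # an unknown name instead of silently leaving the placeholder in place.
    return Template(template_text).substitute(variables)


if __name__ == '__main__':
    # Hypothetical config line and value. Nothing is written to disk.
    text = 'password: ${DQ_STORAGE_PASSWORD}'
    print(render_config(text, {'DQ_STORAGE_PASSWORD': 'example-secret'}))
```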
IMPORTANT: Variables are added to the configuration files at runtime and are not persisted in any form.
"},{"location":"ru/02-general-concepts/03-StatusModel/","title":"Result Status Model","text":"A unified status model is used for all results produced by the Checkita framework. Results of both metric and check computations share common status indicators:
Success
- The metric or check computation completed without errors, and the condition defined in the metric or check is satisfied.
Failure
- The metric or check computation produced results that do not satisfy the condition of that metric or check. For example, the regexMatch
metric received a column value that does not match the specified regular expression.
Error
- An error occurred while computing the metric or check. The error message is captured as well.
For all result types, the status comes with a message that describes it. However, statuses are made available to the user in different ways:
Success
, then an error for this metric is recorded for that row. You can then request a metric error report, whose configuration is described in the Error Reports chapter. For more details on collecting metric errors, see the Metric Error Collection chapter.
Computing a metric involves reading the source data row by row and incrementing the metric value for each row (provided the metric condition is met). While the increment is being computed for the current row, something may go wrong: the data may be faulty, or a runtime error may occur.
In addition, many metrics have a logical condition that must hold for the metric value to be updated. If this condition is not met, the increment for the current row is also treated as unsuccessful.
In both of these situations, the metric error collection mechanism is triggered and the following error information is recorded:
Failure
, or Error
) and a message describing the error.
The data sources being processed can be very large and may therefore produce a significant number of errors during metric computation. As a result, there is a risk of running out of memory while these errors are recorded.
To prevent this, the number of errors collected for each metric is capped at 10,000. If necessary, this limit can be lowered in the application settings file, using the errorDumpSize
field. See the Enablers chapter.
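The row-by-row increment and capped error capture described above can be sketched as follows. This is a hypothetical Python illustration, not Checkita's implementation. `CountingMetric`, its `condition` callable and the `MetricError` record are made-up names. Only the Failure and Error statuses and the 10,000 default cap come from the text.

```python
import re
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

DEFAULT_ERROR_DUMP_SIZE = 10000  # default per-metric cap quoted in the text

Row = Dict[str, Any]


@dataclass
class MetricError:
    status: str   # 'Failure' (condition not met) or 'Error' (exception raised)
    message: str
    row: Row      # copy of the offending source row


@dataclass
class CountingMetric:
    # Hypothetical metric that counts the rows satisfying a condition.
    condition: Callable[[Row], bool]
    error_dump_size: int = DEFAULT_ERROR_DUMP_SIZE
    value: int = 0
    errors: List[MetricError] = field(default_factory=list)

    def _record(self, status: str, message: str, row: Row) -> None:
        # Keep at most error_dump_size errors to bound memory usage.
        if len(self.errors) < self.error_dump_size:
            self.errors.append(MetricError(status, message, dict(row)))

    def increment(self, row: Row) -> None:
        try:
            ok = self.condition(row)
        except Exception as exc:  # faulty data or a runtime error
            self._record('Error', str(exc), row)
            return
        if ok:
            self.value += 1
        else:
            self._record('Failure', 'metric condition is not met', row)


def digits_only(row: Row) -> bool:
    # regexMatch-like condition: the id column must consist of digits.
    return re.fullmatch('[0-9]+', row['id']) is not None


if __name__ == '__main__':
    metric = CountingMetric(condition=digits_only, error_dump_size=2)
    for r in [{'id': '1'}, {'id': 'x'}, {'id': None}, {'id': 'y'}, {'id': '22'}]:
        metric.increment(r)
    print(metric.value, [e.status for e in metric.errors])
```

In this run, the third error (`'y'`) is silently dropped because the cap of 2 has already been reached. This mirrors the trade-off described above: memory stays bounded at the cost of an incomplete error sample.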
The collected metric errors can be used to find and fix data problems. To obtain them, configure the corresponding reports to be saved when the pipeline finishes. Error reports are requested in the targets
section of the Data Quality pipeline configuration file, as described in the Error Reports chapter. Note that these reports contain samples of source data, so restrict carefully who has access to them. For the same reason, error reports are not saved to the framework's results storage.
IMPORTANT: Quality checking of streaming data sources is currently experimental and may change.
As mentioned earlier, the Checkita framework can compute metrics and run quality checks over streaming data sources. Since Spark is the computation engine, metrics over streaming sources are computed with the Spark Structured Streaming API.
The main goal when running a data quality pipeline in streaming mode is to keep the ability to process several streaming sources at once. Computing metrics over sources is a stateful operation, so all streaming sources are processed in \"windowed\" mode: time windows are formed, and each one tracks the state over a given time interval. To process several streaming sources at once, their windows must be synchronized: (1) they must have the same size and (2) they must start at the same time. To ensure this, the window size is set at the application level and is shared by all processed data sources.
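The following sketch shows why a single shared window size keeps sources in sync. It assumes tumbling (non-overlapping) windows aligned to the Unix epoch, which is how Spark's `window` function places tumbling windows when no start offset is given. Timestamps are plain integer seconds for simplicity, and the function names are hypothetical.

```python
from typing import Tuple


def window_start(ts_seconds: int, window_seconds: int) -> int:
    # Epoch-aligned tumbling window: floor the timestamp to a multiple of
    # the window size. Any source using the same size gets the same bounds.
    return ts_seconds - (ts_seconds % window_seconds)


def window_bounds(ts_seconds: int, window_seconds: int) -> Tuple[int, int]:
    start = window_start(ts_seconds, window_seconds)
    return start, start + window_seconds  # half-open interval [start, end)


if __name__ == '__main__':
    size = 60  # hypothetical application-wide window size, in seconds
    # Records from two different sources arriving within the same minute:
    print(window_bounds(125, size), window_bounds(170, size))
```

If one source used 60-second windows and another used 90-second windows, their boundaries would coincide only every 180 seconds. A single application-level size avoids that mismatch.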
Since streaming sources are processed window by window, each record must have a timestamp, which is used to assign the record to a particular window. Several options are supported for providing this timestamp:
Processing time
- Spark automatically generates a timestamp for each record when it enters processing, using the current_timestamp
function.
Event time
- Mostly applicable to Kafka topics: the timestamp is read from the timestamp
column, which stores the record creation time (event time).
Custom time
- A user-defined column of timestamp type from which the timestamp is read.
It is also necessary to determine when a window can be considered complete. In other words, there must be rules for deciding when a window's state is final and no further records are expected to fall into that window.
A common approach to this problem in stream processing is to use so-called watermarks. A watermark holds a timestamp that sets the threshold for accepting new records into processing.
If a record's timestamp is below the watermark, the record is considered late and is not accepted for processing. The watermark is defined as the maximum timestamp among the records processed so far, minus a predefined offset. More details on the watermarking approach can be found in the Spark documentation: Handling Late Data and Watermarking.
Therefore, to process several streaming sources in sync, the watermark offset is set at the application level and is the same for all sources.
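The watermark rule above can be sketched in a few lines of Python. This is a simplified model, not Checkita's or Spark's implementation. The 10-minute offset (WATERMARK_DELAY) and the helper names watermark and is_late are hypothetical. For clarity, the watermark is recomputed from the full list of processed event times.

```python
from datetime import datetime, timedelta

# Hypothetical application-level offset, shared by all streaming sources.
WATERMARK_DELAY = timedelta(minutes=10)


def watermark(processed_event_times):
    # Watermark = max event time among already processed records minus the offset.
    return max(processed_event_times) - WATERMARK_DELAY


def is_late(event_time, current_watermark):
    # Records with an event time below the watermark are late and are not processed.
    return event_time < current_watermark


processed = [datetime(2024, 3, 13, 12, 0), datetime(2024, 3, 13, 12, 25)]
wm = watermark(processed)
print(wm)  # 2024-03-13 12:15:00
print(is_late(datetime(2024, 3, 13, 12, 10), wm))  # True
print(is_late(datetime(2024, 3, 13, 12, 20), wm))  # False
```

Because the offset is shared, every source's watermark lags its own latest event time by the same amount. That is what makes the sources comparable.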
Finally, note that the Spark Structured Streaming engine processes streaming sources in micro-batches: records are collected over a certain (usually very short) time interval and then processed as a static dataframe. Spark lets you configure the interval over which records are collected into a micro-batch by setting the trigger interval. This interval must also be the same for all processed data sources and is set at the application level. Tuning it lets you control the size of the micro-batches and, consequently, the load on the executors.
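Below is a rough model of how the trigger interval groups records into micro-batches. It is a sketch, not Spark's scheduler. The 60-second TRIGGER_INTERVAL and the function name are assumptions, and arrival times are plain seconds. Batching is driven by arrival (processing) time, while windows and watermarks are driven by event time.

```python
from collections import defaultdict

# Hypothetical trigger interval in seconds, set once for the whole application.
TRIGGER_INTERVAL = 60


def group_into_micro_batches(arrival_times):
    # Each record lands in the micro-batch whose trigger period contains its
    # arrival time; periods with no arrivals produce no batch.
    batches = defaultdict(list)
    for t in arrival_times:
        batches[t // TRIGGER_INTERVAL].append(t)
    return dict(batches)


batches = group_into_micro_batches([5, 42, 61, 119, 185])
print(batches)  # {0: [5, 42], 1: [61, 119], 3: [185]}
```

Increasing TRIGGER_INTERVAL yields fewer but larger micro-batches. This is the size-versus-load trade-off described above.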
For more details on the settings for running data quality pipelines in streaming mode, see the Streaming Settings chapter. In summary, a data quality pipeline in streaming mode runs through the following stages:
foreachBatch sink.
Processing of each micro-batch (formed once per trigger interval):
The window processor checks the buffer (also once per trigger interval) for windows that are fully formed, i.e. that lie entirely below the watermark.
IMPORTANT To process streaming sources synchronously, the minimum watermark is used (computed from the current watermarks of all processed sources). This approach guarantees that a window is fully formed for every processed source.
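The window completeness check with several sources can be sketched as follows. The source IDs, the one-hour tumbling WINDOW and the helper names are hypothetical. The point of the sketch is that the minimum of the per-source watermarks decides which buffered windows are ready.

```python
from datetime import datetime, timedelta

# Hypothetical tumbling window size.
WINDOW = timedelta(hours=1)


def common_watermark(source_watermarks):
    # The slowest source determines the watermark used for all sources.
    return min(source_watermarks.values())


def completed_windows(buffered_window_starts, wm):
    # A window [start, start + WINDOW) is fully formed once its end does not
    # exceed the watermark, i.e. it lies entirely below the watermark.
    return sorted(s for s in buffered_window_starts if s + WINDOW <= wm)


source_watermarks = {
    'orders_stream': datetime(2024, 3, 13, 14, 20),  # hypothetical source IDs
    'payments_stream': datetime(2024, 3, 13, 13, 5),
}
buffer = {datetime(2024, 3, 13, 12), datetime(2024, 3, 13, 13), datetime(2024, 3, 13, 14)}

wm = common_watermark(source_watermarks)  # 13:05, set by payments_stream
ready = completed_windows(buffer, wm)  # only the 12:00-13:00 window is ready
```

If the maximum were used instead, the 13:00 window would be reported as complete. That would happen even though payments_stream may still deliver records for it.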
Once a fully formed window is obtained, all the required procedures are performed for it:
Streaming queries and the window processor keep running until the application is stopped (a sigterm signal is received) or an error occurs.
Important note on saving results to storage: since a set of results is produced for each window, referenceDate and executionDate are both set to the start date and time of that window. For more details on working with dates in the framework, see the Working with Dates chapter.
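As an illustration, the sketch below assigns a record to a tumbling window and derives both dates from the window start. It assumes one-hour windows aligned to the Unix epoch, which is how Spark's window function aligns tumbling windows by default, and UTC timestamps. The function names are hypothetical and are not part of Checkita.

```python
from datetime import datetime, timezone

# Hypothetical window size: one hour, in seconds.
WINDOW_SECONDS = 3600


def window_start(event_time):
    # Epoch-aligned tumbling windows: floor the Unix timestamp to a multiple
    # of the window size.
    ts = int(event_time.timestamp())
    return datetime.fromtimestamp(ts - ts % WINDOW_SECONDS, tz=timezone.utc)


def result_dates(event_time):
    # Both dates of the window's results equal the window start.
    start = window_start(event_time)
    return {'referenceDate': start, 'executionDate': start}


dates = result_dates(datetime(2024, 3, 13, 12, 47, 30, tzinfo=timezone.utc))
print(dates['referenceDate'])  # 2024-03-13 12:00:00+00:00
```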
TIP Since data quality checks are performed for each window, the window should be large enough to deliver results at an interval that lets you monitor them and react to data quality issues. For example, if your engineering team's response time is about 1 hour, the window should be about the same size. There is little point in running checks on a streaming source every 10 minutes if you do not have the resources to react to them.
"},{"location":"ru/03-job-configuration/","title":"Job Configuration","text":"tbd.
"},{"location":"ru/03-job-configuration/01-Connections/","title":"Connection Configuration (Connections)","text":"The Checkita framework can read data from external systems such as relational DBMSs or message brokers (Kafka). To read data from an external system, a connection to it must be configured.
Connection configurations for external systems are described in the connections section of the Data Quality pipeline configuration file. Connections to the following systems are currently supported:
All connections must have a unique identifier id and may also have an optional list of Spark parameters. These parameters are specified in the parameters field and supply additional connection options that Spark needs in order to read data from the given system.
All connection configurations share the following common parameters:
id - Connection identifier;
description - Optional connection description;
parameters - Optional list of Spark parameters that can be specified to provide additional connection configuration required by Spark;
metadata - Optional list of arbitrary user-defined parameters.
Parameters specific to each supported connection type are listed separately below.
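The parameters and metadata fields are lists of name=value strings, as the format given for each connection type shows. A minimal sketch of turning such a list into a key/value map follows. The connection dictionary, its values and parse_parameters are hypothetical. Splitting on the first = sign is an assumption about how values that contain = should be handled, not documented Checkita behaviour.

```python
def parse_parameters(params):
    # Split each 'name=value' string on the first '=' only, so that a value
    # may itself contain '=' characters.
    parsed = {}
    for p in params:
        name, sep, value = p.partition('=')
        if not sep or not name.strip():
            raise ValueError('expected name=value, got: ' + p)
        parsed[name.strip()] = value.strip()
    return parsed


# Hypothetical connection entry holding only the common parameters.
connection = {
    'id': 'my_connection',
    'description': 'Example connection',
    'parameters': ['spark.param.name=spark.param.value'],
    'metadata': ['owner=dq-team'],
}

spark_options = parse_parameters(connection['parameters'])
user_metadata = parse_parameters(connection['metadata'])
print(spark_options)  # {'spark.param.name': 'spark.param.value'}
```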
An example of a completed connections section is given below in the Connections Configuration Example chapter.
Configuring a connection to an SQLite database is quite simple: only two parameters (id and url) are required.
id - Required. Connection ID;
description - Optional. Connection description;
url - Required. Path to the SQLite database file;
parameters - Optional. List of Spark parameters (if needed), where each parameter is a string in the format: spark.param.name=spark.param.value;
metadata - Optional. Arbitrary list of user-defined parameters for this connection in the format: param.name=param.value.
A connection to PostgreSQL can be configured with the following parameters:
id - Required. Connection ID;
description - Optional. Connection description;
url - Required. URL for connecting to the PostgreSQL server. The URL must contain the server address, the port and the name of the database to connect to. Additional parameters may also be specified in the URL, if needed, according to the PostgreSQL documentation. The URL must not contain the connection protocol;
username - Optional. Username for connecting to the database (if required);
password - Optional. Password for connecting to the database (if required);
parameters - Optional. List of Spark parameters (if needed), where each parameter is a string in the format: spark.param.name=spark.param.value;
metadata - Optional. Arbitrary list of user-defined parameters for this connection in the format: param.name=param.value.
A connection to Oracle is configured in the same way as for PostgreSQL, using the following parameters:
id
- Required. Connection ID;
description
- Optional. Connection description;
url
- Required. URL for connecting to the Oracle server. The URL must contain the server address, port, and name of the database to connect to. If required, additional parameters can also be specified in the URL in accordance with the Oracle documentation. The URL must not contain the connection protocol.
username
- Optional. Username for connecting to the database (if required).
password
- Optional. Password for connecting to the database (if required).
parameters
- Optional. List of Spark parameters (if required), where each parameter is a string in the format spark.param.name=spark.param.value.
metadata
- Optional. Arbitrary list of user-defined parameters for this connection, each a string in the format param.name=param.value.
To configure a connection to a Kafka cluster, specify the following parameters:
id
- Required. Connection ID;
description
- Optional. Connection description;
servers
- Required. List of servers (message brokers) to connect to;
parameters
- Optional. List of Spark parameters (if required), where each parameter is a string in the format spark.param.name=spark.param.value. Kafka authorization settings are usually provided through this list of Spark parameters.
metadata
- Optional. Arbitrary list of user-defined parameters for this connection, each a string in the format param.name=param.value.
If connecting to the Kafka cluster requires a JAAS configuration file, its location must be passed to the JVM via the java.security.auth.login.config system property. Note that this property must be set before the JVM starts, so it has to be defined in the spark-submit
command as shown below:
In cluster
mode: --deploy-mode cluster \\\n--conf 'spark.driver.extraJavaOptions=\"-Djava.security.auth.login.config=./jaas.conf\"' \\\n--conf 'spark.executor.extraJavaOptions=\"-Djava.security.auth.login.config=./jaas.conf\"' \\\n--files /path/to/your/jaas.conf,<other files required for DQ>\n
In client
mode, the JVM on the client side (the driver) starts before the Spark application configuration is read. Therefore, JVM options for the driver must be passed via the --driver-java-options
argument: --deploy-mode client \\\n--driver-java-options \"-Djava.security.auth.login.config=./jaas.conf\" \\\n--conf 'spark.executor.extraJavaOptions=\"-Djava.security.auth.login.config=./jaas.conf\"' \\\n--files file.keytab,jaas.conf,<other files required for DQ>\n
As shown in the example below, connections of the same type are grouped into subsections named after the connection type. Each subsection contains a list of connections of that type only.
jobConfig: {\n connections: {\n postgres: [\n {id: \"postgre_db1\", url: \"postgre1.db.com:5432/public\", username: \"dq-user\", password: \"dq-password\"}\n {\n id: \"postgre_db2\",\n url: \"postgre2.db.com:5432/public\",\n username: \"dq-user\",\n password: \"dq-password\",\n schema: \"dataquality\"\n }\n ]\n oracle: [\n {id: \"oracle_db1\", url: \"oracle.db.com:1521/public\", username: \"db-user\", password: \"dq-password\"}\n ]\n sqlite: [\n {id: \"sqlite_db\", url: \"some/path/to/db.sqlite\"}\n ],\n kafka: [\n {id: \"kafka_cluster_1\", servers: [\"server1:9092\", \"server2:9092\"]}\n {\n id: \"kafka_cluster_2\",\n servers: [\"kafka-broker1:9092\", \"kafka-broker2:9092\", \"kafka-broker3:9092\"]\n parameters: [\n \"security.protocol=SASL_PLAINTEXT\",\n \"sasl.mechanism=GSSAPI\",\n \"sasl.kerberos.service.name=kafka-service\"\n ]\n }\n ]\n }\n}\n
"},{"location":"ru/03-job-configuration/02-Schemas/","title":"Describing Data Schemas (Schemas)","text":"Data schemas are used in Data Quality pipelines for the following purposes:
schemaMatch
(see the Schema Match Check chapter).
Data schemas are described in the schemas
section of the Data Quality pipeline configuration file. Schemas can be described in different formats. The format of a schema is specified in the kind
field, which also determines which other fields must be filled in.
In addition to the kind
field, schemas of all formats share the following common parameters:
id
- Schema identifier;
description
- Optional schema description;
metadata
- Optional list of arbitrary user-defined parameters.
The delimited schema type is primarily intended for reading delimited text files, such as CSV or TSV. Nevertheless, these schemas can also be used in schemaMatch
load checks. This schema configuration type can describe only flat structures (nested columns are not allowed).
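For illustration only, a delimited schema entry built from the fields described in this section might look like the sketch below. It assumes that the schemas section is a list of objects, each carrying its own kind field, wrapped in jobConfig as in the connections example. The schema ID, column names, metadata value, and type names (integer, string, timestamp) are hypothetical, so check the exact type names against the Supported Type Names chapter.

```hocon
jobConfig: {
  schemas: [
    {
      id: \"customers_csv\"  // hypothetical schema ID
      kind: \"delimited\"
      description: \"Flat schema of a customers CSV file\"
      schema: [
        {name: \"customer_id\", type: \"integer\"}
        {name: \"full_name\", type: \"string\"}
        {name: \"created_at\", type: \"timestamp\"}
      ]
      metadata: [\"owner=dq-team\"]  // hypothetical user-defined parameter
    }
  ]
}
```

Each column object carries only a name and a type, which matches the flat-structure restriction of this schema kind.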
A delimited text file schema is described with the following fields:
kind: \"delimited\"
- Required. Sets the schema format for delimited text files;
id
- Required. Schema ID;
description
- Optional. Schema description;
schema
- Required. List of columns, where each column is an object with the following fields:
name
- Required. Column name;
type
- Required. Column type. The list of supported types is given in the Supported Type Names chapter.
metadata
- Optional. Arbitrary list of user-defined parameters for this schema, each a string in the format param.name=param.value.
The fixedFull schema type is used to read text files without a delimiter, that is, files with fixed-width columns. Its main difference from the other schema types is that the width (number of characters) of each column must be specified, which is essential for reading such files correctly.
Despite this specialization, this schema type can also be used to read delimited files and in schemaMatch
load checks. This schema configuration type can describe only flat structures (nested columns are not allowed).
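A fixedFull entry differs mainly in the per-column width field. The sketch below is hypothetical. It assumes that the schemas section is a list of objects with their own kind field, and the ID, column names, widths, and type names are made up for illustration.

```hocon
jobConfig: {
  schemas: [
    {
      id: \"payments_fixed\"  // hypothetical schema ID
      kind: \"fixedFull\"
      schema: [
        // these three columns span 10 + 3 + 12 = 25 characters of each line
        {name: \"payment_id\", type: \"integer\", width: 10}
        {name: \"currency\", type: \"string\", width: 3}
        {name: \"amount\", type: \"double\", width: 12}
      ]
    }
  ]
}
```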
A fixed-width file schema is described with the following fields:
kind: \"fixedFull\"
- Required. Sets the schema format for text files without a delimiter;
id
- Required. Schema ID;
description
- Optional. Schema description;
schema
- Required. List of columns, where each column is an object with the following fields:
name
- Required. Column name;
type
- Required. Column type. The list of supported types is given in the Supported Type Names chapter.
width
- Required. Column width (integer number of characters).
metadata
- Optional. Arbitrary list of user-defined parameters for this schema, each a string in the format param.name=param.value.
The fixedShort configuration type provides a simpler way to describe schemas for files without a delimiter: columns are described by their name and width only. Consequently, all columns have a string type. This schema type can also be used to read delimited files and in schemaMatch
load checks. This schema configuration type can describe only flat structures (nested columns are not allowed).
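A hypothetical fixedShort entry might look like the sketch below. Each column is a single columnName:columnWidth string, and every column is read as a string. The ID and column names are illustrative assumptions, as is the list-of-objects layout of the schemas section.

```hocon
jobConfig: {
  schemas: [
    {
      id: \"payments_fixed_short\"  // hypothetical schema ID
      kind: \"fixedShort\"
      // columnName:columnWidth strings; no type is given, all columns are strings
      schema: [\"payment_id:10\", \"currency:3\", \"amount:12\"]
    }
  ]
}
```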
A simplified schema for files without a delimiter is described with the following fields:
kind: \"fixedShort\"
- Required. Sets the simplified schema format for text files without a delimiter;
id
- Required. Schema ID;
description
- Optional. Schema description;
schema
- Required. List of columns, where each column is a string in the format columnName:columnWidth. Columns always have a string type.
metadata
- Optional. Arbitrary list of user-defined parameters for this schema, each a string in the format param.name=param.value.
The avro configuration type is used to read Avro schemas from files with the .avsc
extension. A schema read from such a file can be used to read both Avro files and delimited text files. These schemas can also be used in schemaMatch
load checks. Note also that Avro schemas support complex structures with nested columns.
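A minimal sketch of an avro schema entry follows. It assumes that the schemas section is a list of objects with their own kind field. The ID and the .avsc path are placeholders.

```hocon
jobConfig: {
  schemas: [
    {
      id: \"orders_avro\"  // hypothetical schema ID
      kind: \"avro\"
      schema: \"/path/to/schemas/orders.avsc\"  // placeholder path to the .avsc file
    }
  ]
}
```

Unlike the delimited and fixed-width kinds, the column structure here is not listed in the job configuration. It comes entirely from the referenced .avsc file.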
To read an Avro schema from a file, specify the following parameters:
kind: \"avro\"
- Required. Sets the Avro schema configuration format;
id
- Required. Schema ID;
description
- Optional. Schema description;
schema
- Required. Path to the .avsc
file from which the Avro schema is read.
metadata
- Optional. Arbitrary list of user-defined parameters for this schema, each a string in the format param.name=param.value.
The Hive catalog can also be used as a source of data schemas. The hive schema format retrieves the schemas of specific Hive tables. These schemas can be used to read both Avro files and delimited text files, and also in schemaMatch
load checks. Note also that Hive schemas support complex structures with nested columns.
\u0414\u043b\u044f \u0442\u043e\u0433\u043e \u0447\u0442\u043e\u0431\u044b \u043f\u043e\u043b\u0443\u0447\u0438\u0442\u044c \u0441\u0445\u0435\u043c\u0443 \u0438\u0437 Hive \u043a\u0430\u0442\u0430\u043b\u043e\u0433\u0430, \u043d\u0435\u043e\u0431\u0445\u043e\u0434\u0438\u043c\u043e \u0443\u043a\u0430\u0437\u0430\u0442\u044c \u0441\u043b\u0435\u0434\u0443\u044e\u0449\u0438\u0435 \u043f\u0430\u0440\u0430\u043c\u0435\u0442\u0440\u044b:
kind: \"hive\"
- \u041e\u0431\u044f\u0437\u0430\u0442\u0435\u043b\u044c\u043d\u043e. \u0423\u0441\u0442\u0430\u043d\u0430\u0432\u043b\u0438\u0432\u0430\u0435\u0442 Hive \u0444\u043e\u0440\u043c\u0430\u0442 \u043e\u043f\u0438\u0441\u0430\u043d\u0438\u044f \u0441\u0445\u0435\u043c\u044b.id
- \u041e\u0431\u044f\u0437\u0430\u0442\u0435\u043b\u044c\u043d\u043e. ID \u0441\u0445\u0435\u043c\u044b;description
- \u041e\u043f\u0446\u0438\u043e\u043d\u0430\u043b\u044c\u043d\u043e. \u041e\u043f\u0438\u0441\u0430\u043d\u0438\u0435 \u0441\u0445\u0435\u043c\u044b;schema
- \u041e\u0431\u044f\u0437\u0430\u0442\u0435\u043b\u044c\u043d\u043e. Hive \u0441\u0445\u0435\u043c\u0430, \u0432 \u043a\u043e\u0442\u043e\u0440\u043e\u0439 \u043d\u0430\u0445\u043e\u0434\u0438\u0442\u0441\u044f \u0446\u0435\u043b\u0435\u0432\u0430\u044f Hive \u0442\u0430\u0431\u043b\u0438\u0446\u0430.table
- \u041e\u0431\u044f\u0437\u0430\u0442\u0435\u043b\u044c\u043d\u043e. Hive \u0442\u0430\u0431\u043b\u0438\u0446\u0430 \u0438\u0437 \u043a\u043e\u0442\u043e\u0440\u043e\u0439 \u0441\u0447\u0438\u0442\u044b\u0432\u0430\u0435\u0442\u0441\u044f \u0441\u0445\u0435\u043c\u0430.excludeColumns
- \u041e\u043f\u0446\u0438\u043e\u043d\u0430\u043b\u044c\u043d\u043e. \u0421\u043f\u0438\u0441\u043e\u043a \u043a\u043e\u043b\u043e\u043d\u043e\u043a, \u043a\u043e\u0442\u043e\u0440\u044b\u0435 \u0431\u0443\u0434\u0443\u0442 \u0438\u0441\u043a\u043b\u044e\u0447\u0435\u043d\u044b \u0438\u0437 \u0441\u0445\u0435\u043c\u044b. \u041d\u0430\u043f\u0440\u0438\u043c\u0435\u0440, \u0432 \u043d\u0435\u043a\u043e\u0442\u043e\u0440\u044b\u0445 \u0441\u0438\u0442\u0443\u0430\u0446\u0438\u044f\u0445 \u043d\u0435\u043e\u0431\u0445\u043e\u0434\u0438\u043c\u043e \u0438\u0441\u043a\u043b\u044e\u0447\u0438\u0442\u044c \u043a\u043e\u043b\u043e\u043d\u043a\u0438 \u043f\u0430\u0440\u0442\u0438\u0446\u0438\u043e\u043d\u0438\u0440\u043e\u0432\u0430\u043d\u0438\u044f \u0438\u0437 \u0441\u0445\u0435\u043c\u044b.\u0421\u043b\u0435\u0434\u0443\u044e\u0449\u0438\u0435 \u043d\u0430\u0438\u043c\u0435\u043d\u043e\u0432\u0430\u043d\u0438\u044f \u0442\u0438\u043f\u043e\u0432 \u043c\u043e\u0433\u0443\u0442 \u0431\u044b\u0442\u044c \u0438\u0441\u043f\u043e\u043b\u044c\u0437\u043e\u0432\u0430\u043d\u044b \u043f\u0440\u0438 \u043e\u043f\u0438\u0441\u0430\u043d\u0438\u0438 \u0441\u0445\u0435\u043c \u0434\u0430\u043d\u043d\u044b\u0445:
string
boolean
date
timestamp
integer (32-bit integer)
long (64-bit integer)
short (16-bit integer)
byte (signed integer in a single byte)
double
float
decimal(precision, scale)
(precision <= 38; scale <= precision)As shown in the example below, the schemas
section is a list of schema configurations, each specifying its configuration kind.
jobConfig: {\n schemas: [\n {\n id: \"schema1\"\n kind: \"delimited\"\n schema: [\n {name: \"colA\", type: \"string\"},\n {name: \"colB\", type: \"timestamp\"},\n {name: \"colC\", type: \"decimal(10, 3)\"}\n ]\n }\n {\n id: \"schema2\"\n kind: \"fixedFull\",\n schema: [\n {name: \"col1\", type: \"integer\", width: 5},\n {name: \"col2\", type: \"double\", width: 6},\n {name: \"col3\", type: \"boolean\", width: 4}\n ]\n }\n {id: \"schema3\", kind: \"fixedShort\", schema: [\"colOne:5\", \"colTwo:7\", \"colThree:9\"]}\n {id: \"hive_schema\", kind: \"hive\", schema: \"some_schema\", table: \"some_table\"}\n {id: \"avro_schema\", kind: \"avro\", schema: \"path/to/avro_schema.avsc\"}\n ]\n}\n
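The decimal(precision, scale) type listed above carries two documented constraints: precision <= 38 and scale <= precision. As an illustration only, here is a minimal Python sketch of how such a type string could be checked. parse_decimal is a hypothetical helper, not part of Checkita, and the lower bounds (precision >= 1, scale >= 0) are assumptions added for the sketch.

```python
def parse_decimal(type_name):
    '''Return (precision, scale) for a type string like decimal(10, 3).

    Hypothetical helper for illustration. Documented rules: precision <= 38
    and scale <= precision. The lower bounds are assumptions of this sketch.
    '''
    name = type_name.strip().lower()
    if not (name.startswith('decimal(') and name.endswith(')')):
        raise ValueError('not a decimal type: ' + type_name)
    parts = [p.strip() for p in name[len('decimal('):-1].split(',')]
    if len(parts) != 2:
        raise ValueError('expected decimal(precision, scale): ' + type_name)
    precision, scale = int(parts[0]), int(parts[1])
    if not 1 <= precision <= 38:
        raise ValueError('precision must be between 1 and 38')
    if not 0 <= scale <= precision:
        raise ValueError('scale must be between 0 and precision')
    return precision, scale


# colC in the example configuration is declared as decimal(10, 3)
print(parse_decimal('decimal(10, 3)'))  # (10, 3)
```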
"},{"location":"ru/03-job-configuration/03-Sources/","title":"Source Configuration","text":"tbd
"},{"location":"ru/03-job-configuration/04-Streams/","title":"Streaming Sources Configurations","text":"tbd
"},{"location":"ru/03-job-configuration/05-VirtualSources/","title":"Virtual Sources Configuration","text":"tbd
"},{"location":"ru/03-job-configuration/06-VirtualStreams/","title":"Virtual Streaming Sources Configuration","text":"tbd
"},{"location":"ru/03-job-configuration/07-LoadChecks/","title":"Load Checks Configuration","text":"tbd
"},{"location":"ru/03-job-configuration/08-Metrics/","title":"Metrics Configuration","text":"tbd
"},{"location":"ru/03-job-configuration/09-Checks/","title":"Checks Configurations","text":"tbd
"},{"location":"ru/03-job-configuration/10-Targets/","title":"Targets Configuration","text":"tbd
"},{"location":"ru/03-job-configuration/11-FileOutputs/","title":"File Output Configuration","text":"tbd
"},{"location":"ru/03-job-configuration/12-JobConfigExample/","title":"Job Configuration Example","text":"tbd
"}]} \ No newline at end of file +{"config":{"lang":["en","ru"],"separator":"[\\s\\-]+","pipeline":["stopWordFilter"]},"docs":[{"location":"changelog/CHANGELOG/","title":"Changelog","text":""},{"location":"changelog/CHANGELOG/#141-2024-03-14","title":"1.4.1 (2024-03-14)","text":""},{"location":"changelog/CHANGELOG/#bug-fixes","title":"Bug Fixes","text":"Thank you for considering contributing to our project! We welcome contributions from everyone. By participating in this project, you agree to abide by our Code of Conduct.
Please take a moment to review our Contribution Guide in order to make the contribution process as smooth as possible.
"},{"location":"contribution/code-of-conduct/","title":"Code of Conduct","text":""},{"location":"contribution/code-of-conduct/#our-pledge","title":"Our Pledge","text":"In the interest of fostering an open and inclusive environment, we as contributors and maintainers pledge to make participation in our project and our community a harassment-free experience for everyone, regardless of age, body size, disability, ethnicity, gender identity and expression, level of experience, education, socioeconomic status, nationality, personal appearance, race, religion, or sexual identity and orientation.
"},{"location":"contribution/code-of-conduct/#our-standards","title":"Our Standards","text":"Examples of behavior that contributes to creating a positive environment include:
Examples of unacceptable behavior by participants include:
Project maintainers are responsible for clarifying the standards of acceptable behavior and are expected to take appropriate and fair corrective action in response to any instances of unacceptable behavior.
Project maintainers have the right and responsibility to remove, edit, or reject comments, commits, code, wiki edits, issues, and other contributions that are not aligned to this Code of Conduct, or to ban temporarily or permanently any contributor for other behaviors that they deem inappropriate, threatening, offensive, or harmful.
"},{"location":"contribution/code-of-conduct/#enforcement","title":"Enforcement","text":"Instances of abusive, harassing, or otherwise unacceptable behavior may be reported by contacting the project team at GitHub. All complaints will be reviewed and investigated and will result in a response that is deemed necessary and appropriate to the circumstances. The project team is obligated to maintain confidentiality with regard to the reporter of an incident.
Project maintainers who do not follow or enforce the Code of Conduct in good faith may face temporary or permanent repercussions as determined by other members of the project's leadership.
"},{"location":"contribution/code-of-conduct/#attribution","title":"Attribution","text":"This Code of Conduct is adapted from the Contributor Covenant, version 2.0, available at https://www.contributor-covenant.org/version/2/0/code_of_conduct.html.
"},{"location":"contribution/code-of-conduct/#scope","title":"Scope","text":"This Code of Conduct applies both within project spaces and in public spaces when an individual is representing the project or its community. Examples of representing a project or community include using an official project e-mail address, posting via an official social media account, or acting as an appointed representative at an online or offline event.
"},{"location":"contribution/code-of-conduct/#acknowledgements","title":"Acknowledgements","text":"We thank the open-source community for providing inspiration and examples for creating a welcoming and inclusive Code of Conduct. Your efforts make the tech community a better place for everyone.
"},{"location":"contribution/contribution/","title":"Contribution Guide","text":""},{"location":"contribution/contribution/#types-of-contributions","title":"Types of Contributions","text":"We value all kinds of contributions, including:
All contributions will be reviewed by the maintainers of the project. Feedback or suggestions for improvement may be provided. Once everything is approved, your contribution will be merged.
"},{"location":"contribution/contribution/#code-of-conduct","title":"Code of Conduct","text":"Please read and adhere to our Code of Conduct in all your interactions with the project.
"},{"location":"contribution/contribution/#attribution","title":"Attribution","text":"A huge thanks to all contributors who help make this project better!
If you're unsure about anything, feel free to ask for clarification. We appreciate your efforts to make our project better and look forward to your contributions!
"},{"location":"","title":"Home","text":"Latest Version: 1.4.1
To ensure the quality of big data, it is necessary to calculate a large number of metrics and checks on huge datasets, which in turn is a difficult task.
Checkita is a Data Quality Framework that solves this problem by formalizing and simplifying the process of connecting to and reading data from various sources, describing metrics and checks on data from these sources, and sending results and notifications via various channels.
Thus, Checkita allows calculating various metrics and checks on data (both structured and unstructured). The framework is able to perform distributed computing on data in a \"single pass\", using Spark as a computation core. Hocon configurations are used to describe application configurations and job pipelines. Job results are saved in a dedicated framework database, and can also be sent to users via various channels such as File (to local FS, HDFS, S3), Email, Mattermost and Kafka.
Using Spark as a computation engine allows performing metrics and checks calculations at the level of \"raw\" data, without requiring any SQL abstractions over the data (such as Hive or Impala), which in turn can hide some errors in the data (e.g. bad formatting or schema mismatch).
In summary, Checkita is able to do the following:
Checkita is designed with focus on integration into ETL pipelines and data catalogues:
Another key feature of the Checkita data quality framework is that it can process both static (batch) and streaming data sources. Thus, either a batch or a streaming application can be started depending on the type of sources that need to be checked. Streaming mode is currently experimental and subject to change.
The framework is written in Scala 2.12 and uses Spark 2.4+ as the computation core. The project is configured with a parameterized SBT build that allows building the framework for a specific version of Spark, publishing the project to a given repository, and building an Uber-jar, both with and without Spark dependencies.
License
Checkita Data Quality framework is GNU LGPL licensed.
This project is a reimagination of Data Quality Framework developed by Agile Lab, Italy.
"},{"location":"01-application-setup/","title":"General Information","text":"Checkita runs as a Spark Application. Thus, it can be run in the same way as any other Spark application:
Both spark-submit deploy modes are supported: client
and cluster
.
The framework was developed primarily for batch data processing, which is its main mode of operation. A typical architecture for working with Checkita Data Quality is shown in the diagram below:
The framework can also be used for streaming data processing; however, this functionality is currently experimental and subject to change. For more detailed information on running quality checks over streaming sources, please see the Data Quality Checks over Streaming Sources chapter.
"},{"location":"01-application-setup/01-ApplicationSettings/","title":"Application Settings","text":"General Checkita Data Quality settings are configured in Hocon file application.conf
that is supplied to the application at startup. All configurations are set within the appConfig
section.
There is only one parameter that is set at the top level: applicationName
- the name of the Spark application. This parameter is optional; if it is not set, Checkita Data Quality
is used as the application name by default.
The rest of the parameters are defined in the subsections that are described below.
"},{"location":"01-application-setup/01-ApplicationSettings/#datetime-settings","title":"DateTime Settings","text":"DateTime configurations are set in the dateTimeOptions
section. Please see the Working with Date and Time section for more details on working with date and time in the Checkita framework.
DateTime settings include the following:
timeZone
- Time zone in which string representations of the reference date and execution date are parsed and rendered. Optional, default is \"UTC\"
.referenceDateFormat
- datetime format used to parse and render reference date. Optional, default is \"yyyy-MM-dd'T'HH:mm:ss.SSS\"
.executionDateFormat
- datetime format used to parse and render execution date. Optional, default is \"yyyy-MM-dd'T'HH:mm:ss.SSS\"
If dateTimeOptions
section is missing then default values are used for all parameters above.
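To make the default format concrete, the sketch below parses and renders a reference date in Python. The mapping from the Java-style pattern yyyy-MM-dd'T'HH:mm:ss.SSS to Python directives is an assumption made for illustration. Checkita itself runs on the JVM, and only the default UTC time zone is shown.

```python
from datetime import datetime, timezone

# Assumed Python equivalent of yyyy-MM-dd'T'HH:mm:ss.SSS. Python has no
# millisecond directive, so %f (1 to 6 digits) is used when parsing.
PARSE_FORMAT = '%Y-%m-%dT%H:%M:%S.%f'


def parse_reference_date(text, tz=timezone.utc):
    '''Parse a reference date string and attach the configured time zone.'''
    return datetime.strptime(text, PARSE_FORMAT).replace(tzinfo=tz)


def render_reference_date(moment):
    '''Render with exactly three fractional digits, like the SSS field.'''
    millis = moment.microsecond // 1000
    return moment.strftime('%Y-%m-%dT%H:%M:%S.') + f'{millis:03d}'


reference = parse_reference_date('2023-08-01T12:30:00.250')
print(render_reference_date(reference))  # 2023-08-01T12:30:00.250
```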
These settings apply only to streaming applications and define various aspects of running data quality checks over streaming sources. Please see the Data Quality Checks over Streaming Sources section for more details.
trigger
- Trigger interval: defines the time interval over which micro-batches are collected. Optional, default is 10s
.window
- Window interval: defines the tumbling window size used to accumulate metrics. All metric results and checks are evaluated for each window once it is finalized. Optional, default is 10m
.watermark
- Watermark level: defines the time interval after which late records are no longer processed. Optional, default is 5m
.allowEmptyWindows
- Boolean flag indicating whether empty windows are allowed. When a window falls below the watermark and some of the processed streams have no results for it, all related checks are skipped if this flag is set to true
. Otherwise, checks are processed and return an error status with a ... metric results were not found ...
type of message. Optional, default is false
.IMPORTANT All intervals must be defined as duration strings that conform to the Scala Duration format.
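A rough model of how the window and watermark intervals interact may help. The Python sketch below is a simplification built on assumptions, not Checkita's streaming engine. It understands only a small subset of the Scala Duration format (an integer followed by ms, s, m, h or d). It assigns event timestamps, in seconds, to tumbling windows. It treats a record as late when the record is older than the latest seen event time minus the watermark.

```python
# Seconds per unit for a small subset of Scala Duration suffixes (assumption).
UNIT_SECONDS = {'ms': 0.001, 's': 1, 'm': 60, 'h': 3600, 'd': 86400}


def parse_duration(text):
    '''Convert strings such as 10s, 10m or 5m into seconds.'''
    text = text.strip()
    digits = 0
    while digits < len(text) and text[digits].isdigit():
        digits += 1
    number, unit = text[:digits], text[digits:].strip()
    if not number or unit not in UNIT_SECONDS:
        raise ValueError('unsupported duration: ' + text)
    return int(number) * UNIT_SECONDS[unit]


def window_start(event_ts, window_s):
    '''Start of the tumbling window that an event time (in seconds) falls into.'''
    return event_ts - event_ts % window_s


def is_late(event_ts, max_event_ts, watermark_s):
    '''Simplified rule: records older than max event time minus watermark are dropped.'''
    return event_ts < max_event_ts - watermark_s


window = parse_duration('10m')     # 600 seconds, the default window
watermark = parse_duration('5m')   # 300 seconds, the default watermark
print(window_start(1234, window))  # 1200
print(is_late(1000, 1400, watermark))  # True
```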
"},{"location":"01-application-setup/01-ApplicationSettings/#enablers","title":"Enablers","text":"Section enablers
of the application configuration file defines various boolean switches and single-value parameters that control various aspects of data quality job execution:
allowSqlQueries
- Enables usage of arbitrary SQL queries in the data quality job configuration. Optional, default is false
allowNotifications
- Enables notifications to be sent from the DQ application. Optional, default is false
aggregatedKafkaOutput
- Enables sending aggregated messages for Kafka Targets (one per target type). By default, Kafka messages are sent per result entity. Optional, default is false
enableCaseSensitivity
- Enables column name case sensitivity. Controls column name comparison and lookup. Optional, default is false
errorDumpSize
- Maximum number of errors to be collected per single metric. The framework is able to collect source data rows where metric evaluation yielded errors. However, to prevent OOM, the number of collected errors has to be limited to a reasonable value. Thus, the maximum allowable number of errors per metric is 10000
. It is possible to lower this number by setting this parameter. Optional, default is 10000
outputRepartition
- Sets the number of partitions when writing outputs. By default, a single file is written. Optional, default is 1
If enablers
section is missing then default values are used for all parameters above.
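The errorDumpSize rule can be summarized as a bounded collection. In this hypothetical Python sketch, values above 10000 are capped rather than rejected. That choice is an assumption, since the text above states only the upper limit.

```python
from itertools import islice

# Documented upper bound for errorDumpSize (and its default).
MAX_ERROR_DUMP_SIZE = 10000


def effective_dump_size(configured=None):
    '''Use the configured size if given, never exceeding the documented maximum.'''
    if configured is None:
        return MAX_ERROR_DUMP_SIZE
    # Assumption: larger values are capped; the real framework may reject them.
    return min(configured, MAX_ERROR_DUMP_SIZE)


def collect_errors(errors, configured=None):
    '''Keep at most effective_dump_size(configured) errors from an iterable.'''
    return list(islice(errors, effective_dump_size(configured)))


print(len(collect_errors(range(25000))))      # 10000
print(collect_errors(['e1', 'e2', 'e3'], 2))  # ['e1', 'e2']
```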
Parameters for connecting to Data Quality results storage are defined in storage
section of application configuration.
For more information on results storage refer to Data Quality Results Storage chapter of the documentation.
Thus, the connection to storage is configured using the following parameters:
dbType
- Type of database used to store Data Quality results. Required.url
- Database connection URL (without protocol identifiers). Required.username
- Username to connect to database with (if required). Optional.password
- Password to connect to database with (if required). Optional.schema
- Schema where data quality tables are located (if required). Optional.saveErrorsToStorage
- Enables metric errors to be stored in storage database. Optional, default is false
.IMPORTANT If storage
section is missing then application will run without usage of results storage:
In addition, be mindful when storing metric errors in the storage database. Depending on the errorDumpSize
setting, the number of collected errors could be quite large. This may overload the DQ storage and increase the execution time of database write operations. Another concern is that metric errors contain data excerpts from the sources being checked. These excerpts might contain sensitive information that should preferably not be stored in the DQ storage database. Alternatively, these excerpts can be encrypted before storing. See Encryption configuration for more details.
In order to send notifications via email, it is necessary to configure a connection to an SMTP server, which should be defined in the email
section of the application configuration with the following parameters:
host
- SMTP server host. Required.port
- SMTP server port. Required.address
- Email address to send notifications from. Required.name
- Name of the sender. Required.sslOnConnect
- Boolean parameter indicating whether to use SSL on connect. Optional, default is false
.tlsEnabled
- Boolean parameter indicating whether to enable TLS. Optional, default is false
.username
- Username for connection to SMTP server (if required). Optional.password
- Password for connection to SMTP server (if required). Optional.If the email
section is missing, email notifications cannot be sent. If they are configured in the job configuration, an exception will be thrown at runtime.
In order to send notifications to Mattermost, it is necessary to configure a connection to the Mattermost API, which should be defined in the mattermost
section of the application configuration with the following parameters:
host
- Mattermost API host.token
- Mattermost API token (using Bot accounts for notifications is preferable).If the mattermost
section is missing, corresponding notifications cannot be sent. If they are configured in the job configuration, an exception will be thrown at runtime.
It is also possible to provide a list of default Spark configuration parameters used across multiple jobs. These parameters should be provided as the defaultSparkOptions
list where each parameter is a string in format: spark.param.name=spark.param.value
.
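Each defaultSparkOptions entry is a name=value string. The minimal Python sketch below turns such a list into key-value pairs. It splits on the first = so that values may contain =. parse_spark_options is a hypothetical name, and the option strings come from the example application configuration.

```python
def parse_spark_options(options):
    '''Turn name=value strings into a dict (hypothetical helper).'''
    parsed = {}
    for option in options:
        # Split on the first = only, so a value may itself contain =.
        name, sep, value = option.partition('=')
        if not sep or not name.strip():
            raise ValueError('expected name=value, got: ' + option)
        parsed[name.strip()] = value.strip()
    return parsed


defaults = parse_spark_options([
    'spark.sql.orc.enabled=true',
    'spark.sql.parquet.compression.codec=snappy',
    'spark.sql.autoBroadcastJoinThreshold=-1',
])
print(defaults['spark.sql.autoBroadcastJoinThreshold'])  # -1
```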
When the storage
section is defined, it is also recommended to use the encryption
section in order to protect sensitive information in the job configuration. This is done by defining the following parameters within the application configuration file:
secret
- Secret string used to encrypt/decrypt sensitive fields. This string should contain at least 32 characters. Required. keyFields
- List of key fields used to identify fields that require encryption/decryption. Optional, default is [password, secret]
.encryptErrorData
- Boolean flag indicating whether data excerpts within collected metric errors should be encrypted. Optional, default is false
If the encryption
section is missing, sensitive information will not be encrypted.
IMPORTANT Both job configuration key fields and the data excerpts contained in metric errors may include sensitive information. Storing raw sensitive information in the DQ storage database might not satisfy security requirements. Therefore, the DQ framework can encrypt sensitive data with the AES256 encryption algorithm. As AES256 is a symmetric algorithm, encrypted data can be decrypted with the same secret key if needed.
"},{"location":"01-application-setup/01-ApplicationSettings/#example-of-application-configuration-file","title":"Example of Application Configuration File","text":"Hocon configuration format supports variable substitution and Checkita Data Quality framework has a mechanism to feed configuration files with extra variables at runtime. For more information, see Usage of Environment Variables and Extra Variables chapter of the documentation.
appConfig: {\n\n applicationName: \"Custom Data Quality Application Name\"\n\n dateTimeOptions: {\n timeZone: \"GMT+3\"\n referenceDateFormat: \"yyyy-MM-dd\"\n executionDateFormat: \"yyyy-MM-dd-HH-mm-ss\"\n }\n\n enablers: {\n allowSqlQueries: false\n allowNotifications: true\n aggregatedKafkaOutput: true\n }\n\n defaultSparkOptions: [\n \"spark.sql.orc.enabled=true\"\n \"spark.sql.parquet.compression.codec=snappy\"\n \"spark.sql.autoBroadcastJoinThreshold=-1\"\n ]\n\n storage: {\n dbType: \"postgres\"\n url: \"localhost:5432/public\"\n username: \"postgres\"\n password: \"postgres\"\n schema: \"dqdb\"\n saveErrorsToStorage: true\n }\n\n email: {\n host: \"smtp.some-company.domain\"\n port: \"25\"\n username: \"emailUser\"\n password: \"emailPassword\"\n address: \"some.service@some-company.domain\"\n name: \"Data Quality Service\"\n sslOnConnect: true\n }\n\n mattermost: {\n host: \"https://some-team.mattermost.com\"\n token: ${dqMattermostToken}\n }\n\n encryption: {\n secret: \"secretmustbeatleastthirtytwocharacters\"\n keyFields: [\"password\", \"username\", \"url\"]\n encryptErrorData: true\n }\n}\n
"},{"location":"01-application-setup/02-ApplicationSubmit/","title":"Submitting Data Quality Application","text":"Since Checkita framework is based on Spark, it runs as an ordinary Spark application using spark-submit
command. Like any Spark application, Checkita applications can be run both locally and on a cluster (in client
or cluster
mode).
However, Checkita applications require some command line arguments to be passed on startup. These are:
-a
- Required. Path to HOCON file with application settings: application.conf
. Note that the file name may vary, although this name is the usual one.-j
- Required. List of paths to job configuration files. Paths must be separated by commas. The Hocon format supports configuration merging; therefore, it is possible to define different parts of the job configuration in separate files and reuse some common configuration sections.-d
- Optional. Datetime for which the Data Quality job is being run. Date string must conform to format specified in referenceDateFormat
parameter of the application settings. If the date is not provided at startup, it is set to the application start date. This parameter is ignored when running a streaming application.-l
- Optional. Flag indicating that application should be run in local mode.-s
- Optional. Flag indicating that the application will be run using a Shared Spark Context. In this case, the application gets the existing context instead of creating a new one. It is also important not to stop this context upon job completion.-m
- Optional. Flag indicating that storage database migration must be performed prior to saving results.-e
- Optional. Extra variables to be added to configuration files prior to parsing. These variables can be used in configuration files, e.g. to pass secrets. Variables are provided in key-value format: \"k1=v1,k2=v2,k3=v3,...\"
.-v
- Optional. Application log verbosity. By default, the log level is set to INFO
.Two applications are available to start:
ru.raiffeisen.checkita.apps.batch.DataQualityBatchApp
ru.raiffeisen.checkita.apps.stream.DataQualityStreamApp
The following is an example of running an application in YARN in cluster
mode. Framework storage database connection parameters are specified in application.conf
and secrets may be passed either via environment variables or via extra variables argument. For more details see Usage of Environment Variables and Extra Variables.
export DQ_APPLICATION=\"<local or remote (HDFS, S3) path to application jar>\"\nexport DQ_DEPENDENCIES=\"<local or remote (HDFS, S3) path to uber-jar with framework dependencies>\"\nexport DQ_APP_CONFIG=\"<local or remote (HDFS, S3) path to application configuration file>\"\nexport DQ_JOB_CONFIGS=\"<local or remote (HDFS, S3) paths to job configuration files separated by commas>\"\n\n# As configuration files are uploaded to the driver and executors, they will be located in their working directories.\n# Therefore, application arguments need to list just their file names:\nexport DQ_APP_CONFIG_FILE=$(basename $DQ_APP_CONFIG)\nexport DQ_JOB_CONFIG_FILES=\"<job configuration files separated by commas (only file names)>\"\nexport REFERENCE_DATE=\"2023-08-01\"\n\n# application entry point (executable class): ru.raiffeisen.checkita.apps.batch.DataQualityBatchApp\n# the --name spark-submit argument takes priority over the application name set in `application.conf`\n\nspark-submit \\\n --class ru.raiffeisen.checkita.apps.batch.DataQualityBatchApp \\\n --name \"Checkita Data Quality\" \\\n --master yarn \\\n --deploy-mode cluster \\\n --num-executors 1 \\\n --executor-memory 2g \\\n --executor-cores 4 \\\n --driver-memory 2g \\\n --jars $DQ_DEPENDENCIES \\\n --files \"$DQ_APP_CONFIG,$DQ_JOB_CONFIGS\" \\\n --conf \"spark.executor.memoryOverhead=2g\" \\\n --conf \"spark.driver.memoryOverhead=2g\" \\\n --conf \"spark.driver.maxResultSize=4g\" \\\n $DQ_APPLICATION \\\n -a $DQ_APP_CONFIG_FILE \\\n -j $DQ_JOB_CONFIG_FILES \\\n -d $REFERENCE_DATE \\\n -e \"storage_db_user=some_db_user,storage_db_password=some_db_password\"\n
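The -e argument packs several variables into one string. The Python sketch below shows one way to split it into a dictionary. parse_extra_vars is hypothetical. It assumes that keys and values contain no commas, which the documented k1=v1,k2=v2 format does not guarantee.

```python
def parse_extra_vars(arg):
    '''Split a -e value like k1=v1,k2=v2 into a dict (hypothetical helper).

    Assumes that neither keys nor values contain commas.
    '''
    variables = {}
    for pair in arg.split(','):
        pair = pair.strip()
        if not pair:
            continue  # tolerate a trailing comma
        key, sep, value = pair.partition('=')
        if not sep or not key:
            raise ValueError('expected key=value, got: ' + pair)
        variables[key] = value
    return variables


extra = parse_extra_vars('storage_db_user=some_db_user,storage_db_password=some_db_password')
print(extra['storage_db_user'])  # some_db_user
```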
"},{"location":"01-application-setup/03-ResultsStorage/","title":"Data Quality Results Storage","text":"In order to use all features of the framework, it is required to set up a results storage. Checkita can use various RDBMS as a results storage. Also, Hive can be used as a results storage and even a simple file storage is supported.
The full list of supported storage types is as follows:
PostgreSQL
(v.9.3 and higher) - recommended database to be used as results storage.Oracle
MySQL
Microsoft SQL Server
SQLite
H2
Hive
File
(directory in local file system or remote one (HDFS, S3))The Checkita framework supports results storage schema evolution. Flyway is run under the hood to support schema migrations. Therefore, if one of the supported RDBMS is chosen for results storage, it can be set up during the first run of the Data Quality job by providing the -m
application argument on startup. For more details on how to run Data Quality applications refer to Submitting Data Quality Application chapter.
IMPORTANT: Flyway migrations usually run either in an empty database/schema or in one that was initialized with Flyway. In the Checkita framework, it is also possible to run migrations in a non-empty database/schema. In this case, it is up to the user to ensure that there are no conflicting table names in the database/schema.
If the File
type of storage is used, it is only required to provide a path to a directory/bucket where results will be stored. Results are stored as parquet files with the same schema as for RDBMS storage. No schema evolution mechanisms are provided for the File
type of storage. Therefore, if results schemas evolve later, it will be up to the user to update existing results to the new structure.
IMPORTANT: There is no partitioning used for storing results as parquet files. Every job reads the entire results history and overwrites it, adding the new results. Therefore, using the File
type of storage is not recommended for production use.
For the Hive
type of storage, schema evolution mechanisms are also not available. Therefore, it is up to the user to create the corresponding Hive tables. DDL scripts from the Hive Storage Setup Scripts chapter below can be used for that.
IMPORTANT: Results Hive tables must be partitioned by job_id
. Job ID is chosen as the partition column to support faster results fetching during computation of trend checks (used for anomaly detection in data). The Hive
type of results storage works faster than the File
one, since only the partition for the current job_id
is read and overwritten. Nevertheless, this type of storage is also not recommended for production use where a large number of jobs will be run.
The following types of results are written to storage:
Schemas for all results types are given below.
Primary keys denote how unique records are tracked: generally, results for the same Data Quality job run for the same reference date are overwritten. The history of various attempts of the same job for the same reference date is not stored. This is done so that trend checks work correctly: as these checks read historical results from the Data Quality storage, there must be only one set of results per Data Quality job and reference date.
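The overwrite semantics can be modeled as an upsert keyed by the primary key. In this hypothetical Python sketch, a dictionary stands in for a results table. It uses the (job_id, metric_id, reference_date) key of regular metric results. Rerunning a job for the same reference date therefore replaces the earlier row instead of adding history. The job and metric identifiers are made up.

```python
REGULAR_METRIC_KEY = ('job_id', 'metric_id', 'reference_date')


def save_results(table, rows, key_fields=REGULAR_METRIC_KEY):
    '''Upsert rows into a dict that stands in for a results table.'''
    for row in rows:
        key = tuple(row[field] for field in key_fields)
        table[key] = row  # a rerun replaces the previous attempt
    return table


table = {}
first_run = {'job_id': 'job1', 'metric_id': 'row_cnt', 'reference_date': '2023-08-01', 'result': 100.0}
rerun = dict(first_run, result=105.0)
save_results(table, [first_run])
save_results(table, [rerun])
print(len(table))  # 1
print(table[('job1', 'row_cnt', '2023-08-01')]['result'])  # 105.0
```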
"},{"location":"01-application-setup/03-ResultsStorage/#regular-metrics-results-schema","title":"Regular Metrics Results Schema","text":"(job_id, metric_id, reference_date)
;source_id
& column_names
contain string representation of lists in format '[val1,val2,val3]'
.params
is a JSON string.(job_id, metric_id, reference_date)
;source_id
contains string representation of lists in format '[val1,val2,val3]'
.(job_id, error_hash, reference_date)
;source_id
is a JSON string;source_key_fields
is a JSON string;metric_columns
is a JSON string;row_data
is a JSON string.errorHash
is a MD5 hash string computed from values of columns metric_id
, status
, message
and row_data
NOTE The error hash is computed from the raw value of the row_data
field, even if that field is encrypted later.
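For illustration, an MD5 hash over the four columns could be computed as shown below. The column order and the | separator are assumptions, not the framework's documented scheme. The hash is taken over the raw row_data before any encryption, so identical errors keep the same hash. The metric ID, status and message values are hypothetical.

```python
import hashlib
import json


def error_hash(metric_id, status, message, row_data):
    '''MD5 hex digest over the four columns (order and separator assumed).'''
    payload = '|'.join([metric_id, status, message, row_data])
    return hashlib.md5(payload.encode('utf-8')).hexdigest()


# row_data is a JSON string; it is hashed before any encryption is applied.
row_data = json.dumps({'colA': 'abc', 'colB': 42})
digest = error_hash('metric1', 'Error', 'value is not a number', row_data)
print(len(digest))  # 32
```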
(job_id, check_id, reference_date)
;source_id
contains string representation of lists in format '[val1,val2,val3]'
.(job_id, check_id, reference_date)
;source_id
contains string representation of lists in format '[val1,val2,val3]'
.(job_id, reference_date)
;version_info
is a JSON string;config
is a JSON string.Below is a HiveQL script that can be used to set up Hive results storage:
-- REPLACE <schema_name> and <schema_dir> with actual name and path:\nset hivevar:schema_name=<schema_name>;\nset hivevar:schema_dir=<schema_dir>;\n\n-- job_id is declared only in PARTITIONED BY: Hive rejects partition columns repeated in the column list.\n\nCREATE SCHEMA IF NOT EXISTS ${schema_name};\n\nDROP TABLE IF EXISTS ${schema_name}.results_metric_regular;\nCREATE EXTERNAL TABLE ${schema_name}.results_metric_regular\n(\n metric_id STRING COMMENT '',\n metric_name STRING COMMENT '',\n description STRING COMMENT '',\n metadata STRING COMMENT '',\n source_id STRING COMMENT '',\n column_names STRING COMMENT '',\n params STRING COMMENT '',\n result DOUBLE COMMENT '',\n additional_result STRING COMMENT '',\n reference_date TIMESTAMP COMMENT '',\n execution_date TIMESTAMP COMMENT ''\n)\nCOMMENT 'Data Quality Regular Metrics Results'\nPARTITIONED BY (job_id STRING)\nSTORED AS PARQUET\nLOCATION '${schema_dir}/results_metric_regular';\n\nDROP TABLE IF EXISTS ${schema_name}.results_metric_composed;\nCREATE EXTERNAL TABLE ${schema_name}.results_metric_composed\n(\n metric_id STRING COMMENT '',\n metric_name STRING COMMENT '',\n description STRING COMMENT '',\n metadata STRING COMMENT '',\n source_id STRING COMMENT '',\n formula STRING COMMENT '',\n result DOUBLE COMMENT '',\n additional_result STRING COMMENT '',\n reference_date TIMESTAMP COMMENT '',\n execution_date TIMESTAMP COMMENT ''\n)\nCOMMENT 'Data Quality Composed Metrics Results'\nPARTITIONED BY (job_id STRING)\nSTORED AS PARQUET\nLOCATION '${schema_dir}/results_metric_composed';\n\nDROP TABLE IF EXISTS ${schema_name}.results_metric_error;\nCREATE EXTERNAL TABLE ${schema_name}.results_metric_error\n(\n metric_id STRING COMMENT '',\n source_id STRING COMMENT '',\n source_key_fields STRING COMMENT '',\n metric_columns STRING COMMENT '',\n status STRING COMMENT '',\n message STRING COMMENT '',\n row_data STRING COMMENT '',\n error_hash STRING COMMENT '',\n reference_date TIMESTAMP COMMENT '',\n execution_date TIMESTAMP COMMENT ''\n)\nCOMMENT 'Data Quality Metrics Error Results'\nPARTITIONED BY (job_id STRING)\nSTORED AS PARQUET\nLOCATION '${schema_dir}/results_metric_error';\n\nDROP TABLE IF EXISTS ${schema_name}.results_check_load;\nCREATE EXTERNAL TABLE ${schema_name}.results_check_load\n(\n check_id STRING COMMENT '',\n check_name STRING COMMENT '',\n description STRING COMMENT '',\n metadata STRING COMMENT '',\n source_id STRING COMMENT '',\n expected STRING COMMENT '',\n status STRING COMMENT '',\n message STRING COMMENT '',\n reference_date TIMESTAMP COMMENT '',\n execution_date TIMESTAMP COMMENT ''\n)\nCOMMENT 'Data Quality Load Checks Results'\nPARTITIONED BY (job_id STRING)\nSTORED AS PARQUET\nLOCATION '${schema_dir}/results_check_load';\n\nDROP TABLE IF EXISTS ${schema_name}.results_check;\nCREATE EXTERNAL TABLE ${schema_name}.results_check\n(\n check_id STRING COMMENT '',\n check_name STRING COMMENT '',\n description STRING COMMENT '',\n metadata STRING COMMENT '',\n source_id STRING COMMENT '',\n base_metric STRING COMMENT '',\n compared_metric STRING COMMENT '',\n compared_threshold DOUBLE COMMENT '',\n lower_bound DOUBLE COMMENT '',\n upper_bound DOUBLE COMMENT '',\n status STRING COMMENT '',\n message STRING COMMENT '',\n reference_date TIMESTAMP COMMENT '',\n execution_date TIMESTAMP COMMENT ''\n)\nCOMMENT 'Data Quality Checks Results'\nPARTITIONED BY (job_id STRING)\nSTORED AS PARQUET\nLOCATION '${schema_dir}/results_check';\n\nDROP TABLE IF EXISTS ${schema_name}.job_state;\nCREATE EXTERNAL TABLE ${schema_name}.job_state\n(\n config STRING COMMENT '',\n version_info STRING COMMENT '',\n reference_date TIMESTAMP COMMENT '',\n execution_date TIMESTAMP COMMENT ''\n)\nCOMMENT 'Data Quality Job State'\nPARTITIONED BY (job_id STRING)\nSTORED AS PARQUET\nLOCATION '${schema_dir}/job_state';\n
"},{"location":"02-general-concepts/","title":"General Concepts","text":"In this section various aspects of working with Checkita Data Quality framework are explained.
"},{"location":"02-general-concepts/01-WorkingWithDateTime/","title":"Working with Date and Time","text":"There are two type of datetime instances used in order to identify various Data Quality job runs. These are:
referenceDate - identifies the date for which the job is run. This datetime usually indicates the period for which data is read and checked.
executionDate - stores the actual application start datetime and is used to indicate when exactly the data quality job is run.
A typical case is running some ETL pipeline after \"closure of business\", e.g. at midnight. In this case, referenceDate will refer to the previous day, while executionDate will hold the actual start time of the data quality job. These two values will likely need different representations, so different formats for referenceDate and executionDate can be set in the application configuration.
Because referenceDate can point to a date in the past, its value can be explicitly provided on application startup. If the value of referenceDate is not provided, it is set to the datetime of the actual start of the data quality job. See Submitting Data Quality Application chapter for more information on application startup arguments.
Both of these datetime instances are widely used across the framework. Whenever a string representation of them is required, it is obtained using the datetime parameters set in the application configuration file.
Note also that datetime rendering is performed with respect to the timezone in which the application is running. This timezone is also set in the application configuration file. The UTC timezone is used by default.
Last but not least: datetime string representations are not used when storing results in the storage database. Instead, both referenceDate and executionDate are converted to timestamps in the UTC timezone. As a result, results can be queried from storage consistently, regardless of the datetime configuration parameters. See Data Quality Results Storage chapter for more information on results storage.
IMPORTANT: The actual string representations of referenceDate and executionDate are always added to configuration files as extra variables. For more details on the usage of extra variables in configuration files, see Usage of Environment Variables and Extra Variables chapter.
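The Python sketch below shows how the two dates interact. It is not Checkita code. The format strings, the UTC+3 offset and the helper names resolve_dates, render and to_storage_timestamp are all hypothetical. The sketch only demonstrates three rules from this chapter: referenceDate falls back to the job start, string rendering uses the configured timezone, and storage uses UTC timestamps.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical format settings: in Checkita the actual formats and timezone
# come from the application configuration file.
REFERENCE_DATE_FORMAT = '%Y-%m-%d'
EXECUTION_DATE_FORMAT = '%Y-%m-%d %H:%M:%S'


def resolve_dates(execution_date, reference_date=None):
    # referenceDate falls back to the actual job start when it is not provided.
    return (reference_date or execution_date), execution_date


def render(dt, fmt, tz):
    # String rendering is done in the configured timezone.
    return dt.astimezone(tz).strftime(fmt)


def to_storage_timestamp(dt):
    # Results storage keeps a UTC timestamp instead of a formatted string.
    return dt.astimezone(timezone.utc).replace(tzinfo=None)


# Job started right after midnight in a UTC+3 zone and checks the previous day.
tz = timezone(timedelta(hours=3))
execution = datetime(2024, 3, 13, 0, 5, tzinfo=tz)
reference, _ = resolve_dates(execution, datetime(2024, 3, 12, tzinfo=tz))
print(render(reference, REFERENCE_DATE_FORMAT, tz))  # 2024-03-12
print(render(execution, EXECUTION_DATE_FORMAT, tz))  # 2024-03-13 00:05:00
print(to_storage_timestamp(execution))  # 2024-03-12 21:05:00
```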
The Hocon configuration format supports variable substitution. This mechanism allows more flexible management of both application configuration and job configuration.
Configuration files are therefore fed with extra variables that are read from the system and JVM environment. These variables can also be explicitly defined at application startup.
For more information on how to explicitly define extra variables on startup, see Submitting Data Quality Application chapter of the documentation.
To use system or JVM environment variables, their names must match the following regular expression: ^(?i)(DQ)[a-z0-9_-]+$. Examples are DQ_STORAGE_PASSWORD and dqMattermostToken. All environment variables that match this regular expression are retrieved and made available for substitution in both application and job configuration files.
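The variable selection rule can be sketched in Python as follows. The helper collect_dq_variables is hypothetical: it only applies the documented name pattern to a dictionary of environment variables. A selected variable could then be referenced in a Hocon file with the standard substitution syntax, e.g. password: ${DQ_STORAGE_PASSWORD}.

```python
import re

# Pattern from the documentation. Java accepts the inline (?i) flag after ^,
# but Python 3.11+ requires global flags at the very start of the pattern,
# so the case-insensitive flag is passed explicitly here instead.
DQ_VARIABLE = re.compile('^(DQ)[a-z0-9_-]+$', re.IGNORECASE)


def collect_dq_variables(env):
    # Keep only the variables that are eligible for substitution.
    return {name: value for name, value in env.items() if DQ_VARIABLE.fullmatch(name)}


env = {
    'DQ_STORAGE_PASSWORD': 'secret',
    'dqMattermostToken': 'token',
    'HOME': '/home/user',
    'MY_DQ_VAR': 'ignored',
}
print(sorted(collect_dq_variables(env)))  # ['DQ_STORAGE_PASSWORD', 'dqMattermostToken']
```

In a real run, the input would be the process environment, e.g. collect_dq_variables(dict(os.environ)).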
A typical use case for variable substitution is providing secrets for connecting to external systems. Storing such information in configuration files is not a good idea, so there must be a mechanism to provide it at runtime.
IMPORTANT: Variables are added to configuration files at runtime and are not stored in any form.
"},{"location":"02-general-concepts/03-StatusModel/","title":"Status Model used in Results","text":"A unified status model is used for all results that the Checkita framework produces. All metric and check results therefore share a common status indication, as follows:
Success - Evaluation of the metric or check completed without any errors and the metric or check condition is met.
Failure - Evaluation of the metric or check yielded results that do not meet the configured condition, e.g.:
Error - A runtime error was caught during metric or check evaluation. The runtime error message is caught as well.
The result status is always accompanied by a message that describes this status. What differs between metrics and checks is how statuses are communicated to the user:
Success
then metric error is collected for this particular row of data. Metric error reports can then be requested via Error Collection Targets. For more information on metric error collection, see Metric Error Collection chapter.
Metric calculation involves reading data row by row and incrementing the metric value for each row. During the increment step, something can go wrong, either because of problems with the data or because of unexpected runtime errors. In addition, some metrics have a logical condition that must be met in order to increment the metric value. Failing to satisfy this condition is also considered a failure.
In the situations described above, the error collection mechanism is triggered and the following error or failure data is collected:
Failure
or Error
) and message.
The processed source can be extremely large and can therefore yield a large number of metric errors, which makes out-of-memory errors likely. To prevent that, the number of errors collected per metric is limited: no more than 10000 errors are collected for each metric. This number can be further reduced in the application settings by setting the errorDumpSize parameter to a lower value. See Enablers chapter for more details.
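The Python sketch below illustrates the per-metric limit with a simple in-memory collector. The class MetricErrorCollector and the metric ID are hypothetical and do not reflect Checkita internals. The sketch only shows that errorDumpSize can lower the cap of 10000 errors per metric but cannot raise it.

```python
MAX_ERRORS_PER_METRIC = 10000  # upper bound stated in this chapter


class MetricErrorCollector:
    # Hypothetical in-memory collector illustrating the per-metric limit.

    def __init__(self, error_dump_size=MAX_ERRORS_PER_METRIC):
        # errorDumpSize can only lower the cap, never raise it.
        self.limit = min(error_dump_size, MAX_ERRORS_PER_METRIC)
        self.errors = {}

    def collect(self, metric_id, status, message, row):
        # status is either 'Failure' or 'Error'; errors beyond the limit are dropped.
        bucket = self.errors.setdefault(metric_id, [])
        if len(bucket) < self.limit:
            bucket.append({'status': status, 'message': message, 'row': row})


collector = MetricErrorCollector(error_dump_size=3)
for i in range(5):
    collector.collect('null_values_col1', 'Failure', 'example failure message', {'id': i})
print(len(collector.errors['null_values_col1']))  # 3
```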
Collected metric errors can be used to identify and debug problems in the data. To save or send metric error reports, configure Error Collection Targets in the targets section of the job configuration. Error collection reports contain excerpts from the data and should therefore be shared with caution. For the same reason, they are never saved in the Data Quality storage.
IMPORTANT: Data quality checks over streaming sources are currently an experimental feature and are subject to change.
As already stated, the Checkita Data Quality framework can calculate metrics and perform quality checks over streaming data sources. Since Spark is used as the computation engine, the Spark Structured Streaming API is used to run metric calculations over streaming data sources.
The core idea of running a data quality job in streaming mode is to retain the ability to process multiple data sources at the same time. Since metric calculation is a stateful operation, all streaming sources are processed in tumbling windows. To process multiple sources simultaneously, their windows must be synchronized: they must (1) be of the same size and (2) start at the same time. Therefore, the window size is set at the application level and is used for all processed streaming sources.
Since streaming sources are processed per window, it is crucial to provide the time value used to assign a record to a particular window. The following options are supported:
Processing time - Spark builds the time value for each record using the current_timestamp function.
Event time - Mostly applicable to Kafka topics: the time value is obtained from the timestamp column, which corresponds to the message creation time (a.k.a. event time).
Custom time - A user-defined column of timestamp type provides the time value for window assignment.
Another concern is how to finalise the window state. In other words, rules are needed to decide when the window state can be considered final, i.e. when no new records are expected to arrive in this window. A common approach to this problem in stream processing is to use \"watermarks\". A watermark holds a time value that sets the threshold for accepting new records for processing. If a record's time is below the watermark level, the record is considered \"late\" and is not processed. The watermark is defined as the maximum observed record time minus a predefined offset. For more details, see Spark documentation: Handling Late Data and Watermarking. To process multiple streaming sources synchronously, the watermark offset is the same for all sources and is set at the application level.
Finally, the Spark Structured Streaming engine processes streaming sources in micro-batches. Records are collected over a short interval and processed as a static dataframe (micro-batch). Spark lets you control the interval over which micro-batches are collected by setting the trigger interval. This interval must also be the same for all streams and is set at the application level. Adjusting the trigger interval controls the size of micro-batches and, therefore, the executors' load.
For more information on streaming configuration settings, please see Streaming Settings chapter. To summarize, the data quality streaming job processing routine consists of the following stages:
foreachBatch sink.
For each micro-batch (evaluated once per trigger interval), process data:
register metric error accumulator;
update the processor buffer state, which contains the state of metric calculators for all windows as well as collected metric errors (also per window). In addition, the processor buffer tracks the current watermark levels for each processed streaming source.
The window processor checks the processor buffer (also once per trigger interval) for windows that are completely below the watermark level. IMPORTANT: To support synchronised processing of multiple streaming sources, the minimum watermark level is used (computed from the current watermark levels of all processed sources). This ensures that a window is finalised for all processed sources.
Once a finalised window is obtained, all data quality routines are performed for this window:
metric results are retrieved from calculators;
processor buffer is cleared: state for processed window is removed.
Streaming queries and the window processor run until the application is stopped (SIGTERM signal received) or an error occurs.
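The finalisation step can be sketched in Python, assuming a hypothetical buffer that maps each window start to its accumulated state. A window is released only when its end is not later than the minimum watermark across all sources. Its state is then removed from the buffer, and its results are stamped with the window start as both reference and execution datetime. Function and source names are illustrative only.

```python
from datetime import datetime, timedelta


def finalised_windows(window_starts, source_watermarks, window_size):
    # Use the minimum watermark across sources: a window is final only when
    # it is completely below the watermark of every processed source.
    global_watermark = min(source_watermarks.values())
    return sorted(w for w in window_starts if w + window_size <= global_watermark)


def process_ready_windows(buffer, source_watermarks, window_size):
    # buffer maps window start -> accumulated window state (hypothetical layout).
    results = []
    for start in finalised_windows(buffer.keys(), source_watermarks, window_size):
        state = buffer.pop(start)  # clear the state of the processed window
        # Results of a window are stamped with the window start time.
        results.append({'referenceDate': start, 'executionDate': start, 'state': state})
    return results


def at(minute):
    return datetime(2024, 3, 13, 12, minute)


size = timedelta(minutes=10)
buffer = {at(0): 'state-a', at(10): 'state-b', at(20): 'state-c'}
watermarks = {'source_1': at(25), 'source_2': at(21)}
done = process_ready_windows(buffer, watermarks, size)
print([r['state'] for r in done])  # ['state-a', 'state-b']
print(len(buffer))  # 1
```

In this sketch, the 12:20 window stays buffered because the slower source has not yet moved its watermark past 12:30.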
Important note on results saving: a separate set of results is generated for each processed window. The reference datetime and execution datetime of each set are set to the start time of the corresponding window. For more details on working with datetime in the Checkita framework, please see Working with Date and Time chapter.
TIP: Since data quality checks are performed for each window, the window size should be fairly large. Results should arrive at an interval that leaves enough time to review any data quality issues and take measures to resolve them. If your engineering team has a \"reaction time\" of 1 hour, it is unreasonable to perform quality checks over a streaming source with a 10-minute window.
"},{"location":"03-job-configuration/","title":"Job Configuration","text":"A Data Quality job in Checkita is a sequence of tasks performed to check the quality of data. These tasks may include the following:
All the aforementioned tasks are configured in one or more Hocon configuration files. All job configurations are set within the jobConfig section of the configuration files.
Only one parameter is set at the top level: jobId, the ID of the job to be run. This parameter is mandatory for any job configuration. Typically, a jobId unites the calculation of various metrics and checks performed over the sources within a single schema, data mart or other logical grouping of data sources.
The rest of the parameters are defined in subsections, each described in a separate chapter of this documentation:
An example of a fully filled job configuration can be found in Job Configuration Example chapter of this documentation.
"},{"location":"03-job-configuration/01-Connections/","title":"Connections Configuration","text":"The Checkita framework can create data sources based on data from external systems such as RDBMS or message queues like Kafka. To read data from external systems, a connection must first be established.
Connections are described in the connections section of the job configuration. Currently, connections to the following systems are supported:
All connections are defined with the following common parameters:
id - Connection ID that uniquely identifies its configuration;
description - Optional connection description;
parameters - Optional list of additional Spark parameters that can be specified to provide some extra configuration required by Spark to read data from a particular system;
metadata - Optional list of arbitrary user-defined metadata parameters.
An example of the connections section of job configuration is shown in Connections Configuration Example below.
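Both the parameters and metadata lists use the same name=value string format. The Python sketch below turns such a list into a mapping. The helper parse_key_value_params is hypothetical. Splitting on the first = and trimming whitespace are assumptions, not documented Checkita behaviour. The sample entries are taken from the Kafka connection in Connections Configuration Example.

```python
def parse_key_value_params(params):
    # Each entry has the form name=value. Splitting on the first '=' keeps
    # values that themselves contain '=' intact; whitespace trimming is an
    # assumption of this sketch.
    parsed = {}
    for entry in params:
        name, sep, value = entry.partition('=')
        name = name.strip()
        if not sep or not name:
            raise ValueError('expected name=value, got: ' + entry)
        parsed[name] = value.strip()
    return parsed


kafka_params = parse_key_value_params([
    'security.protocol=SASL_PLAINTEXT',
    'sasl.mechanism=GSSAPI',
    'sasl.kerberos.service.name=kafka-service',
])
print(kafka_params['sasl.mechanism'])  # GSSAPI
```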
Configuring a connection to an SQLite database is quite easy. In addition to the common parameters, only a path to the database file is required:
id - Required. Connection ID;
description - Optional. Connection description;
url - Required. Path to SQLite database file;
parameters - Optional. List of Spark parameters, if required, where each parameter is a string in format: spark.param.name=spark.param.value;
metadata - Optional. List of user-defined metadata parameters specific to this connection where each parameter is a string in format: param.name=param.value.
Connection to PostgreSQL can be set up using the following parameters:
id - Required. Connection ID;
description - Optional. Connection description;
url - Required. Connection URL. Should contain host, port and name of the database. In addition, extra parameters can be supplied in the connection URL if required. Connection protocol must not be specified;
username - Optional. Username used to connect to PostgreSQL database if required;
password - Optional. Password used to connect to PostgreSQL database if required;
parameters - Optional. List of Spark parameters, if required, where each parameter is a string in format: spark.param.name=spark.param.value;
metadata - Optional. List of user-defined metadata parameters specific to this connection where each parameter is a string in format: param.name=param.value.
Connection to Oracle can be set up in the same way as to PostgreSQL, using the following parameters:
id - Required. Connection ID;
description - Optional. Connection description;
url - Required. Connection URL. Should contain host, port and name of the database. In addition, extra parameters can be supplied in the connection URL if required. Connection protocol must not be specified;
username - Optional. Username used to connect to Oracle database if required;
password - Optional. Password used to connect to Oracle database if required;
parameters - Optional. List of Spark parameters, if required, where each parameter is a string in format: spark.param.name=spark.param.value;
metadata - Optional. List of user-defined metadata parameters specific to this connection where each parameter is a string in format: param.name=param.value.
Connection to MySQL can be set up in the same way as to PostgreSQL and Oracle, using the following parameters:
id - Required. Connection ID;
description - Optional. Connection description;
url - Required. Connection URL. Should contain host, port and name of the database. In addition, extra parameters can be supplied in the connection URL if required. Connection protocol must not be specified;
username - Optional. Username used to connect to MySQL database if required;
password - Optional. Password used to connect to MySQL database if required;
parameters - Optional. List of Spark parameters, if required, where each parameter is a string in format: spark.param.name=spark.param.value;
metadata - Optional. List of user-defined metadata parameters specific to this connection where each parameter is a string in format: param.name=param.value.
Connection to MS SQL can be set up similarly to the three previous ones, using the following parameters:
id - Required. Connection ID;
description - Optional. Connection description;
url - Required. Connection URL. Should contain host, port and name of the database. In addition, extra parameters can be supplied in the connection URL if required. Connection protocol must not be specified;
username - Optional. Username used to connect to MS SQL database if required;
password - Optional. Password used to connect to MS SQL database if required;
parameters - Optional. List of Spark parameters, if required, where each parameter is a string in format: spark.param.name=spark.param.value;
metadata - Optional. List of user-defined metadata parameters specific to this connection where each parameter is a string in format: param.name=param.value.
Configuring a connection to an H2 database is similar to SQLite. Only two of the parameters below (id and url) are required:
id - Required. Connection ID;
description - Optional. Connection description;
url - Required. Connection URL. Should contain host, port and name of the database. In addition, extra parameters can be supplied in the connection URL if required. Connection protocol must not be specified;
parameters - Optional. List of Spark parameters, if required, where each parameter is a string in format: spark.param.name=spark.param.value;
metadata - Optional. List of user-defined metadata parameters specific to this connection where each parameter is a string in format: param.name=param.value.
Connection to ClickHouse can be set up in the same way as to MS SQL, using the following parameters:
id - Required. Connection ID;
description - Optional. Connection description;
url - Required. Connection URL. Should contain host, port and name of the database. In addition, extra parameters can be supplied in the connection URL if required. Connection protocol must not be specified;
username - Optional. Username used to connect to ClickHouse database if required;
password - Optional. Password used to connect to ClickHouse database if required;
parameters - Optional. List of Spark parameters, if required, where each parameter is a string in format: spark.param.name=spark.param.value;
metadata - Optional. List of user-defined metadata parameters specific to this connection where each parameter is a string in format: param.name=param.value.
To set up a connection to Kafka brokers, the following parameters must be supplied:
id - Required. Connection ID;
description - Optional. Connection description;
servers - Required. List of broker servers to connect to;
parameters - Optional. List of Spark parameters, if required, where each parameter is a string in format: spark.param.name=spark.param.value. Kafka authorisation settings are usually provided by means of Spark parameters;
metadata - Optional. List of user-defined metadata parameters specific to this connection where each parameter is a string in format: param.name=param.value.
If the connection to the Kafka cluster requires a JAAS configuration file, it should be provided via Java system properties. These properties must be declared before the JVM starts, so they must be set in the spark-submit command as follows:
in cluster mode: --deploy-mode cluster \\\n--conf 'spark.driver.extraJavaOptions=\"-Djava.security.auth.login.config=./jaas.conf\"' \\\n--conf 'spark.executor.extraJavaOptions=\"-Djava.security.auth.login.config=./jaas.conf\"' \\\n--files /path/to/your/jaas.conf,<other files required for DQ>\n
in client mode, the driver JVM starts on the client before the Spark configuration is read. Java system properties for the driver must therefore be set in advance using the --driver-java-options argument: --deploy-mode client \\\n--driver-java-options \"-Djava.security.auth.login.config=./jaas.conf\" \\\n--conf 'spark.executor.extraJavaOptions=\"-Djava.security.auth.login.config=./jaas.conf\"' \\\n--files file.keytab,jaas.conf,<other files required for DQ>\n
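A small Python helper can summarise the difference between the two deploy modes by assembling the JAAS-related spark-submit arguments. The helper name and its signature are hypothetical. The flags themselves (--deploy-mode, --conf, --driver-java-options, --files) are standard spark-submit options. The arguments are built as a list and passed without a shell, so the Java option needs no extra quoting.

```python
JAAS_OPTION = '-Djava.security.auth.login.config=./jaas.conf'


def jaas_submit_args(deploy_mode, files):
    # Builds the JAAS-related part of a spark-submit command line.
    args = ['--deploy-mode', deploy_mode]
    if deploy_mode == 'cluster':
        # The driver JVM is launched after Spark configuration is read,
        # so the option can travel through spark.driver.extraJavaOptions.
        args += ['--conf', 'spark.driver.extraJavaOptions=' + JAAS_OPTION]
    elif deploy_mode == 'client':
        # The driver JVM is already running on the client when Spark
        # configuration is read, so the option must be given on the command line.
        args += ['--driver-java-options', JAAS_OPTION]
    else:
        raise ValueError('unsupported deploy mode: ' + deploy_mode)
    # Executors always get the option through Spark configuration.
    args += ['--conf', 'spark.executor.extraJavaOptions=' + JAAS_OPTION]
    # jaas.conf (and a keytab, if used) are shipped along with the application.
    args += ['--files', ','.join(files)]
    return args


print(jaas_submit_args('client', ['file.keytab', 'jaas.conf']))
```

The resulting list could be prefixed with spark-submit and passed to subprocess.run together with the application jar and its own arguments.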
To configure a connection to Greenplum, specify the following parameters:
id - Required. Connection ID;
description - Optional. Connection description;
url - Required. Connection URL. Should contain host, port and name of the database. In addition, extra parameters can be supplied in the connection URL if required. Connection protocol must not be specified;
username - Optional. Username used to connect to Greenplum database if required;
password - Optional. Password used to connect to Greenplum database if required;
schema - Optional. Schema to look up tables in. If omitted, the default schema is used;
parameters - Optional. List of Spark parameters, if required, where each parameter is a string in format: spark.param.name=spark.param.value;
metadata - Optional. List of user-defined metadata parameters specific to this connection where each parameter is a string in format: param.name=param.value.
The Pivotal connector is not published in public repositories such as Maven Central. This dependency is therefore unmanaged and must be added to the Spark application manually on submit (using the spark.jars configuration parameter). The connector jar file can be downloaded from the official Pivotal releases.
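Since the url parameter of RDBMS connections must be given without a protocol, the framework presumably adds the driver-specific prefix itself. This chapter does not state the exact prefixes Checkita uses. The Python sketch below therefore assumes the commonly documented JDBC URL form of each driver. Both the mapping and the helper to_jdbc_url are hypothetical. They only illustrate what omitting the protocol means.

```python
# Hypothetical mapping from connection type to a JDBC URL prefix; the forms
# below are the usual ones documented for each driver, not taken from Checkita.
JDBC_PREFIXES = {
    'postgres': 'jdbc:postgresql://',
    'greenplum': 'jdbc:postgresql://',
    'oracle': 'jdbc:oracle:thin:@//',
    'mysql': 'jdbc:mysql://',
    'mssql': 'jdbc:sqlserver://',
    'h2': 'jdbc:h2:tcp://',
    'clickhouse': 'jdbc:clickhouse://',
    'sqlite': 'jdbc:sqlite:',
}


def to_jdbc_url(kind, url):
    # The configured url must not contain a protocol; reject it if it does.
    if '://' in url or url.startswith('jdbc:'):
        raise ValueError('connection url must be given without protocol: ' + url)
    return JDBC_PREFIXES[kind] + url


print(to_jdbc_url('postgres', 'postgre1.db.com:5432/public'))
# jdbc:postgresql://postgre1.db.com:5432/public
print(to_jdbc_url('sqlite', 'some/path/to/db.sqlite'))
# jdbc:sqlite:some/path/to/db.sqlite
```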
"},{"location":"03-job-configuration/01-Connections/#connections-configuration-example","title":"Connections Configuration Example","text":"As it is shown in the example below, connections of the same type are grouped within subsections named after the type of connection. These subsections should contain a list of connection configurations of the corresponding type.
jobConfig: {\n connections: {\n postgres: [\n {\n id: \"postgre_db1\",\n description: \"Connection to production instance of DB\"\n url: \"postgre1.db.com:5432/public\", \n username: \"dq-user\", \n password: \"dq-password\",\n metadata: [\n \"db.owner=some.user@some.domain\",\n \"environment=prod\"\n ]\n }\n {\n id: \"postgre_db2\",\n description: \"Connection to test instance of DB\"\n url: \"postgre2.db.com:5432/public\",\n username: \"dq-user\",\n password: \"dq-password\",\n schema: \"dataquality\",\n metadata: [\n \"db.owner=some.user@some.domain\",\n \"environment=test\"\n ]\n }\n ]\n oracle: [\n {id: \"oracle_db1\", url: \"oracle.db.com:1521/public\", username: \"db-user\", password: \"dq-password\"}\n ]\n sqlite: [\n {id: \"sqlite_db\", url: \"some/path/to/db.sqlite\"}\n ],\n mysql: [\n {id: \"mysql_db1\", url: \"mysql.db.com:8306/public\", username: \"user\", password: \"pass\"}\n ],\n mssql: [\n {id: \"mssql_db1\", url: \"mssql.db.com:8433\", username: \"user\", password: \"pass\"}\n ],\n h2: [\n {id: \"h2_db1\", url: \"h2.db.com:9092/default\", username: \"user\", password: \"pass\"}\n ],\n clickhouse: [\n {id: \"clickhouse_db1\", url: \"clickhouse.db.com:8123\", username: \"user\", password: \"pass\"}\n ],\n kafka: [\n {id: \"kafka_cluster_1\", servers: [\"server1:9092\", \"server2:9092\"]}\n {\n id: \"kafka_cluster_2\",\n servers: [\"kafka-broker1:9092\", \"kafka-broker2:9092\", \"kafka-broker3:9092\"]\n parameters: [\n \"security.protocol=SASL_PLAINTEXT\",\n \"sasl.mechanism=GSSAPI\",\n \"sasl.kerberos.service.name=kafka-service\"\n ]\n }\n ],\n greenplum: [\n {\n id: \"greenplum_db1\", \n url: \"greenplum.db.com:5432/postgres\", \n username: \"user\", \n password: \"pass\",\n schema: \"public\"\n }\n ]\n }\n}\n
"},{"location":"03-job-configuration/02-Schemas/","title":"Schemas Configuration","text":"Schemas are used in Data Quality jobs for two purposes:
schemaMatch load checks. See Schema Match Check.
Schemas are set in the schemas section of the job configuration and can be defined in different formats, as described below. The format of a schema definition is set in the kind field, which determines what other fields must be provided.
Apart from the kind field, all types of schema configurations contain the following common parameters:
id - Schema ID that uniquely identifies its configuration;
description - Optional schema description;
metadata - Optional list of arbitrary user-defined metadata parameters.
The delimited kind of schema definition is primarily used to provide schemas for delimited text files such as CSV or TSV. Nevertheless, these schemas can be used for schemaMatch load checks as well. With this type of configuration, only flat schemas can be defined (nested columns are not allowed).
The delimited schema definition contains the following parameters:
kind: \"delimited\" - Required. Sets delimited schema definition format;
id - Required. Schema ID;
description - Optional. Schema description;
schema - Required. List of schema columns where each column is an object with the following fields:
  name - Required. Name of the column;
  type - Required. Type of the column. See Supported Type Literals for allowed types.
metadata - Optional. List of user-defined metadata parameters specific to this schema where each parameter is a string in format: param.name=param.value.
The fixed-full kind of schema definition is used to provide schemas for reading fixed-width text files. Unlike other schema definitions, it also provides column widths, which are crucial for parsing fixed-width files. This kind of schema may also be used for reading delimited files and as a reference in schemaMatch load checks. With this type of configuration, only flat schemas can be defined (nested columns are not allowed).
The fixed-full schema definition contains the following parameters:
kind: \"fixedFull\" - Required. Sets fixed-full schema definition format;
id - Required. Schema ID;
description - Optional. Schema description;
schema - Required. List of schema columns where each column is an object with the following fields:
  name - Required. Name of the column;
  type - Required. Type of the column. See Supported Type Literals for allowed types;
  width - Required. Integer width of the column (number of symbols).
metadata - Optional. List of user-defined metadata parameters specific to this schema where each parameter is a string in format: param.name=param.value.
The fixed-short kind of schema definition provides a more compact syntax for defining schemas used for reading fixed-width files. Columns are defined by name and width only, so all columns have StringType. This kind of schema may also be used for reading delimited files and as a reference in schemaMatch load checks. With this type of configuration, only flat schemas can be defined (nested columns are not allowed).
The fixed-short schema definition contains the following parameters:
kind: \"fixedShort\" - Required. Sets fixed-short schema definition format;
id - Required. Schema ID;
description - Optional. Schema description;
schema - Required. List of schema columns where each column is a string in format columnName:columnWidth. Columns always have StringType;
metadata - Optional. List of user-defined metadata parameters specific to this schema where each parameter is a string in format: param.name=param.value.
The avro kind of schema configuration is used to read a schema from an Avro schema file (.avsc). A schema read from an Avro schema file can be used to read both Avro files and delimited text files. It can also serve as a reference in schemaMatch load checks. In addition, the Avro schema format supports complex schemas with nested columns.
To read a schema from an Avro schema file, supply the following parameters:
kind: \"avro\" - Required. Sets avro schema definition format;
id - Required. Schema ID;
description - Optional. Schema description;
schema - Required. Path to the Avro schema file (.avsc) to read the schema from.
The Hive catalogue can be used as a source of schemas. The hive kind of schema definition retrieves schemas from Hive tables. These schemas can be used to read both Avro files and delimited text files. They can also serve as a reference in schemaMatch load checks.
To retrieve a schema from a Hive table, set up the following parameters:
kind: \"hive\" - Required. Sets hive schema definition format;
id - Required. Schema ID;
description - Optional. Schema description;
schema - Required. Hive schema to search for a table;
table - Required. Hive table to retrieve schema from;
excludeColumns - Optional. List of column names to exclude from the schema. This is sometimes required, e.g. to exclude partition columns from the schema;
metadata - Optional. List of user-defined metadata parameters specific to this schema where each parameter is a string in format: param.name=param.value.
The following type literals are supported when defining schema columns in the job configuration file:
string
boolean
date
timestamp
integer (32-bit integer)
long (64-bit integer)
short (16-bit integer)
byte (signed integer in a single byte)
double
float
decimal(precision, scale) (precision <= 38; scale <= precision)
As shown in the example below, the schemas section represents a list of schema definitions of various kinds.
jobConfig: {\n schemas: [\n {\n id: \"schema1\"\n kind: \"delimited\"\n description: \"Schema describing content of CSV file\"\n schema: [\n {name: \"colA\", type: \"string\"},\n {name: \"colB\", type: \"timestamp\"},\n {name: \"colC\", type: \"decimal(10, 3)\"}\n ]\n }\n {\n id: \"schema2\"\n kind: \"fixedFull\",\n schema: [\n {name: \"col1\", type: \"integer\", width: 5},\n {name: \"col2\", type: \"double\", width: 6},\n {name: \"col3\", type: \"boolean\", width: 4}\n ]\n }\n {id: \"schema3\", kind: \"fixedShort\", schema: [\"colOne:5\", \"colTwo:7\", \"colThree:9\"]}\n {id: \"hive_schema\", kind: \"hive\", schema: \"some_schema\", table: \"some_table\"}\n {\n id: \"avro_schema\", \n kind: \"avro\", \n schema: \"path/to/avro_schema.avsc\"\n metadata: [\n \"schema.origin=http://some-schema-registry-location\"\n ]\n }\n ]\n}\n
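The Python sketch below shows how a fixedShort definition, such as schema3 in the example above, could drive the parsing of a fixed-width line. It is not how Checkita or Spark parse files. The helpers are hypothetical, values are returned as raw strings (matching the StringType of fixedShort columns), and no trimming or padding handling is attempted.

```python
def parse_fixed_short(columns):
    # Each fixedShort column is a 'columnName:columnWidth' string.
    schema = []
    for column in columns:
        name, _, width = column.rpartition(':')
        if not name or not width.isdigit():
            raise ValueError('expected columnName:columnWidth, got: ' + column)
        schema.append((name, int(width)))
    return schema


def split_fixed_width(line, schema):
    # Slice consecutive columns by width; values stay strings (StringType).
    row, position = {}, 0
    for name, width in schema:
        row[name] = line[position:position + width]
        position += width
    return row


schema3 = parse_fixed_short(['colOne:5', 'colTwo:7', 'colThree:9'])
print(schema3)  # [('colOne', 5), ('colTwo', 7), ('colThree', 9)]
print(split_fixed_width('abcde1234567XYZXYZXYZ', schema3))
# {'colOne': 'abcde', 'colTwo': '1234567', 'colThree': 'XYZXYZXYZ'}
```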
"},{"location":"03-job-configuration/03-Sources/","title":"Sources Configuration","text":"Reading sources is one of the major parts of a Data Quality job. During job execution, Checkita reads all sources into Spark DataFrames, which are later processed to calculate metrics and perform quality checks. In addition, the dataframes' metadata is used to perform all types of load checks, which ensure that each source has the expected structure.
Generally, sources can be read from file systems or object storage that Spark is connected to, such as HDFS or S3. Table-like sources can also be read from the Hive catalogue. Apart from integrations natively supported by Spark, Checkita can read sources from external systems such as RDBMS or Kafka. This requires connections to these systems to be defined first. See Connections Configuration chapter for more details on connection configurations.
Currently, Checkita supports four general types of sources:
All sources must be defined in the sources section of the job configuration. Details on how to configure sources of each of these types are given below. An example of the sources section of the job configuration is shown in Sources Configuration Example below.
Currently, there are five file types that Checkita can read as a source. These are: fixed-width text files, delimited text files, ORC, Parquet and Avro files.
When configuring a file source, its type must be specified. The remaining configuration parameters vary for files of different types.
Common parameters for sources of any file type are:
id - Required. Source ID;
description - Optional. Source description;
kind - Required. File type. Can be one of the following: fixed, delimited, orc, parquet, avro;
path - Required. File path. Can be a path to a directory or an S3 bucket. In this case, all files from this directory/bucket will be read (assuming they all have the same schema). Note that when reading from a file system which is not the Spark default file system, the FS prefix must be added to the path, e.g. file:// to read from the local FS, or s3a:// to read from S3;
keyFields - Optional. List of columns that form a Primary Key or are used to identify a row within a dataset. Key fields are primarily used in error collection reports. For more details on error collection, see Metric Error Collection chapter;
metadata - Optional. List of user-defined metadata parameters specific to this source where each parameter is a string in format: param.name=param.value.
In order to read a fixed-width file, it is additionally required to provide the ID of the schema used to parse the file content. The schema itself should be defined in the schemas section of the job configuration as described in Schemas Configuration chapter.
schema - Required. Schema ID used to parse the fixed-width file. The schema definition type should be either fixedFull or fixedShort.
When reading a delimited text file, its schema may be inferred from the file header, if the file has one, or explicitly defined in the schemas section of the job configuration file as described in Schemas Configuration chapter.
Thus, additional parameters for configuring a delimited file source are:
schema - Optional. Schema ID used to parse the delimited text file. A schema of any definition type can be used as long as it has a flat structure (nested columns are not supported for delimited text files);
header - Optional, default is false. Boolean parameter indicating whether the schema should be inferred from the file header;
delimiter - Optional, default is ,. Column delimiter;
quote - Optional, default is \". Column enclosing character;
escape - Optional, default is \\. Escape character.
IMPORTANT: If the header parameter is absent or set to false, then the schema parameter must be set. Conversely, if the header parameter is set to true, then the schema parameter must not be set. In other words, the schema may be inferred from the file header or be explicitly defined, but not both.
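The header/schema rule above is easiest to see as a four-way case split. The sketch below is a hypothetical validation helper (DelimitedSourceCheck) written for illustration. It is not Checkita's implementation. An absent header parameter is modelled as None and treated as false, as the text describes.

```scala
// Hypothetical validation sketch (not Checkita's implementation) of the rule above:
// a delimited source either infers its schema from the header or references a schema ID.
object DelimitedSourceCheck {
  /** `header` is None when the parameter is absent, which is treated as false. */
  def validate(header: Option[Boolean], schema: Option[String]): Either[String, Unit] =
    (header.getOrElse(false), schema) match {
      case (true, None)     => Right(()) // schema is inferred from the file header
      case (false, Some(_)) => Right(()) // schema is taken from the schemas section
      case (true, Some(_))  => Left("'schema' must not be set when 'header' is true")
      case (false, None)    => Left("'schema' must be set when 'header' is absent or false")
    }
}
```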
Avro files can contain the schema in their header. Therefore, there are two options for reading Avro files: either infer the schema from the file or provide it explicitly. In the second case, the schema must be defined in the schemas section of the job configuration file as described in Schemas Configuration chapter. Thus, there is only one additional parameter for Avro file source configuration:
schema - Optional. Schema ID used to read the Avro file. A schema of any definition type can be used.
As the ORC format contains the schema within itself, no additional parameters are required to read ORC files.
"},{"location":"03-job-configuration/03-Sources/#parquet-file-sources","title":"Parquet File Sources","text":"As the Parquet format contains the schema within itself, no additional parameters are required to read Parquet files.
"},{"location":"03-job-configuration/03-Sources/#hive-sources-configuration","title":"Hive Sources Configuration","text":"In order to read data from a Hive table, the following parameters must be provided:
id - Required. Source ID;
description - Optional. Source description;
schema - Required. Hive schema;
table - Required. Hive table;
partitions - Optional. List of partitions to read, where each element is an object with the following fields. If partitions are not set, then the entire table is read.
  name - Required. Partition column name;
  expr - Optional. SQL expression used to filter partitions to read. This SQL expression must contain only a reference to the partition column being filtered (the one defined in the name field). References to other columns are not allowed, nor are SQL sub-queries. All types of SQL functions and literals may be used. IMPORTANT: If a parameterless function is used, it should be called with empty parentheses, e.g.: current_date();
  values - Optional. List of partition column values to read.
  IMPORTANT: When defining partitions to read, either an SQL expression to filter partitions or an explicit list of partition values must be specified, but not both.
keyFields - Optional. List of columns that form a Primary Key or are used to identify a row within a dataset. Key fields are primarily used in error collection reports. For more details on error collection, see Metric Error Collection chapter;
metadata - Optional. List of user-defined metadata parameters specific to this source where each parameter is a string in format: param.name=param.value.
Table sources are read from supported RDBMS via JDBC connection. There are two options for reading data from RDBMS: read an entire table or read the result of an SQL query.
To set up a table source, the following parameters must be supplied:
id - Required. Source ID;
description - Optional. Source description;
connection - Required. Connection ID to use for the table source. The connection ID must refer to a connection configuration for one of the supported RDBMS. See Connections Configuration chapter for more information;
table - Optional. Table to read;
query - Optional. Query to execute. The query result is read as the table source;
keyFields - Optional. List of columns that form a Primary Key or are used to identify a row within a dataset. Key fields are primarily used in error collection reports. For more details on error collection, see Metric Error Collection chapter;
metadata - Optional. List of user-defined metadata parameters specific to this source where each parameter is a string in format: param.name=param.value.
IMPORTANT: Either a table to read from or a query to execute must be specified, but not both. In addition, queries are only allowed when allowSqlQueries is set to true. Otherwise, any usage of arbitrary SQL queries is not permitted. See Enablers chapter for more information.
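The table/query rules above combine with the allowSqlQueries enabler as follows. The sketch below is a hypothetical check (TableSourceCheck) written for illustration. It is not Checkita's code.

```scala
// Hypothetical sketch (not Checkita's code) of the table source rules above:
// exactly one of `table` or `query` must be set, and queries require allowSqlQueries.
object TableSourceCheck {
  def validate(table: Option[String], query: Option[String], allowSqlQueries: Boolean): Either[String, Unit] =
    (table, query) match {
      case (Some(_), None)                    => Right(())
      case (None, Some(_)) if allowSqlQueries => Right(())
      case (None, Some(_))  => Left("SQL queries are not permitted unless allowSqlQueries is true")
      case (Some(_), Some(_)) => Left("set either 'table' or 'query', but not both")
      case (None, None)       => Left("either 'table' or 'query' must be set")
    }
}
```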
TIP: HOCON format supports multiline string values. To define such a value, enclose the string in triple quotes, e.g.:
multilineString: \"\"\"\n SELECT * from schema.table\n WHERE load_date = '2023-08-23';\n\"\"\"\n
"},{"location":"03-job-configuration/03-Sources/#kafka-sources-configuration","title":"Kafka Sources Configuration","text":"Although reading messages from Kafka topics in batch mode is not a common scenario, the Checkita framework supports it. To set up a source that reads from Kafka topic(s), the following parameters must be provided:
id - Required. Source ID;
description - Optional. Source description;
connection - Required. Connection ID to use for the Kafka source. The connection ID must refer to a Kafka connection configuration. See Connections Configuration chapter for more information;
topics - Optional. List of topics to read. Topics can be specified in either of two formats:
  list of topics without partitions: [\"topic1\", \"topic2\"];
  list of topics with the partitions to read: [\"topic1@[0, 1]\", \"topic2@[2, 4]\"];
topicPattern - Optional. Topic name pattern: all topics that match the pattern are read;
startingOffsets - Optional, default is earliest. JSON string setting the starting offsets to read from the topic. By default, the entire topic is read;
endingOffsets - Optional, default is latest. JSON string setting the ending offsets up to which the topic is read. By default, the topic is read until the end;
keyFormat - Optional, default is string. Format used to decode the message key;
valueFormat - Optional, default is string. Format used to decode the message value;
keySchema - Schema ID used to parse the message key. If the key format is other than string, the schema must be provided;
valueSchema - Schema ID used to parse the message value. If the value format is other than string, the schema must be provided;
options - Optional. Additional Spark parameters related to reading messages from Kafka topics, such as: failOnDataLoss, kafkaConsumer.pollTimeoutMs, fetchOffset.numRetries, fetchOffset.retryIntervalMs, maxOffsetsPerTrigger. Parameters are provided as strings in the format parameterName=parameterValue. For more information, see Spark Kafka Integration Guide;
keyFields - Optional. List of columns that form a Primary Key or are used to identify a row within a dataset. Key fields are primarily used in error collection reports. For more details on error collection, see Metric Error Collection chapter;
metadata - Optional. List of user-defined metadata parameters specific to this source where each parameter is a string in format: param.name=param.value.
Currently, string, xml, json and avro formats are supported to decode the message key and value.
TIP: JSON strings must be enclosed in triple quotes: \"\"\"{\"name1\": {\"name2\": \"value2\", \"name3\": \"value3\"}}\"\"\".
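The two topics formats above correspond naturally to the subscribe and assign options of Spark's Kafka source. subscribe takes a comma-separated topic list. assign takes a JSON object such as {"topicA":[0,1]}. The sketch below is a hypothetical translation (KafkaTopics) written for illustration. Whether Checkita builds these options this way is an assumption, and rejecting a mix of both formats is a choice made only in this sketch.

```scala
// Hypothetical sketch: maps the `topics` list onto options of Spark's Kafka source.
// Plain names become a comma-separated "subscribe" value; "topic@[p1, p2]" entries
// become an "assign" JSON object. Checkita's actual translation may differ.
object KafkaTopics {
  // "topic1@[0, 1]" -> ("topic1", "0, 1"); at most 9 digits per partition so toInt is safe.
  private val WithPartitions = """([^@\s]+)@\[\s*(\d{1,9}(?:\s*,\s*\d{1,9})*)\s*\]""".r

  def toSparkOption(topics: Seq[String]): Either[String, (String, String)] =
    if (topics.isEmpty) Left("at least one topic is required")
    else if (topics.forall(t => !t.contains("@"))) Right("subscribe" -> topics.mkString(","))
    else {
      val assignments = topics.collect { case WithPartitions(topic, parts) =>
        val ids = parts.split(",").map(_.trim.toInt)
        "\"" + topic + "\":[" + ids.mkString(",") + "]"
      }
      // Mixed formats or malformed entries leave some topics unmatched.
      if (assignments.size == topics.size) Right("assign" -> assignments.mkString("{", ",", "}"))
      else Left("all topics must use the same format: 'topic' or 'topic@[partitions]'")
    }
}
```

Under this sketch, topics: [\"topic3.pub@[1,3]\"] from the sources example would become assign = {"topic3.pub":[1,3]}.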
To read data from a Greenplum table using the Pivotal connector, the following parameters must be provided:
id - Required. Source ID;
description - Optional. Source description;
connection - Required. Connection ID to use for the table source. The connection ID must refer to a Greenplum Pivotal connection. See Connections Configuration chapter for more information;
table - Optional. Table to read;
keyFields - Optional. List of columns that form a Primary Key or are used to identify a row within a dataset. Key fields are primarily used in error collection reports. For more details on error collection, see Metric Error Collection chapter;
metadata - Optional. List of user-defined metadata parameters specific to this source where each parameter is a string in format: param.name=param.value.
Custom sources can be used when data must be read from a source type that is not explicitly supported by any of the configurations described above. To configure a custom source, the following parameters must be provided:
id - Required. Source ID;
description - Optional. Source description;
format - Required. Spark DataFrame reader format used to read from the given source;
path - Optional. Path to read data from (if required);
schema - Optional. Explicit schema to be applied to data from the given source (if required);
options - Optional. Additional Spark parameters used to read data from the given source;
keyFields - Optional. List of columns that form a Primary Key or are used to identify a row within a dataset. Key fields are primarily used in error collection reports. For more details on error collection, see Metric Error Collection chapter;
metadata - Optional. List of user-defined metadata parameters specific to this source where each parameter is a string in format: param.name=param.value.
Once the parameters above are defined, the Spark DataFrame reader is set up to read data from the source as follows:
val df = spark.read.format(format).schema(schema).options(options).load(path)\n
If any of the optional parameters is missing, then the corresponding Spark reader configuration is not set.
"},{"location":"03-job-configuration/03-Sources/#sources-configuration-example","title":"Sources Configuration Example","text":"As shown in the example below, sources of the same type are grouped within subsections named after the source type. Each subsection contains a list of source configurations of the corresponding type.
sources: {\n file: [\n {id: \"hdfs_fixed_file\", kind: \"fixed\", path: \"path/to/fixed/file.txt\", schema: \"schema2\"}\n {\n id: \"hdfs_delimited_source\",\n description: \"Reading static data from CSV file\"\n kind: \"delimited\",\n path: \"path/to/csv/file.csv\"\n schema: \"schema1\"\n metadata: [\n \"data.owner=some.person@some.domain\"\n \"file.version=1.1\"\n ]\n }\n {id: \"hdfs_avro_source\", kind: \"avro\", path: \"path/to/avro/file.avro\", schema: \"avro_schema\"}\n {id: \"hdfs_orc_source\", kind: \"orc\", path: \"path/to/orc/file.orc\"}\n ]\n hive: [\n {\n id: \"hive_source_1\", schema: \"some_schema\", table: \"some_table\",\n partitions: [{name: \"load_date\", values: [\"2023-06-30\", \"2023-07-01\"]}],\n keyFields: [\"id\", \"name\"]\n }\n ]\n table: [\n {id: \"table_source_1\", connection: \"oracle_db1\", table: \"some_table\", keyFields: [\"id\", \"name\"]}\n {id: \"table_source_2\", connection: \"sqlite_db\", table: \"other_table\"}\n ]\n kafka: [\n {\n id: \"kafka_source_1\",\n connection: \"kafka_broker\",\n topics: [\"topic1.pub\", \"topic2.pub\"]\n format: \"json\"\n }\n {\n id: \"kafka_source_2\",\n connection: \"kafka_broker\",\n topics: [\"topic3.pub@[1,3]\"]\n startingOffsets: \"\"\"{\"topic3.pub\":{\"1\":1234,\"3\":2314}}\"\"\"\n options: [\"kafkaConsumer.pollTimeoutMs=300000\"]\n format: \"json\"\n }\n ]\n greenplum: [\n {id: \"greenplum_source_1\", connection: \"greenplum_db\", table: \"some_table\"}\n ]\n }\n
"},{"location":"03-job-configuration/04-Streams/","title":"Streaming Sources Configurations","text":"When running Data Quality checks over streaming data sources, these sources must be defined in the streams section of the job configuration. Sources defined in this section are read as streaming DataFrames using the Spark Structured Streaming API. More details on running data quality checks over streaming sources are given in Data Quality Checks over Streaming Sources chapter.
The configuration of streaming sources is the same as for static ones. See Sources Configuration chapter for more details.
It is important to note that not all supported sources can be read in streaming mode. Currently, only the sources below can be read as streams:
startingOffsets. When defining a streaming Kafka source, the default value for this parameter is latest. Also, for streaming Kafka sources, the endingOffsets parameter is ignored (all new records are processed until the application is stopped).
The only additional parameter that can be defined for all streaming sources is the following:
windowBy - Optional, default is processingTime. Source of the timestamp used to assign records to particular streaming windows and also to skip \"late\" records. Applicable only to streaming jobs! The following options are supported:
  processingTime - Uses the current timestamp at the moment when Spark processes the record;
  eventTime - Mostly applicable to Kafka sources. Uses the column named timestamp to retrieve the time value. This column must be of Timestamp type;
  custom(columnName) - Uses an arbitrary user-defined column to retrieve the time value. The specified column must be of Timestamp type. SQL expressions are also supported; an expression must also evaluate to a value of Timestamp type. For example, with custom(value.createdAt) the time value for a record is retrieved from the message value's field named createdAt.
The Checkita framework supports the creation of virtual (temporary) sources based on regular ones (defined in the sources section of the job configuration, as described in Sources Configuration chapter). Virtual sources are created by applying transformations to existing sources using the Spark SQL API. Subsequently, metrics and checks can also be applied to virtual sources.
It is also important to note that virtual sources are created recursively: once a virtual source is created, it can be used to create another one in the same way as regular sources.
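Because a virtual source may depend on other virtual sources, the sources have to be built in dependency order. The sketch below is a hypothetical resolver (VirtualSourceOrder) illustrating one way to derive that order from parentSources, using a round-based topological sort. Nothing in the text confirms that Checkita resolves dependencies this way.

```scala
// Hypothetical sketch (not Checkita's implementation): resolves the order in which
// virtual sources can be built, since each one may depend on other virtual sources.
object VirtualSourceOrder {
  /**
   * @param vsParents virtual source ID -> its parentSources
   * @param regular   IDs of regular sources, which are available from the start
   */
  def buildOrder(vsParents: Map[String, Seq[String]], regular: Set[String]): Either[String, List[String]] = {
    @annotation.tailrec
    def loop(pending: Map[String, Seq[String]], available: Set[String], acc: List[String]): Either[String, List[String]] =
      if (pending.isEmpty) Right(acc.reverse)
      else {
        // Virtual sources whose parents are all available can be built in this round.
        val ready = pending.collect { case (id, parents) if parents.forall(available.contains) => id }.toList.sorted
        if (ready.isEmpty) Left("unknown parent or cycle among: " + pending.keys.toList.sorted.mkString(", "))
        else loop(pending -- ready, available ++ ready, ready.reverse ::: acc)
      }
    loop(vsParents, regular, Nil)
  }
}
```

Sources that reference an undefined parent, or that depend on each other cyclically, never become ready, so the sketch reports them instead of looping forever.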
The following types of virtual sources are supported:
SQL: creates a virtual source from existing ones using an arbitrary SQL query;
Join: creates a virtual source by joining two (and only two) existing sources;
Filter: creates a virtual source from an existing one by applying filter expressions;
Select: creates a virtual source from an existing one by applying select expressions;
Aggregate: creates a virtual source by applying groupBy and aggregate operations to an existing one.
All types of virtual sources have common features:
Thus, virtual sources are defined in the virtualSources section of the job configuration and have the following common parameters:
id - Required. Virtual source ID;
description - Optional. Virtual source description;
parentSources - Required. List of parent sources used to create the virtual source. Depending on the virtual source type, the number of parent sources may be limited;
persist - Optional. One of the allowed Spark StorageLevels used to cache the virtual source. By default, virtual sources are not cached. Supported Spark StorageLevels are: NONE, DISK_ONLY, DISK_ONLY_2, MEMORY_ONLY, MEMORY_ONLY_2, MEMORY_ONLY_SER, MEMORY_ONLY_SER_2, MEMORY_AND_DISK, MEMORY_AND_DISK_2, MEMORY_AND_DISK_SER, MEMORY_AND_DISK_SER_2, OFF_HEAP;
save - Optional. File output configuration used to save the virtual source. By default, virtual sources are not saved. For more information on configuring file outputs, see File Output Configuration chapter;
keyFields - Optional. List of columns that form a Primary Key or are used to identify a row within a dataset. Key fields are primarily used in error collection reports. For more details on error collection, see Metric Error Collection chapter;
metadata - Optional. List of user-defined metadata parameters specific to this virtual source where each parameter is a string in format: param.name=param.value.
The SQL type of virtual sources is allowed only when allowSqlQueries is set to true. Otherwise, any usage of arbitrary SQL queries is not permitted. See Enablers chapter for more information. At the same time, there is no limit on the number of parent sources used to create an SQL virtual source.
To define an SQL virtual source, an SQL query must be provided:
kind: \"sql\" - Required. Sets the SQL virtual source type;
query - Required. SQL query used to build the virtual source. Existing sources are referenced in the SQL query by their IDs.
To define a Join virtual source, two (and only two) parent sources to be joined must be provided, as well as the join type and the list of columns to join by. Note that in order to perform a join, the parent sources must have matching column names to join by. Joining by condition is not currently supported:
kind: \"join\" - Required. Sets the Join virtual source type;
joinBy - Required. List of columns to join by. Thus, parent sources must have the same column names used for the join;
joinType - Required. Type of Spark join to apply. The following join types are supported: inner, outer, cross, full, right, left, semi, anti, fullOuter, rightOuter, leftOuter, leftSemi, leftAnti.
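Several names in the joinType list above are aliases of one another. Spark's own join type parser lower-cases the name and drops underscores, so leftOuter, leftouter and left_outer are equivalent. The sketch below (JoinTypes, hypothetical) applies the same normalization to collapse the aliases. That Checkita accepts names with exactly this normalization is an assumption, and the canonical labels on the right-hand side are illustrative.

```scala
import java.util.Locale

// Sketch of join type normalization in the style of Spark's own parser
// (lower-case, underscores removed). Canonical labels are illustrative only.
object JoinTypes {
  private val canonical: Map[String, String] = Map(
    "inner" -> "inner", "cross" -> "cross",
    "outer" -> "full_outer", "full" -> "full_outer", "fullouter" -> "full_outer",
    "left" -> "left_outer", "leftouter" -> "left_outer",
    "right" -> "right_outer", "rightouter" -> "right_outer",
    "semi" -> "left_semi", "leftsemi" -> "left_semi",
    "anti" -> "left_anti", "leftanti" -> "left_anti"
  )

  /** Maps any supported spelling to a canonical name, or reports an unsupported type. */
  def normalize(joinType: String): Either[String, String] = {
    val key = joinType.toLowerCase(Locale.ROOT).replace("_", "")
    canonical.get(key).toRight(s"unsupported joinType '$joinType'")
  }
}
```

This is why joinType: \"leftouter\" in the virtual sources example below behaves the same as leftOuter.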
A Filter virtual source is defined by applying a sequence of filter expressions to the parent source. Thus, exactly one parent source must be supplied to this type of virtual source configuration:
kind: \"filter\" - Required. Sets the Filter virtual source type;
expr - Required. Sequence of filter SQL expressions applied to the parent source.
A Select virtual source is defined by applying a sequence of select expressions to the parent source. Each select expression should yield a new column. Thus, the number of columns in the virtual source corresponds to the number of provided select expressions. Exactly one parent source must be supplied to this type of virtual source configuration:
kind: \"select\" - Required. Sets the Select virtual source type;
expr - Required. Sequence of select SQL expressions applied to the parent source.
An Aggregate virtual source is defined by applying groupBy and aggregate operations to the parent source. Thus, it is required to provide a list of columns used to group rows, as well as a list of aggregate operations in the form of SQL expressions used to create columns with aggregated results. The resulting virtual source contains the grouping columns followed by one column per provided aggregate expression. Exactly one parent source must be supplied to this type of virtual source configuration:
kind: \"aggregate\" - Required. Sets the Aggregate virtual source type;
groupBy - Required. Sequence of columns used to group rows from the parent source;
expr - Required. Sequence of SQL expressions used to get columns with aggregated results.
As shown in the example below, the virtualSources section represents a list of virtual source definitions of various kinds.
jobConfig: {\n virtualSources: [\n {\n id: \"sqlVS\"\n kind: \"sql\"\n description: \"Filter data for specific date only\"\n parentSources: [\"hive_source_1\"]\n persist: \"disk_only\"\n save: {\n kind: \"orc\"\n path: \"some/path/to/vs/location\"\n }\n query: \"select id, name, entity, description from hive_source_1 where load_date == '2023-06-30'\"\n metadata: [\n \"source.owner=some.person@some.domain\"\n \"critical.source=false\"\n ]\n }\n {\n id: \"joinVS\"\n kind: \"join\"\n parentSources: [\"hdfs_avro_source\", \"hdfs_orc_source\"]\n joinBy: [\"id\"]\n joinType: \"leftouter\"\n persist: \"memory_only\"\n keyFields: [\"id\", \"order_id\"]\n }\n {\n id: \"filterVS\"\n kind: \"filter\"\n parentSources: [\"kafka_source\"]\n expr: [\"key is not null\"]\n keyFields: [\"orderId\", \"dttm\"]\n }\n {\n id: \"selectVS\"\n kind: \"select\"\n parentSources: [\"table_source_1\"]\n expr: [\n \"count(id) as id_cnt\",\n \"sum(amount) as total_amount\"\n ]\n }\n {\n id: \"aggVS\"\n kind: \"aggregate\"\n parentSources: [\"hdfs_fixed_file\"]\n groupBy: [\"col1\"]\n expr: [\n \"avg(col2) as avg_col2\",\n \"sum(col3) as sum_col3\"\n ],\n keyFields: [\"col1\", \"avg_col2\", \"sum_col3\"]\n }\n ]\n}\n
"},{"location":"03-job-configuration/06-VirtualStreams/","title":"Virtual Streaming Sources Configuration","text":"When running Data Quality checks over streaming data sources, transformations may need to be applied to them, thus creating virtual streaming sources. Such sources must be defined in the virtualStreams section of the job configuration. Transformations defined in this section are applied only to streaming sources using the Spark Structured Streaming API. More details on running data quality checks over streaming sources are given in Data Quality Checks over Streaming Sources chapter.
The configuration of virtual streaming sources is the same as for static ones. See Virtual Sources Configuration chapter for more details. In addition, the column used as the source of timestamp for windowing can be redefined and derived from the resulting virtual stream schema. See Streaming Sources Configurations for more details on how to define the column used as the source of timestamp.
It is important to note that not all supported virtual source types can be built from streaming sources. Currently, only filter and select types of virtual sources are supported in streaming applications.
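A virtual stream must use one of the two supported kinds, and it may redefine windowBy. The sketch below (VirtualStreamCheck, hypothetical) validates both: it restricts the kind to filter or select and parses the three windowBy forms described for streaming sources. The exact, case-sensitive spellings and the way custom(...) is parsed are assumptions of this sketch, not Checkita's parser.

```scala
// Hypothetical sketch (not Checkita's code): validates a virtual stream definition
// by checking its kind and parsing its windowBy value, if one is redefined.
object VirtualStreamCheck {
  sealed trait WindowBy
  case object ProcessingTime extends WindowBy
  case object EventTime extends WindowBy
  final case class Custom(expr: String) extends WindowBy

  private val CustomPattern = """custom\((.+)\)""".r
  private val StreamingKinds = Set("filter", "select")

  def parseWindowBy(value: String): Either[String, WindowBy] = value.trim match {
    case "processingTime"    => Right(ProcessingTime)
    case "eventTime"         => Right(EventTime)
    case CustomPattern(expr) => Right(Custom(expr.trim)) // column name or SQL expression
    case other               => Left(s"unsupported windowBy value '$other'")
  }

  def validate(kind: String, windowBy: Option[String]): Either[String, Option[WindowBy]] =
    if (!StreamingKinds.contains(kind)) Left(s"kind '$kind' is not supported for virtual streams")
    else windowBy match {
      case None    => Right(None) // the parent stream's windowBy setting applies
      case Some(w) => parseWindowBy(w).map(Some(_))
    }
}
```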
"},{"location":"03-job-configuration/07-LoadChecks/","title":"Load Checks Configuration","text":"Load checks are a special type of check. Unlike other checks, they are applied not to the results of metric computation but to source metadata. Another key feature of load checks is that they run before any data is actually loaded from the sources. This is possible due to Spark's lazy evaluation: sources are, essentially, Spark DataFrames, and load checks verify their metadata.
Load checks are defined in the loadChecks section of the job configuration and have the following common parameters:
id - Required. Load check ID;
description - Optional. Load check description;
source - Required. Reference to the ID of the source whose metadata is being checked;
metadata - Optional. List of user-defined metadata parameters specific to this load check where each parameter is a string in format: param.name=param.value.
The currently supported load checks are described below, along with their specific configuration parameters.
"},{"location":"03-job-configuration/07-LoadChecks/#minimum-column-number-check","title":"Minimum Column Number Check","text":"This check verifies that the number of columns in the source is equal to or greater than the specified number. Load checks of this type are configured in the minColumnNum subsection of the loadChecks section. In addition to the common parameters, the following parameters should be specified:
option - Required. Minimum number of columns that the checked source must contain.
This check verifies that the number of columns in the source is exactly equal to the specified number. Load checks of this type are configured in the exactColumnNum subsection of the loadChecks section. In addition to the common parameters, the following parameters should be specified:
option - Required. Exact number of columns that the checked source must contain.
This check verifies that the source contains columns with the required names. Load checks of this type are configured in the columnsExist subsection of the loadChecks section. In addition to the common parameters, the following parameters should be specified:
columns - Required. List of column names that must exist in the checked source.
This check verifies that the source schema matches a predefined reference schema. The reference schema must be defined in the schemas section of the configuration files as described in Schemas Configuration chapter. Load checks of this type are configured in the schemaMatch subsection of the loadChecks section. In addition to the common parameters, the following parameters should be specified:
schema - Required. ID of the reference schema to compare with the source schema;
ignoreOrder - Optional, default is false. Boolean parameter indicating whether column order should be ignored when comparing the schemas.
As shown in the example below, load checks of the same type are grouped within subsections named after the load check type. Each subsection contains a list of load check configurations of the corresponding type.
jobConfig: {\n loadChecks: {\n minColumnNum: [\n {id: \"load_check_1\", source: \"kafka_source\", option: 2}\n ]\n exactColumnNum: [\n {\n id: \"load_check_2\", \n description: \"Checking that source has exactly required number of columns\", \n source: \"hdfs_delimited_source\", option: 3\n metadata: [\n \"critical.loadcheck=true\"\n ]\n }\n ]\n columnsExist: [\n {id: \"loadCheck3\", source: \"sqlVS\", columns: [\"id\", \"name\", \"entity\", \"description\"]},\n {id: \"load_check_4\", source: \"hdfs_delimited_source\", columns: [\"id\", \"name\", \"value\"]}\n ]\n schemaMatch: [\n {id: \"load_check_5\", source: \"kafka_source\", schema: \"hive_schema\"}\n ]\n }\n}\n
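Each of the four load check types reduces to a simple comparison on source metadata. The sketch below (LoadChecks, hypothetical) expresses them as plain predicates. It models a schema as an ordered list of (columnName, dataType) pairs and matches names case-sensitively. Both are simplifications made for illustration, not a description of how Checkita inspects Spark schemas.

```scala
// Hypothetical sketch: the four load check types as plain predicates over source
// metadata. A schema is an ordered list of (columnName, dataType) pairs.
object LoadChecks {
  type Schema = Seq[(String, String)]

  /** minColumnNum: at least `option` columns. */
  def minColumnNum(source: Schema, option: Int): Boolean = source.size >= option

  /** exactColumnNum: exactly `option` columns. */
  def exactColumnNum(source: Schema, option: Int): Boolean = source.size == option

  /** columnsExist: every listed column name is present. */
  def columnsExist(source: Schema, columns: Seq[String]): Boolean = {
    val names = source.map(_._1).toSet
    columns.forall(names.contains)
  }

  /** schemaMatch: same columns and types, optionally ignoring column order. */
  def schemaMatch(source: Schema, reference: Schema, ignoreOrder: Boolean = false): Boolean =
    if (ignoreOrder) source.sorted == reference.sorted else source == reference
}
```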
"},{"location":"03-job-configuration/08-Metrics/","title":"Metrics Configuration","text":"Calculating various metrics over the data is the main part of a Data Quality job. Metrics evaluate various indicators that describe data from both technical and business points of view. These indicators, in turn, can signal problems in the data.
Metrics that are linked to a source over which they are calculated are called regular. Apart from regular metrics, there is a special kind of metric that is calculated from the results of other metrics, thus allowing metric composition. These metrics are accordingly called composed.
Metrics are defined in the metrics section of the job configuration. Regular metrics are grouped by type in the regular subsection, while composed metrics are listed in the composed subsection.
All regular metrics are defined using the following common parameters:
id - Required. Metric ID;
description - Optional. Metric description;
source - Required. Reference to the ID of the source over which the metric is calculated;
columns - Required. List of columns over which the metric is calculated. Regular metrics can be calculated for multiple columns, in which case the metric result is calculated over the row values in all of these columns. Some metrics limit the number of columns they can process. The only exception is the Row Count Metric, which does not need columns to be specified;
params - Some metrics require additional parameters to be set. They should be specified within this object. The parameters to configure are described below for each metric individually. Some metrics that require additional parameters also define default values for them; in this case, the params object can be omitted to use the default values for all parameters;
metadata - Optional. List of user-defined metadata parameters specific to this metric where each parameter is a string in format: param.name=param.value.
Additionally, some regular metrics have a logical condition that must be met when calculating the metric increment for each individual row. If the metric condition is not met, then Failure status is returned for this particular row of data. The scenarios in which a metric can yield Failure status are explicitly described for each metric below. See Status Model used in Results chapter for more information on the status model.
Calculates the number of rows in the source. This is the only metric for which the columns list should not be specified, as columns are not needed to compute the number of rows. The metric definition does not require additional parameters: params should not be set.
All row count metrics are defined in rowCount
subsection.
Counts the number of unique values in the provided columns. When applied to multiple columns, the total number of unique values in these columns is returned. The metric definition does not require additional parameters: params should not be set.
All distinct values metrics are defined in the distinctValues subsection.
IMPORTANT. Calculating the exact number of unique values requires O(N) memory. Therefore, to prevent OOM errors when working with extremely large datasets and high-cardinality columns, it is recommended to use the Approximate Distinct Values Metric, which uses the HLL probabilistic algorithm to estimate the number of unique values.
"},{"location":"03-job-configuration/08-Metrics/#approximate-distinct-values-metric","title":"Approximate Distinct Values Metric","text":"Calculates the number of unique values approximately, using the HyperLogLog algorithm.
This metric works with only one column.
All approximate distinct values metrics are defined in the approximateDistinctValues subsection. Additional parameters can be supplied:
accuracyError - Optional, default is 0.01. Accuracy error for estimating the number of unique values.
Counts the number of null values in the specified columns. When applied to multiple columns, the total number of null values in these columns is returned. The metric definition does not require additional parameters: params should not be set.
All null values metrics are defined in the nullValues subsection.
The metric increment returns Failure status for rows where any of the values in the specified columns is null.
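The per-row increment and Failure rule described above can be sketched as follows. NullValuesMetric, with its Success/Failure objects, is a hypothetical model written for illustration. It is not Checkita's status model or API. Each row is represented by the values of the metric's columns, and a null stands for a missing value.

```scala
// Hypothetical sketch (not Checkita's code) of a per-row increment for the null
// values metric: the row is the list of values from the metric's columns.
object NullValuesMetric {
  sealed trait Status
  case object Success extends Status
  case object Failure extends Status

  /** Number of nulls in one row, plus Failure if any of the values is null. */
  def increment(row: Seq[Any]): (Long, Status) = {
    val nulls = row.count(_ == null).toLong
    (nulls, if (nulls > 0) Failure else Success)
  }

  /** Metric result: total nulls across all rows and all listed columns. */
  def result(rows: Seq[Seq[Any]]): Long = rows.map(row => increment(row)._1).sum
}
```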
Counts the number of empty values (i.e. empty string values) in the specified columns. When applied to multiple columns, the total number of empty values in these columns is returned. The metric definition does not require additional parameters: params should not be set.
All empty values metrics are defined in the emptyValues subsection.
The metric increment returns Failure status for rows where any of the values in the specified columns is empty.
Calculates the measure of completeness in the specified columns: (values_count - null_count) / values_count
. When applied to multiple columns, total number of values and total number of nulls are used in the equation above.
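The completeness formula above pools values from all listed columns. The sketch below (CompletenessMetric, hypothetical) shows this pooling. It models a null as None and includes the includeEmptyStrings option described below, which also counts "" as null. The NaN result for empty input is a property of this sketch only. The text does not say how the framework handles that case.

```scala
// Hypothetical sketch of the completeness formula:
// (values_count - null_count) / values_count over all listed columns. None models null.
object CompletenessMetric {
  def completeness(rows: Seq[Seq[Option[String]]], includeEmptyStrings: Boolean = false): Double = {
    val values = rows.flatten // all values from all listed columns
    val nullCount = values.count {
      case None    => true
      case Some(v) => includeEmptyStrings && v.isEmpty // "" counts as null only if enabled
    }
    // For empty input this yields NaN (0.0 / 0); the framework's behaviour there is unspecified.
    (values.size - nullCount).toDouble / values.size
  }
}
```

For two rows over two columns with one null and one empty string, the result is 3 / 4 = 0.75 by default, and 2 / 4 = 0.5 when includeEmptyStrings is true.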
All completeness metrics are defined in the completeness subsection. Additional parameters can be supplied:
includeEmptyStrings - Optional, default is false. Boolean parameter indicating whether empty string values should be considered as nulls.
Calculates the measure of completeness of an incremental sequence of integers. In other words, it looks for missing elements in the sequence and returns the ratio: actual number of elements / required number of elements.
This metric works with only one column.
The actual number of elements is the number of unique values in the sequence. This metric computes it exactly and therefore requires O(N)
memory to store these values. To prevent OOM errors for extremely large sequences, it is recommended to use the Approximate Sequence Completeness Metric, which uses the HLL probabilistic algorithm to estimate the number of unique values.
The required number of elements is determined by the formula: (max_value - min_value) / increment + 1
, where:
min_value
- the minimum value in the sequence;
max_value
- the maximum value in the sequence;
increment
- the sequence step, default is 1.
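The exact sequence completeness computation can be sketched in a few lines of Python. The sketch assumes the input is a list of already parsed integers, and the function name is hypothetical. Keeping every distinct value in a set is exactly the O(N) memory cost mentioned above.

```python
from typing import Iterable


def sequence_completeness(values: Iterable[int], increment: int = 1) -> float:
    # Exact variant: all distinct values are kept in memory, hence O(N).
    unique = set(values)
    # Required number of elements: (max_value - min_value) / increment + 1
    required = (max(unique) - min(unique)) / increment + 1
    return len(unique) / required


print(sequence_completeness([1, 2, 3, 5]))                   # 0.8
print(sequence_completeness([0, 10, 20, 40], increment=10))  # 0.8
```

In both calls one element of a five-element sequence is missing (4 and 30 respectively). Duplicates do not affect the result, because only unique values are counted.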
All sequence completeness metrics are defined in the sequenceCompleteness
subsection. The following additional parameters can be supplied:
increment
- Optional, default is 1
. Sequence increment step.
Calculates the measure of completeness of an incremental sequence of integers approximately, using the HyperLogLog algorithm. Works in the same way as the Sequence Completeness Metric, with the only difference being that the actual number of elements in the sequence is determined approximately using the HLL algorithm.
This metric works with only one column.
All approximate sequence completeness metrics are defined in the approximateSequenceCompleteness
subsection. The following additional parameters can be supplied:
increment
- Optional, default is 1
. Sequence increment step.
accuracyError
- Optional, default is 0.01
. Accuracy error for estimating the number of unique values.
Calculates the minimum string length in the values of the specified columns. Metric definition does not require additional parameters: params
should not be set.
All minimum string metrics are defined in the minString
subsection.
Metric increment returns Failure
status for rows where none of the values in the specified columns can be cast to string and, therefore, the minimum string length cannot be computed.
Calculates the maximum string length in the values of the specified columns. Metric definition does not require additional parameters: params
should not be set.
All maximum string metrics are defined in the maxString
subsection.
Metric increment returns Failure
status for rows where none of the values in the specified columns can be cast to string and, therefore, the maximum string length cannot be computed.
Calculates the average string length in the values of the specified columns. Metric definition does not require additional parameters: params
should not be set.
All average string metrics are defined in the avgString
subsection.
Metric increment returns Failure
status for rows where none of the values in the specified columns can be cast to string and, therefore, the average string length cannot be computed.
Calculates the number of values that meet the defined string length criteria.
All string length metrics are defined in the stringLength
subsection. The following additional parameters should be supplied:
length
- Required. String length threshold.
compareRule
- Required. Comparison rule used to compare the actual string length of a value with the threshold. Supported rules: eq
(==), lt
(<), lte
(<=), gt
(>), gte
(>=).
Metric increment returns Failure
status for rows where some values in the specified columns do not meet the defined string length criteria.
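To make the compareRule values concrete, the following Python sketch maps each rule to the comparison it denotes. It then counts the values whose length satisfies that comparison against the length threshold. The function and dictionary names are hypothetical. Skipping nulls is an assumption of this sketch, not documented behaviour.

```python
import operator
from typing import Iterable, Optional

# Each supported compareRule value mapped to the comparison it denotes.
COMPARE_RULES = {
    'eq': operator.eq,   # ==
    'lt': operator.lt,   # <
    'lte': operator.le,  # <=
    'gt': operator.gt,   # >
    'gte': operator.ge,  # >=
}


def string_length_count(values: Iterable[Optional[str]], length: int,
                        compare_rule: str) -> int:
    compare = COMPARE_RULES[compare_rule]
    # Null values are skipped in this sketch (an assumption).
    return sum(1 for v in values if v is not None and compare(len(v), length))


values = ['ab', 'abc', 'abcd', None]
print(string_length_count(values, 3, 'lte'))  # 2
print(string_length_count(values, 3, 'gt'))   # 1
```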
Counts the number of values that fall into the specified set of allowed values.
All string in domain metrics are defined in the stringInDomain
subsection. The following additional parameters should be supplied:
domain
- Required. List of allowed values.
Metric increment returns Failure
status for rows where some values in the specified columns do not fall into the set of allowed values.
Counts the number of values that do not fall into the specified set of avoided values.
All string out domain metrics are defined in the stringOutDomain
subsection. The following additional parameters should be supplied:
domain
- Required. List of avoided values.
Metric increment returns Failure
status for rows where some values in the specified columns do fall into the set of avoided values.
Counts the number of values that are equal to the value given in the metric definition.
All string values metrics are defined in the stringValues
subsection. The following additional parameters should be supplied:
compareValue
- Required. String value to compare with.
Metric increment returns Failure
status for rows where some values in the specified columns do not match the defined compare value.
Calculates the number of values that match the defined regular expression.
All regex match metrics are defined in the regexMatch
subsection. The following additional parameters should be supplied:
regex
- Required. Regular expression to match.
Metric increment returns Failure
status for rows where some values in the specified columns do not match the defined regular expression.
Calculates the number of values that do not match the defined regular expression.
All regex mismatch metrics are defined in the regexMismatch
subsection. The following additional parameters should be supplied:
regex
- Required. Regular expression that values should not match.
Metric increment returns Failure
status for rows where some values in the specified columns do match the defined regular expression.
Counts the number of values that have the specified datetime format.
All formatted date metrics are defined in the formattedDate
subsection. The following additional parameters can be supplied:
dateFormat
- Optional, default is yyyy-MM-dd'T'HH:mm:ss.SSSZ
. Target datetime format. The datetime format must be specified as a Java DateTimeFormatter pattern.
NOTE. If the specified columns are of type Timestamp
, it is assumed that they fit any datetime format and, therefore, the metric will return the total number of non-empty cells. Accordingly, the datetime format does not need to be specified.
Metric increment returns Failure
status for rows where some values in the specified columns do not conform to the defined datetime format.
Counts the number of values that are numeric and whose number format satisfies the defined number format criteria.
All formatted number metrics are defined in the formattedNumber
subsection. The following additional parameters should be supplied:
precision
- Required. The total number of digits in the value (excluding the decimal separator).
scale
- Required. Number of decimal digits in the value.
compareRule
- Optional, default is inbound
. Number format comparison rule:
inbound
- the value must \"fit\" into the specified number format: the actual precision and scale of the value are less than or equal to the given ones.
outbound
- the value must be outside the specified format: the actual precision and scale of the value are strictly greater than the given ones.
Metric increment returns Failure
status for rows where some values in the specified columns do not satisfy the defined number format criteria.
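The inbound and outbound rules can be sketched in Python as follows. This is a sketch under an assumption: precision and scale are taken in the java.math.BigDecimal sense, meaning the digits of the unscaled value and the number of digits after the decimal point. Python's Decimal.as_tuple() yields the same pair for plain decimal strings. Whether Checkita normalises values (for example, by stripping trailing zeros) is not stated here. The function names are hypothetical.

```python
from decimal import Decimal, InvalidOperation
from typing import Iterable, Optional, Tuple


def precision_and_scale(value: str) -> Optional[Tuple[int, int]]:
    # (precision, scale) in the java.math.BigDecimal sense, or None
    # when the value is not a finite number.
    try:
        number = Decimal(value)
    except InvalidOperation:
        return None
    if not number.is_finite():
        return None
    _, digits, exponent = number.as_tuple()
    return len(digits), -exponent


def formatted_number_count(values: Iterable[str], precision: int, scale: int,
                           compare_rule: str = 'inbound') -> int:
    count = 0
    for value in values:
        ps = precision_and_scale(value)
        if ps is None:
            continue  # non-numeric values are not counted
        actual_precision, actual_scale = ps
        if compare_rule == 'inbound':
            ok = actual_precision <= precision and actual_scale <= scale
        else:  # 'outbound'
            ok = actual_precision > precision and actual_scale > scale
        if ok:
            count += 1
    return count


values = ['123.45', '1.5', '12345.678', 'abc']
print(formatted_number_count(values, 5, 2))              # 2
print(formatted_number_count(values, 5, 2, 'outbound'))  # 1
```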
Finds the minimum number among the values in the specified columns. Metric definition does not require additional parameters: params
should not be set.
All minimum number metrics are defined in the minNumber
subsection.
Metric increment returns Failure
status for rows where none of the values in the specified columns can be cast to number and, therefore, the minimum number cannot be computed.
Finds the maximum number among the values in the specified columns. Metric definition does not require additional parameters: params
should not be set.
All maximum number metrics are defined in the maxNumber
subsection.
Metric increment returns Failure
status for rows where none of the values in the specified columns can be cast to number and, therefore, the maximum number cannot be computed.
Finds the sum of the values in the specified columns. Metric definition does not require additional parameters: params
should not be set.
All sum number metrics are defined in the sumNumber
subsection.
Metric increment returns Failure
status for rows where some values in the specified columns are not castable to number.
Finds the average of the values in the specified column. Metric definition does not require additional parameters: params
should not be set.
This metric works with only one column.
All average number metrics are defined in the avgNumber
subsection.
Metric increment returns Failure
status for rows where the value in the specified column is not castable to number.
Finds the standard deviation of the values in the specified column. Metric definition does not require additional parameters: params
should not be set.
This metric works with only one column.
All standard deviation metrics are defined in the stdNumber
subsection.
Metric increment returns Failure
status for rows where the value in the specified column is not castable to number.
Counts the number of values whose string representation can be converted to a number (double). Metric definition does not require additional parameters: params
should not be set.
All casted number metrics are defined in the castedNumber
subsection.
Metric increment returns Failure
status for rows where some values in the specified columns are not castable to number.
Counts the number of values that, when cast to number (double), fall into the specified set of allowed numbers.
All number in domain metrics are defined in the numberInDomain
subsection. The following additional parameters should be supplied:
domain
- Required. List of allowed numbers.
Metric increment returns Failure
status for rows where some values in the specified columns do not fall into the set of allowed numbers.
Counts the number of values that, when cast to number (double), do not fall into the specified set of avoided numbers.
All number out domain metrics are defined in the numberOutDomain
subsection. The following additional parameters should be supplied:
domain
- Required. List of avoided numbers.
Metric increment returns Failure
status for rows where some values in the specified columns do fall into the set of avoided numbers.
Counts the number of values that, when cast to number (double), are less than (or equal to) the specified value.
All number less than metrics are defined in the numberLessThan
subsection. The following additional parameters should be supplied:
compareValue
- Required. Number to compare with.
includeBound
- Optional, default is false
. Specifies whether to include compareValue
in the range for comparison.
Metric increment returns Failure
status for rows where some values in the specified columns do not satisfy the comparison criteria.
Counts the number of values that, when cast to number (double), are greater than (or equal to) the specified value.
All number greater than metrics are defined in the numberGreaterThan
subsection. The following additional parameters should be supplied:
compareValue
- Required. Number to compare with.
includeBound
- Optional, default is false
. Specifies whether to include compareValue
in the range for comparison.
Metric increment returns Failure
status for rows where some values in the specified columns do not satisfy the comparison criteria.
Counts the number of values that, when cast to number (double), are within the given interval.
All number between metrics are defined in the numberBetween
subsection. The following additional parameters should be supplied:
lowerCompareValue
- Required. The lower bound of the interval.
upperCompareValue
- Required. The upper bound of the interval.
includeBound
- Optional, default is false
. Specifies whether to include interval bounds in the range for comparison.
Metric increment returns Failure
status for rows where some values in the specified columns do not satisfy the comparison criteria.
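The effect of includeBound on the numberLessThan, numberGreaterThan and numberBetween metrics described above is shown in the Python sketch below. The sketch casts values to float and silently drops those that are not castable. It is a hypothetical helper, not Checkita code, and the function names are illustrative only.

```python
from typing import Iterable, List


def to_numbers(values: Iterable[str]) -> List[float]:
    # Cast values to float, dropping those that are not castable.
    numbers = []
    for value in values:
        try:
            numbers.append(float(value))
        except (TypeError, ValueError):
            pass
    return numbers


def number_less_than(values, compare_value, include_bound=False):
    return sum(1 for v in to_numbers(values)
               if v < compare_value or (include_bound and v == compare_value))


def number_greater_than(values, compare_value, include_bound=False):
    return sum(1 for v in to_numbers(values)
               if v > compare_value or (include_bound and v == compare_value))


def number_between(values, lower, upper, include_bound=False):
    # With include_bound the interval is closed, otherwise it is open.
    if include_bound:
        return sum(1 for v in to_numbers(values) if lower <= v <= upper)
    return sum(1 for v in to_numbers(values) if lower < v < upper)


values = ['1', '2', '3', '4', '5', 'n/a']
print(number_less_than(values, 3), number_less_than(values, 3, True))    # 2 3
print(number_between(values, 2, 4), number_between(values, 2, 4, True))  # 1 3
```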
Counts the number of values that, when cast to number (double), are outside the given interval.
All number not between metrics are defined in the numberNotBetween
subsection. The following additional parameters should be supplied:
lowerCompareValue
- Required. The lower bound of the interval.
upperCompareValue
- Required. The upper bound of the interval.
includeBound
- Optional, default is false
. Specifies whether to include interval bounds in the range for comparison.
Metric increment returns Failure
status for rows where some values in the specified columns do not satisfy the comparison criteria.
Counts the number of values that, when cast to number (double), are equal to the number given in the metric definition.
All number values metrics are defined in the numberValues
subsection. The following additional parameters should be supplied:
compareValue
- Required. Number value to compare with.
Metric increment returns Failure
status for rows where some values in the specified columns do not match the defined compare value.
Calculates the median of the values in the specified column. The metric calculator uses the TDigest library to compute the median value.
This metric works with only one column.
All median value metrics are defined in the medianValue
subsection. The following additional parameters can be supplied:
accuracyError
- Optional, default is 0.01
. Accuracy error for calculation of the median value.
Metric increment returns Failure
status for rows where the value in the specified column is not castable to number.
Calculates the first quantile of the values in the specified column. The metric calculator uses the TDigest library to compute the first quantile.
This metric works with only one column.
All first quantile metrics are defined in the firstQuantile
subsection. The following additional parameters can be supplied:
accuracyError
- Optional, default is 0.01
. Accuracy error for calculation of the first quantile value.
Metric increment returns Failure
status for rows where the value in the specified column is not castable to number.
Calculates the third quantile of the values in the specified column. The metric calculator uses the TDigest library to compute the third quantile.
This metric works with only one column.
All third quantile metrics are defined in the thirdQuantile
subsection. The following additional parameters can be supplied:
accuracyError
- Optional, default is 0.01
. Accuracy error for calculation of the third quantile value.
Metric increment returns Failure
status for rows where the value in the specified column is not castable to number.
Calculates an arbitrary quantile of the values in the specified column. The metric calculator uses the TDigest library to compute the quantile.
This metric works with only one column.
All get quantile metrics are defined in the getQuantile
subsection. The following additional parameters should be supplied:
accuracyError
- Optional, default is 0.01
. Accuracy error for calculation of the quantile value.
target
- Required. A number in the interval [0, 1]
corresponding to the quantile that needs to be calculated.
Metric increment returns Failure
status for rows where the value in the specified column is not castable to number.
This metric is the inverse of the Get Quantile Metric. It calculates the percentile value (quantile in %) that corresponds to the specified number from the set of values in the column. The metric calculator uses the TDigest library to compute the percentile value.
This metric works with only one column.
All get percentile metrics are defined in the getPercentile
subsection. The following additional parameters should be supplied:
accuracyError
- Optional, default is 0.01
. Accuracy error for calculation of the percentile.
target
- Required. The number from the set of values in the column for which the percentile is determined.
Metric increment returns Failure
status for rows where the value in the specified column is not castable to number.
Calculates the number of rows where values in the specified columns are equal to each other. Metric definition does not require additional parameters: params
should not be set.
This metric works with at least two columns.
All column equality metrics are defined in the columnEq
subsection.
Metric increment returns Failure
status for rows where some values in the specified columns are not castable to string or when they are not equal.
Calculates the number of rows where the difference between the dates in two columns, expressed in days, is less than (strictly less than) the specified threshold value.
This metric works with exactly two columns.
All day distance metrics are defined in the dayDistance
subsection. The following additional parameters should be supplied:
threshold
- Required. Maximum allowed difference between two dates in days (not included in the range for comparison).
dateFormat
- Optional, default is yyyy-MM-dd'T'HH:mm:ss.SSSZ
. Target datetime format. The datetime format must be specified as a Java DateTimeFormatter pattern.
NOTE. If the specified columns are of type Timestamp
, it is assumed that they fit any datetime format and, therefore, the metric will return the total number of non-empty cells. Accordingly, the datetime format does not need to be specified.
Metric increment returns Failure
status for rows where some values in the specified columns do not conform to the specified datetime format or when the date difference in days is greater than or equal to the specified threshold.
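The day distance criterion can be sketched in Python as below, under two stated assumptions. First, ISO yyyy-MM-dd parsing via date.fromisoformat stands in for the configurable Java dateFormat pattern. Second, the absolute difference between the two dates is used. Rows whose values fail to parse are simply not counted. The function name is hypothetical.

```python
from datetime import date
from typing import Iterable, Optional, Tuple


def day_distance_count(pairs: Iterable[Tuple[Optional[str], Optional[str]]],
                       threshold: int) -> int:
    count = 0
    for left, right in pairs:
        try:
            # ISO yyyy-MM-dd parsing stands in for the configurable dateFormat.
            first = date.fromisoformat(left)
            second = date.fromisoformat(right)
        except (TypeError, ValueError):
            continue  # value does not conform to the date format
        # Absolute difference in whole days, compared strictly with threshold.
        if abs((first - second).days) < threshold:
            count += 1
    return count


pairs = [('2024-03-01', '2024-03-03'), ('2024-03-01', '2024-03-10'), ('bad', '2024-03-01')]
print(day_distance_count(pairs, 3))  # 1
```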
Calculates the number of rows where the Levenshtein distance between string values in the provided columns is less than (strictly less than) the specified threshold.
This metric works with exactly two columns.
All Levenshtein distance metrics are defined in the levenshteinDistance
subsection. The following additional parameters should be supplied:
threshold
- Required. Maximum allowed Levenshtein distance.
normalize
- Optional, default is false
. Boolean parameter indicating whether the Levenshtein distance should be normalized with respect to the maximum of the two string lengths.
IMPORTANT. If the Levenshtein distance is normalized, then the threshold value must be in the range [0, 1]
.
Metric increment returns Failure
status for rows where some values in the specified columns are not castable to string or when the Levenshtein distance is greater than or equal to the specified threshold.
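The following Python sketch shows how the threshold and the optional normalization interact. It uses the classic dynamic-programming edit distance, and when normalizing it divides by the longer of the two strings, as described above. It is a standalone illustration: the function names are hypothetical, and whether Checkita uses this exact algorithm internally is not stated here.

```python
from typing import Iterable, Optional, Tuple


def levenshtein(a: str, b: str) -> int:
    # Classic dynamic-programming edit distance, keeping only two rows.
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,         # deletion
                               current[j - 1] + 1,      # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def levenshtein_distance_count(pairs: Iterable[Tuple[Optional[str], Optional[str]]],
                               threshold: float, normalize: bool = False) -> int:
    count = 0
    for a, b in pairs:
        if a is None or b is None:
            continue
        distance = levenshtein(a, b)
        if normalize:
            # Normalize by the maximum of the two string lengths.
            longest = max(len(a), len(b))
            distance = distance / longest if longest else 0.0
        if distance < threshold:  # strictly less than the threshold
            count += 1
    return count


pairs = [('kitten', 'sitting'), ('abc', 'abd'), ('abc', 'xyz')]
print(levenshtein_distance_count(pairs, 2))                    # 1
print(levenshtein_distance_count(pairs, 0.5, normalize=True))  # 2
```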
Calculates the covariance moment (co-moment) of the values in two columns. Metric definition does not require additional parameters: params
should not be set.
This metric works with exactly two columns.
IMPORTANT. For the metric to be calculated, values in the specified columns must be non-empty, non-null and castable to number (double). If at least one corrupt value is found, the metric calculator returns NaN.
Metric increment returns Failure
status for rows where some values in the specified columns cannot be cast to number.
Calculates the covariance of the values in two columns. Metric definition does not require additional parameters: params
should not be set.
This metric works with exactly two columns.
IMPORTANT. For the metric to be calculated, values in the specified columns must be non-empty, non-null and castable to number (double). If at least one corrupt value is found, the metric calculator returns NaN.
Metric increment returns Failure
status for rows where some values in the specified columns cannot be cast to number.
Calculates the covariance of the values in two columns with the Bessel correction. Metric definition does not require additional parameters: params
should not be set.
This metric works with exactly two columns.
IMPORTANT. For the metric to be calculated, values in the specified columns must be non-empty, non-null and castable to number (double). If at least one corrupt value is found, the metric calculator returns NaN.
Metric increment returns Failure
status for rows where some values in the specified columns cannot be cast to number.
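The three two-column metrics above are related through the textbook definitions, shown in the Python sketch below. The co-moment is the sum of (x - mean_x) * (y - mean_y). The covariance divides it by n, and the Bessel-corrected covariance divides it by n - 1. The assumption here is that Checkita follows these standard definitions. The sketch uses a simple two-pass formula, whereas a distributed calculator would more likely update the co-moment incrementally. The function names are hypothetical.

```python
import math
from typing import Sequence


def co_moment(xs: Sequence[str], ys: Sequence[str]) -> float:
    # Sum of (x - mean_x) * (y - mean_y); NaN if any value is not numeric.
    try:
        x = [float(v) for v in xs]
        y = [float(v) for v in ys]
    except (TypeError, ValueError):
        return math.nan
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))


def covariance(xs: Sequence[str], ys: Sequence[str]) -> float:
    # Population covariance: co-moment divided by n.
    return co_moment(xs, ys) / len(xs)


def covariance_bessel(xs: Sequence[str], ys: Sequence[str]) -> float:
    # Sample covariance with Bessel's correction: co-moment divided by n - 1.
    return co_moment(xs, ys) / (len(xs) - 1)


xs, ys = ['1', '2', '3', '4'], ['2', '4', '6', '8']
print(co_moment(xs, ys), covariance(xs, ys))  # 10.0 2.5
```

A single empty or non-numeric value makes all three results NaN, which mirrors the behaviour stated in the IMPORTANT notes.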
This is a specific metric that calculates the approximate N most frequently occurring values in a column. The metric calculator uses the Twitter Algebird library, which implements abstract algebra methods for Scala.
This metric works with only one column.
All top N metrics are defined in the topN
subsection. The following additional parameters can be supplied:
targetNumber
- Optional, default is 10
. Number N of values to search.
maxCapacity
- Optional, default is 100
. Maximum container size for storing top values.
Composed metrics are defined using a formula (specified in the formula
field) for their calculation. Since composed metrics are intended to compute derived results from the results of other metrics, those metrics are referenced in the formula by their IDs.
The formula must be written using Mustache Template notation, e.g.: {{ metric_1 }} + {{ metric_2 }}
.
Basic (+-*/) and exponentiation (^) math operations are supported, as well as grouping using parentheses.
Thus, composed metrics are defined in the composed
subsection using the following parameters:
id
- Required. Composed metric ID.
description
- Optional. Composed metric description.
formula
- Required. Formula to calculate the composed metric.
As shown in the example below, regular metrics of the same type are grouped within subsections named after the type of the metric. These subsections should contain a list of metric configurations of the corresponding type. Composed metrics are listed in a separate subsection.
jobConfig: {\n metrics: {\n regular: {\n rowCount: [\n {id: \"hive_table_row_cnt\", description: \"Row count in hive_source_1\", source: \"hive_source_1\"},\n {id: \"csv_file_row_cnt\", description: \"Row count in hdfs_delimited_source\", source: \"hdfs_delimited_source\"}\n ]\n distinctValues: [\n {\n id: \"fixed_file_dist_name\", description: \"Distinct values in hdfs_fixed_file\",\n source: \"hdfs_fixed_file\", columns: [\"colA\"],\n metadata: [\n \"requestor=some.person@some.domain\"\n \"critical.metric=true\"\n ]\n }\n ]\n nullValues: [\n {id: \"hive_table_nulls\", description: \"Null values in columns id and name\", source: \"hive_source_1\", columns: [\"id\", \"name\"]}\n ]\n completeness: [\n {id: \"orc_data_compl\", description: \"Completeness of column id\", source: \"hdfs_orc_source\", columns: [\"id\"]}\n {\n id: \"hive_table_compl\", \n description: \"Completeness of columns id and name\", \n source: \"hive_source_1\", \n columns: [\"id\", \"name\"]\n }\n ]\n avgNumber: [\n {id: \"avro_file1_avg_bal\", description: \"Avg number of column balance\", source: \"hdfs_avro_source\", columns: [\"balance\"]}\n ]\n regexMatch: [\n {\n id: \"table_source1_inn_regex\", description: \"Regex match for inn column\", source: \"table_source_1\",\n columns: [\"inn\"], params: {regex: \"\"\"^\\d{10}$\"\"\"}\n }\n ]\n stringInDomain: [\n {\n id: \"orc_data_segment_domain\", source: \"hdfs_orc_source\",\n columns: [\"segment\"], params: {domain: [\"FI\", \"MID\", \"SME\", \"INTL\", \"CIB\"]}\n }\n ]\n topN: [\n {\n id: \"filterVS_top3_currency\", description: \"Top 3 currency in filterVS\", source: \"filterVS\",\n columns: [\"id\"], params: {targetNumber: 3, maxCapacity: 10}\n }\n ],\n levenshteinDistance: [\n {\n id: \"lvnstDist\", source: \"table_source_2\", columns: [\"col1\", \"col2\"],\n params: {normalize: true, threshold: 0.3}\n }\n ]\n }\n composed: [\n {\n id: \"pct_of_null\", description: \"Percent of null values in hive_table1\",\n formula: \"100 * {{ hive_table_nulls }} ^ 2 / ( {{ hive_table_row_cnt }} + 1)\"\n }\n ]\n }\n}\n
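The Python sketch below illustrates how a composed metric formula such as pct_of_null in the example above can be evaluated. Mustache placeholders are substituted with metric results, ^ is treated as exponentiation, and only the documented operations (+, -, *, /, ^ and parentheses) are allowed. This is an illustration only, not Checkita's formula engine. The function names are hypothetical, and operator precedence here follows Python's rules, which is an assumption.

```python
import ast
import operator
import re

# Placeholders such as {{ metric_id }}; character classes avoid escaping braces.
PLACEHOLDER = re.compile('[{][{] *([A-Za-z0-9_]+) *[}][}]')

BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Pow: operator.pow,
}
UNARY_OPS = {ast.UAdd: operator.pos, ast.USub: operator.neg}


def _evaluate(node):
    # Walk the parsed expression, allowing only numbers and basic operators.
    if isinstance(node, ast.Expression):
        return _evaluate(node.body)
    if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
        return float(node.value)
    if isinstance(node, ast.BinOp) and type(node.op) in BINARY_OPS:
        return BINARY_OPS[type(node.op)](_evaluate(node.left), _evaluate(node.right))
    if isinstance(node, ast.UnaryOp) and type(node.op) in UNARY_OPS:
        return UNARY_OPS[type(node.op)](_evaluate(node.operand))
    raise ValueError('unsupported formula element: ' + type(node).__name__)


def evaluate_composed(formula: str, results: dict) -> float:
    # Substitute metric results, wrapped in parentheses so that negative
    # values keep their sign under exponentiation.
    substituted = PLACEHOLDER.sub(
        lambda m: '(' + repr(float(results[m.group(1)])) + ')', formula)
    # ^ denotes exponentiation in the formula; Python spells it **.
    expression = substituted.replace('^', '**')
    return _evaluate(ast.parse(expression, mode='eval'))


formula = '100 * {{ hive_table_nulls }} ^ 2 / ( {{ hive_table_row_cnt }} + 1)'
print(evaluate_composed(formula, {'hive_table_nulls': 3, 'hive_table_row_cnt': 9}))  # 90.0
```

Parsing into a syntax tree and whitelisting node types avoids calling eval on configuration text.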
"},{"location":"03-job-configuration/09-Checks/","title":"Checks Configurations","text":"Performing checks over the metric results is an important step in the Checkita framework. Once metric results are calculated, checks can be configured to identify whether there are any problems with the quality of the data.
In Checkita there are two main groups of checks:
Snapshot
checks - allow comparison of metric results with static thresholds or with other metric results in the same Data Quality job.
Trend
checks - allow evaluation of how a metric result changes over a certain period of time. Checks of this type are used to detect anomalies in data. For trend checks to work, the Data Quality storage must be set up, since the check calculator needs to fetch historical results for the metric of interest.
After evaluation, a check will have a status as described in the Status Model used in Results chapter.
"},{"location":"03-job-configuration/09-Checks/#snapshot-checks","title":"Snapshot Checks","text":"Snapshot checks represent a simple comparison of metric results with a static threshold or with another metric result.
The following snapshot checks are supported:
equalTo
- checks if the metric result is equal to a given threshold value or to another metric result.
lessThan
- checks if the metric result is less than a given threshold value or another metric result.
greaterThan
- checks if the metric result is greater than a given threshold value or another metric result.
differByLT
- checks if the relative difference between two metric results is less than a given threshold. This check succeeds when the following expression is true: | metric - compareMetric | / compareMetric < threshold
.
Snapshot checks are configured using a common set of parameters, which are:
id
- Required. Check ID.
description
- Optional. Description of the check.
metric
- Required. Metric ID whose result is checked.
compareMetric
- Optional. Metric ID whose result is used as a threshold.
threshold
- Optional. Explicit threshold value.
metadata
- Optional. List of user-defined metadata parameters specific to this check, where each parameter is a string in the format: param.name=param.value
.
IMPORTANT. When configuring a check, either an explicit threshold value must be specified in the threshold
field, or another metric ID must be specified in the compareMetric
field, whose result will then be used as the threshold value. The only exception to this rule is the differByLT
check, for which both a threshold value and a metric ID to compare with must be specified.
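The differByLT condition can be written as a one-line Python predicate. The numbers below mirror the row count comparison (threshold 0.05) from the configuration example further down but are otherwise made up. The function name is hypothetical. As in the formula, the denominator is compareMetric, so a zero compare result would raise ZeroDivisionError in this sketch.

```python
def differ_by_lt(metric: float, compare_metric: float, threshold: float) -> bool:
    # | metric - compareMetric | / compareMetric < threshold
    return abs(metric - compare_metric) / compare_metric < threshold


print(differ_by_lt(1000, 1040, 0.05))  # True: 40 / 1040 is about 0.038
print(differ_by_lt(1000, 1100, 0.05))  # False: 100 / 1100 is about 0.091
```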
Trend checks are used to detect anomalies in data. Checks of this type verify that the metric value stays within a given deviation from its average value over a certain period of time. The maximum allowed deviation is configured by providing a threshold value.
The following trend checks are supported:
averageBoundFull
- sets the same upper and lower deviation from the metric average result. The check succeeds when the following expression is true: (1 - threshold) * avgResult <= currentResult <= (1 + threshold) * avgResult
.
averageBoundUpper
- verifies only the upper deviation from the metric average result. The check succeeds when the following expression is true: currentResult <= (1 + threshold) * avgResult
.
averageBoundLower
- verifies only the lower deviation from the metric average result. The check succeeds when the following expression is true: (1 - threshold) * avgResult <= currentResult
.
averageBoundRange
- sets different thresholds for upper and lower deviations from the metric average result. The check succeeds when the following expression is true: (1 - thresholdLower) * avgResult <= currentResult <= (1 + thresholdUpper) * avgResult
.
Trend checks are configured using the following set of parameters:
id
- Required. Check ID.
description
- Optional. Description of the check.
metric
- Required. Metric ID whose result is checked.
rule
- Required. The rule for calculating the historical average value of the metric. Two rules are supported:
record
- calculates the average value of the metric for the configured number of historical records.
datetime
- calculates the average value of the metric for the configured datetime window.
windowSize
- Required. Size of the window for average metric value calculation:
If rule
is set to record
, then the window size is the number of records to retrieve.
If rule
is set to datetime
, then the window size is a duration string, which should conform to Scala Duration.
windowOffset
- Optional, default is 0
or 0s
. Sets the window offset back from the current reference date (see Working with Date and Time chapter for more details on reference date). By default, the offset is absent and the window starts from the current reference date.
If rule
is set to record
, then the window offset is the number of records to skip from the reference date.
If rule
is set to datetime
, then the window offset is a duration string, which should conform to Scala Duration.
threshold
- Required. Sets the maximum allowed deviation from the historical average metric result. Not used with the averageBoundRange
check.
thresholdLower
- Required. Sets the maximum allowed lower deviation from the historical average metric result. Used only with the averageBoundRange
check.
thresholdUpper
- Required. Sets the maximum allowed upper deviation from the historical average metric result. Used only with the averageBoundRange
check.
metadata
- Optional. List of user-defined metadata parameters specific to this check, where each parameter is a string in the format: param.name=param.value
.
NOTE. A Scala Duration string has the format <length><unit>
where the following units are allowed: d, day, h, hr, hour, m, min, minute, s, sec, second, ms, milli, millisecond, \u00b5s, micro, microsecond, ns, nano, nanosecond.
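The four trend check conditions above can be sketched in Python, together with a historical average under the record rule. The sketch skips windowOffset records and then averages the next windowSize records. It assumes the history is a list of past results ordered from newest to oldest. The datetime rule, which selects results by a duration window, is not modelled. All function names are hypothetical.

```python
from typing import Sequence


def historical_average(history: Sequence[float], window_size: int,
                       window_offset: int = 0) -> float:
    # history holds past results ordered from newest to oldest.
    # record rule: skip window_offset records, then average window_size records.
    window = history[window_offset:window_offset + window_size]
    return sum(window) / len(window)


def average_bound_full(current, avg, threshold):
    return (1 - threshold) * avg <= current <= (1 + threshold) * avg


def average_bound_upper(current, avg, threshold):
    return current <= (1 + threshold) * avg


def average_bound_lower(current, avg, threshold):
    return (1 - threshold) * avg <= current


def average_bound_range(current, avg, threshold_lower, threshold_upper):
    return (1 - threshold_lower) * avg <= current <= (1 + threshold_upper) * avg


avg = historical_average([100.0, 110.0, 90.0, 500.0], window_size=3)  # 100.0
print(average_bound_full(120.0, avg, 0.25))       # True: 75 <= 120 <= 125
print(average_bound_range(120.0, avg, 0.2, 0.1))  # False: 120 exceeds 110
```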
This is a special check designed specifically for the Top N Metric, and it works only with this metric. The Top N rank check calculates the Jaccard distance between the current and previous sets of top N metric results and checks that it does not exceed the threshold value.
IMPORTANT: Calculation of this check is currently supported only between the current and previous topN metric sets.
The Top N rank check is configured using the following parameters:
id
- Required. Check ID.
description
- Optional. Description of the check.
metric
- Required. Metric ID whose result is checked.
targetNumber
- Required. Number of records from the set of top N metric results that are considered. This number should be less than or equal to the number of top values collected by the top N metric.
threshold
- Required. Maximum allowed Jaccard distance between the current and previous sets of records from the top N metric result. Should be a number in the interval [0, 1]
.
metadata
- Optional. List of user-defined metadata parameters specific to this check, where each parameter is a string in the format: param.name=param.value
.
As shown in the example below, checks are grouped into two subsections: trend
and snapshot
. Then, checks of the same type are grouped within subsections named after the type of the checks. These subsections should contain a list of check configurations of the corresponding type.
jobConfig: {\n checks: {\n trend: {\n averageBoundFull: [\n {\n id: \"avg_bal_check\",\n description: \"Check that average balance stays within +/-25% of the week average\"\n metric: \"avro_file1_avg_bal\",\n rule: \"datetime\"\n windowSize: \"8d\"\n threshold: 0.25\n metadata: [\n \"requestor=some.person@some.domain\",\n \"critical.check=true\"\n ]\n }\n ]\n averageBoundUpper: [\n {id: \"avg_pct_null\", metric: \"pct_of_null\", rule: \"datetime\", windowSize: \"15d\", threshold: 0.5}\n ]\n averageBoundLower: [\n {id: \"avg_distinct\", metric: \"fixed_file_dist_name\", rule: \"record\", windowSize: 31, threshold: 0.3}\n ]\n averageBoundRange: [\n {\n id: \"avg_inn_match\",\n metric: \"table_source1_inn_regex\",\n rule: \"datetime\",\n windowSize: \"8d\",\n thresholdLower: 0.2\n thresholdUpper: 0.4\n }\n ]\n topNRank: [\n {id: \"top2_curr_match\", metric: \"filterVS_top3_currency\", targetNumber: 2, threshold: 0.1}\n ]\n }\n snapshot: {\n differByLT: [\n {\n id: \"row_cnt_diff\",\n description: \"Number of rows in two tables should not differ on more than 5%.\",\n metric: \"hive_table_row_cnt\"\n compareMetric: \"csv_file_row_cnt\"\n threshold: 0.05\n }\n ]\n equalTo: [\n {id: \"zero_nulls\", description: \"Hive Table1 mustn't contain nulls\", metric: \"hive_table_nulls\", threshold: 0}\n ]\n greaterThan: [\n {id: \"completeness_check\", metric: \"orc_data_compl\", threshold: 0.99}\n ]\n lessThan: [\n {id: \"null_threshold\", metric: \"pct_of_null\", threshold: 0.01}\n ]\n }\n }\n}\n
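The Jaccard distance used by the Top N rank check described above is one minus the size of the intersection divided by the size of the union. The Python sketch below applies it to the first targetNumber entries of the current and previous top N lists. It assumes both lists are already ordered by frequency. The currency values and function names are hypothetical and only loosely follow the top2_curr_match example.

```python
from typing import Sequence


def jaccard_distance(current_top: Sequence[str], previous_top: Sequence[str],
                     target_number: int) -> float:
    # Compare only the first target_number values of each top N result.
    current = set(current_top[:target_number])
    previous = set(previous_top[:target_number])
    union = current | previous
    if not union:
        return 0.0
    return 1 - len(current & previous) / len(union)


def top_n_rank_check(current_top, previous_top, target_number, threshold):
    # The check passes when the distance does not exceed the threshold.
    return jaccard_distance(current_top, previous_top, target_number) <= threshold


cur = ['USD', 'EUR', 'RUB']
prev = ['USD', 'EUR', 'CNY']
print(jaccard_distance(cur, prev, 2), jaccard_distance(cur, prev, 3))  # 0.0 0.5
```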
"},{"location":"03-job-configuration/10-Targets/","title":"Targets Configuration","text":"Targets are designed to provide alternative channels for sending results. First of all, targets can be used to send notifications to users about problems in their data or just to send a summary of the Data Quality job. In addition, targets provide different ways of saving results, e.g. writing them to a file in HDFS or sending them to a Kafka topic.
All targets are configured in targets
section of the job configuration. There are four general types of targets that can be configured depending on what information is being sent or saved:
Results targets are configured in the results
subsection and can be one of the following type depending on where they are sent or saved:
file
- Save results as file in local or remote (HDFS, S3, etc.) file system.hive
- Save results in HDFS as Hive table. Note that Hive table with required schema must be created prior results saving.kafka
- Send results to Kafka topic in JSON format.For result target of any type it is required to configure list of result to be saved or sent:
resultTypes - Required. List of result types to save or send. May include the following: regularMetrics, composedMetrics, loadChecks, checks, jobState. Note that all result types are reduced to the Unified Targets Schema and saved together.
In order to save results to a file, configure a result target of file type. In addition to the list of saved results, a file output must be configured:
save - Required. File output configuration used to save results. For more information on configuring file outputs, see the File Output Configuration chapter.
The file with results will have the Unified Targets Schema.
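A minimal sketch of a file results target follows. It assumes a hypothetical output directory /tmp/dq/results and saves check results together with the job state as Parquet files. Only parameter names described in this chapter and in File Output Configuration are used; the values are illustrative.

```hocon
jobConfig: {
  targets: {
    results: {
      file: {
        # Result types to save; all of them are reduced to the Unified Targets Schema
        resultTypes: [\"checks\", \"loadChecks\", \"jobState\"]
        # File output configuration (see File Output Configuration)
        save: {
          kind: \"parquet\"
          # Hypothetical directory; its content is overwritten if it is not empty
          path: \"/tmp/dq/results\"
        }
      }
    }
  }
}
```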
"},{"location":"03-job-configuration/10-Targets/#save-results-to-hive","title":"Save Results to Hive","text":"In order to save results to a Hive table, configure a result target of hive type. The Hive table to which results will be saved must be created in advance with the Unified Targets Schema.
Thus, in addition to the list of saved results, the Hive schema and table must be specified:
schema - Required. Hive schema.
table - Required. Hive table.
Note that results are appended to the Hive table.
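For illustration, a hive results target could look like the sketch below. The schema and table names dq_schema and dq_results are hypothetical. The table is assumed to have been created beforehand with the Unified Targets Schema.

```hocon
jobConfig: {
  targets: {
    results: {
      hive: {
        resultTypes: [\"regularMetrics\", \"composedMetrics\", \"checks\"]
        # Hypothetical pre-created table; results are appended on every run
        schema: \"dq_schema\"
        table: \"dq_results\"
      }
    }
  }
}
```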
"},{"location":"03-job-configuration/10-Targets/#send-results-to-kafka","title":"Send Results to Kafka","text":"In order to send results to a Kafka topic, configure a result target of kafka type. The connection to the Kafka cluster must be configured in the connections section of the job configuration as described in Kafka Connection Configuration.
Thus, in addition to the list of saved results, the following parameters must be provided:
connection - Required. Kafka connection ID.
topic - Required. Kafka topic to send results to.
options - Optional. Additional list of Kafka parameters for sending messages to the topic. Parameters are provided as strings in the format parameterName=parameterValue.
Results are sent as JSON messages. In addition, the aggregatedKafkaOutput parameter configured in application settings controls how results are sent (see the Enablers chapter):
Error collection targets are configured in the errorCollection subsection and can be of one of the following types, depending on where metric errors are sent or saved:
file - Save metric errors as a file in a local or remote (HDFS, S3, etc.) file system.
hive - Save metric errors in HDFS as a Hive table. Note that a Hive table with the required schema must be created before metric errors are saved.
kafka - Send metric errors to a Kafka topic in JSON format.
Note that metric errors are transformed to the Unified Targets Schema when sent or saved.
For an error collection target of any type, the following parameters can be supplied:
metrics - Optional. List of metrics for which errors will be saved. If omitted, errors are saved for all metrics defined in the Data Quality job.
dumpSize - Optional, default is 100. Additionally limits the number of errors saved per metric in order to make reports more compact. Cannot be larger than the application-level limit described in the Enablers chapter.
In order to save metric errors to a file, configure an error collection target of file type. In addition to the common error collection target parameters, a file output must be configured:
save - Required. File output configuration used to save metric errors. For more information on configuring file outputs, see the File Output Configuration chapter.
The file with metric errors will have the Unified Targets Schema.
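The sketch below shows a file error collection target. It keeps errors only for two of the metrics from the job example and caps them at 20 per metric. The output directory /tmp/dq/errors is hypothetical.

```hocon
jobConfig: {
  targets: {
    errorCollection: {
      file: {
        # Save errors only for these metrics (omit to save errors for all metrics)
        metrics: [\"hive_table_nulls\", \"orc_data_compl\"]
        # At most 20 errors per metric instead of the default 100
        dumpSize: 20
        save: {
          kind: \"orc\"
          path: \"/tmp/dq/errors\"  # hypothetical directory
        }
      }
    }
  }
}
```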
"},{"location":"03-job-configuration/10-Targets/#save-metric-errors-to-hive","title":"Save Metric Errors to Hive","text":"In order to save metric errors to a Hive table, configure an error collection target of hive type. The Hive table to which metric errors will be saved must be created in advance with the Unified Targets Schema.
Thus, in addition to the common error collection target parameters, the Hive schema and table must be specified:
schema - Required. Hive schema.
table - Required. Hive table.
Note that metric errors are appended to the Hive table.
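The following is a sketch of a hive error collection target. It assumes that a table named dq_metric_errors (hypothetical) already exists in the hypothetical schema dq_schema with the Unified Targets Schema. Since metrics is omitted, errors would be saved for all metrics of the job.

```hocon
jobConfig: {
  targets: {
    errorCollection: {
      hive: {
        # metrics is omitted, so errors are saved for all metrics in the job
        dumpSize: 50
        schema: \"dq_schema\"         # hypothetical Hive schema
        table: \"dq_metric_errors\"   # hypothetical pre-created table
      }
    }
  }
}
```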
"},{"location":"03-job-configuration/10-Targets/#send-metric-errors-to-kafka","title":"Send Metric Errors to Kafka","text":"In order to send metric errors to a Kafka topic, configure an error collection target of kafka type. The connection to the Kafka cluster must be configured in the connections section of the job configuration as described in Kafka Connection Configuration.
Thus, in addition to the common error collection target parameters, the following ones must be provided:
connection - Required. Kafka connection ID.
topic - Required. Kafka topic to send metric errors to.
options - Optional. Additional list of Kafka parameters for sending messages to the topic. Parameters are provided as strings in the format parameterName=parameterValue.
Metric errors are sent as JSON messages. In addition, the aggregatedKafkaOutput parameter configured in application settings controls how metric errors are sent (see the Enablers chapter):
IMPORTANT. Be careful when using this option for sending metric errors, as there can be a significant number of them. In order to fit into Kafka message size limits, it is recommended to limit the number of errors sent per metric by setting the dumpSize parameter to a reasonably low number.
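Following the recommendation above, a kafka error collection target can restrict both the list of metrics and the number of errors sent per metric. The sketch below assumes the kafka_broker connection from the job example. The topic name is hypothetical.

```hocon
jobConfig: {
  targets: {
    errorCollection: {
      kafka: {
        metrics: [\"table_source1_inn_regex\"]
        # A low dumpSize keeps messages within Kafka size limits
        dumpSize: 5
        connection: \"kafka_broker\"  # ID from the connections section
        topic: \"dq.metric.errors\"    # hypothetical topic
      }
    }
  }
}
```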
The Checkita framework collects a summary upon completion of each Data Quality job. Summary targets are designed accordingly, to send summary reports to users. Summary targets are configured in the summary subsection and can be of one of the following types, depending on where summary reports are sent or saved:
email - Send the summary report to user(s) via email.
mattermost - Send the summary report to Mattermost, either to a channel or to a user's direct messages.
kafka - Send the summary report to a Kafka topic in JSON format. When sent to Kafka, the summary report is transformed to the Unified Targets Schema.
For a summary target of email or mattermost type, the following parameters can be supplied:
attachMetricErrors - Optional, default is false. Boolean parameter indicating whether a report with collected metric errors should be attached to the email or message with the summary report.
attachFailedChecks - Optional, default is false. Boolean parameter indicating whether a report with failed checks should be attached to the email or message with the summary report.
metrics - Optional. If attachMetricErrors is set to true, this parameter can be used to specify the list of metrics for which errors will be saved. If omitted, errors are saved for all metrics defined in the Data Quality job.
dumpSize - Optional, default is 100. If attachMetricErrors is set to true, this parameter additionally limits the number of errors saved per metric in order to make the report more compact. Cannot be larger than the application-level limit described in the Enablers chapter.
In order to send a summary report via email, configure a summary target of email type. In addition to the common summary target parameters, the following ones must be configured:
recipients - Required. List of recipients' emails to which the summary report will be sent.
template - Optional. HTML template to build the email body.
templateFile - Optional. Location of the file with the HTML template to build the email body.
The HTML template is optional. If it is not provided, the default summary report body is compiled. Note that the template parameter has higher priority than templateFile. Therefore, if both of them are set, the explicitly defined HTML template from the template parameter is used.
In addition, HTML templates support parameter substitution using Mustache Template notation, e.g.: This {{ parameterName }} has a value of {{ parameterValue }}. The list of parameters available for substitution in HTML templates is given in the Job Summary Parameters Available for Templates chapter below.
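The sketch below sends the summary by email with both reports attached. Neither template nor templateFile is set, so the default summary report body is used. The recipient address is hypothetical; the metric IDs come from the job example.

```hocon
jobConfig: {
  targets: {
    summary: {
      email: {
        attachFailedChecks: true   # attach report with failed checks
        attachMetricErrors: true   # attach report with metric errors...
        metrics: [\"hive_table_nulls\", \"table_source1_inn_regex\"]  # ...only for these metrics
        dumpSize: 10
        recipients: [\"dq.team@some.domain\"]  # hypothetical address
        # No template or templateFile: the default summary report body is compiled
      }
    }
  }
}
```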
In order to send a summary report to Mattermost, configure a summary target of mattermost type. In addition to the common summary target parameters, the following ones must be configured:
recipients - Required. List of recipients to which the summary report will be sent. The message can be sent either to a channel or to a user's direct messages:
Channels are specified with the # sign: #someChannel.
Users are specified with the @ prefix: @someUser.
template - Optional. Markdown template to build the message body.
templateFile - Optional. Location of the file with the Markdown template to build the message body.
The Markdown template is optional. If it is not provided, the default summary report body is compiled. Note that the template parameter has higher priority than templateFile. Therefore, if both of them are set, the explicitly defined Markdown template from the template parameter is used.
In addition, Markdown templates support parameter substitution using Mustache Template notation, e.g.: This {{ parameterName }} has a value of {{ parameterValue }}. The list of parameters available for substitution in Markdown templates is given in the Job Summary Parameters Available for Templates chapter below.
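The following sketches a mattermost summary target with an inline Markdown template. The template text is hypothetical. It relies only on parameters from the Job Summary Parameters Available for Templates list, which replace the {{ ... }} placeholders.

```hocon
jobConfig: {
  targets: {
    summary: {
      mattermost: {
        attachFailedChecks: true
        # Post to a channel (#) and to a user's direct messages (@)
        recipients: [\"#someChannel\", \"@someUser\"]
        # Inline Markdown template; it takes priority over templateFile if both are set
        template: \"Job `{{ jobId }}` finished with status **{{ jobStatus }}**: {{ numFailedChecks }} of {{ numChecks }} checks failed.\"
      }
    }
  }
}
```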
In order to send a summary report to a Kafka topic, configure a summary target of kafka type. The connection to the Kafka cluster must be configured in the connections section of the job configuration as described in Kafka Connection Configuration.
Kafka messages do not support any form of attachments; therefore, only the summary report itself can be sent to a Kafka topic. The summary report is sent as a JSON string that contains all the parameters defined in the Job Summary Parameters Available for Templates chapter below. The JSON string format conforms to the Unified Targets Schema.
Thus, in order to configure a kafka summary target, the following parameters must be specified:
connection - Required. Kafka connection ID.
topic - Required. Kafka topic to send the summary report to.
options - Optional. Additional list of Kafka parameters for sending messages to the topic. Parameters are provided as strings in the format parameterName=parameterValue.
Check alert targets are designed specifically to send notifications when some of the watched checks have failed. These targets are configured in the checkAlerts subsection and can be of one of the following types, depending on where alerts are sent:
email - Send check alerts to user(s) via email.
mattermost - Send check alerts to Mattermost, either to a channel or to a user's direct messages.
For a check alert target of any type, the following parameters can be supplied:
id - Required. ID of the check alert. There can be different check alert configurations for different sets of checks; therefore, each check alert must have an ID to distinguish it.
checks - Optional. List of watched checks. If any of the watched checks fails, an alert notification is sent. If omitted, all checks defined in the Data Quality job are watched.
In order to send a check alert via email, configure a check alert target of email type. In addition to the common check alert target parameters, the following ones must be configured:
recipients - Required. List of recipients' emails to which the check alert will be sent.
template - Optional. HTML template to build the email body.
templateFile - Optional. Location of the file with the HTML template to build the email body.
The HTML template is optional. If it is not provided, the default check alert body is compiled. Note that the template parameter has higher priority than templateFile. Therefore, if both of them are set, the explicitly defined HTML template from the template parameter is used.
In addition, HTML templates support parameter substitution using Mustache Template notation, e.g.: This {{ parameterName }} has a value of {{ parameterValue }}. The list of parameters available for substitution in HTML templates is given in the Job Summary Parameters Available for Templates chapter below.
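The sketch below shows an email check alert that watches two checks from the job example and builds its body from an HTML template file. It uses the checkAlerts subsection name from the job example. The alert ID, recipient and template path are hypothetical.

```hocon
jobConfig: {
  targets: {
    checkAlerts: {
      email: [
        {
          id: \"critical_checks_alert\"              # hypothetical alert ID
          checks: [\"zero_nulls\", \"row_cnt_diff\"]  # alert is sent if any of these fails
          recipients: [\"dq.team@some.domain\"]      # hypothetical address
          # Hypothetical HTML file with {{ ... }} placeholders; ignored if template is set
          templateFile: \"/opt/dq/templates/check_alert.html\"
        }
      ]
    }
  }
}
```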
In order to send a check alert to Mattermost, configure a check alert target of mattermost type. In addition to the common check alert target parameters, the following ones must be configured:
recipients - Required. List of recipients to which the check alert will be sent. The message can be sent either to a channel or to a user's direct messages:
Channels are specified with the # sign: #someChannel.
Users are specified with the @ prefix: @someUser.
template - Optional. Markdown template to build the message body.
templateFile - Optional. Location of the file with the Markdown template to build the message body.
The Markdown template is optional. If it is not provided, the default check alert body is compiled. Note that the template parameter has higher priority than templateFile. Therefore, if both of them are set, the explicitly defined Markdown template from the template parameter is used.
In addition, Markdown templates support parameter substitution using Mustache Template notation, e.g.: This {{ parameterName }} has a value of {{ parameterValue }}. The list of parameters available for substitution in Markdown templates is given in the Job Summary Parameters Available for Templates chapter below.
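The following sketches a mattermost check alert with an inline Markdown template. It uses the checkAlerts subsection name from the job example. The alert ID and message text are hypothetical; jobId, referenceDate and listFailedChecks are parameters from the list of job summary parameters.

```hocon
jobConfig: {
  targets: {
    checkAlerts: {
      mattermost: [
        {
          id: \"completeness_alert\"  # hypothetical alert ID
          checks: [\"completeness_check\"]
          recipients: [\"@someUser\"]
          template: \"Failed checks in job `{{ jobId }}` for {{ referenceDate }}: {{ listFailedChecks }}\"
        }
      ]
    }
  }
}
```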
All targets that are saved to files or Hive tables, or sent to Kafka, are reduced to a unified schema. This approach has several advantages:
Thus, the unified schema is as follows:
Column Name | Column Type | Comment
--- | --- | ---
jobId | STRING | ID of the Data Quality job
referenceDate | STRING | Reference datetime for which the job is run
executionDate | STRING | Datetime of the actual job start
entityType | STRING | Type of result
data | STRING | JSON string; its content varies depending on entityType
The schema above shows that all data specific to the results of each type is stored as a JSON string. When results are sent to Kafka, the schema is the same, but data becomes a nested JSON object.
As already noted, the HTML and Markdown templates used to build the body of notifications support parameter substitution using Mustache Template notation. The list of parameters available for substitution is shown below.
For example, a Markdown template for a check alert notification could look like this:
# Checkita Data Quality Notification - Failed Check Alert\n\nYou requested notifications on failed checks in Data Quality Job: `{{ jobId}}`.\n\nInform you that some watched checks have failed for job started for:\n\n* Reference date: `{{ referenceDate }}`\n* Execution date: `{{ executionDate }}`\n\nAttached files contain information about failed checks. Please, review them.\n
jobId - ID of the current Data Quality job.
jobStatus - Job status: Success if all checks have passed, Failure otherwise.
referenceDate - Reference datetime for which the job is run.
executionDate - Datetime of the actual job start.
numSources - Total number of sources in the job.
numMetrics - Total number of metrics in the job.
numChecks - Total number of checks in the job.
numLoadChecks - Total number of load checks in the job.
numMetricsWithErrors - Number of metrics that yielded errors during their computation.
numFailedChecks - Number of failed checks.
numFailedLoadChecks - Number of failed load checks.
listMetricsWithErrors - List of all metrics that yielded errors during their computation.
listFailedChecks - List of failed checks.
listFailedLoadChecks - List of failed load checks.
As shown in the example below, targets are grouped into subsections named after their type. These subsections may contain various target configurations depending on the channel where targets are saved or sent. Since multiple check alert configurations are allowed, they are grouped as lists of check alerts sent to a specific channel (email or mattermost).
jobConfig: {\n targets: {\n results: {\n file: {\n resultTypes: [\"checks\", \"loadChecks\"]\n save: {\n kind: \"delimited\"\n path: \"/tmp/dataquality/results\"\n header: true\n }\n }\n hive: {\n resultTypes: [\"regularMetrics\", \"composedMetrics\", \"loadChecks\", \"checks\", \"jobState\"],\n schema: \"WORKSPACE_CIBAA\",\n table: \"DQ_TARGETS\"\n }\n kafka: {\n resultTypes: [\"regularMetrics\", \"composedMetrics\", \"loadChecks\", \"checks\"],\n connection: \"kafka_broker\"\n topic: \"some.topic\"\n }\n }\n errorCollection: {\n file: {\n metrics: [\"pct_of_null\", \"hive_table_row_cnt\", \"hive_table_nulls\"]\n dumpSize: 50\n save: {\n kind: \"orc\"\n path: \"tmp/DQ/ERRORS\"\n }\n }\n kafka: {\n metrics: [\"hive_table_nulls\", \"fixed_file_dist_name\", \"table_source1_inn_regex\"]\n dumpSize: 25\n connection: \"kafka_broker\"\n topic: \"some.topic\"\n options: [\"addParam=true\"]\n }\n }\n summary: {\n email: {\n attachMetricErrors: true\n metrics: [\"hive_table_nulls\", \"fixed_file_dist_name\", \"table_source1_inn_regex\"]\n dumpSize: 10\n recipients: [\"some.person@some.domain\"]\n }\n mattermost: {\n attachMetricErrors: true\n metrics: [\"hive_table_nulls\", \"fixed_file_dist_name\", \"table_source1_inn_regex\"]\n dumpSize: 10\n recipients: [\"@someUser\", \"#someChannel\"]\n }\n kafka: {\n connection: \"kafka_broker\"\n topic: \"dev.dq_results.topic\"\n }\n }\n checkAlerts: {\n email: [\n {\n id: \"alert1\"\n checks: [\"avg_bal_check\", \"zero_nulls\"]\n recipients: [\"some.person@some.domain\"]\n }\n {\n id: \"alert2\"\n checks: [\"top2_curr_match\", \"completeness_check\"]\n recipients: [\"another.person@some.domain\"]\n }\n ]\n mattermost: [\n {\n id: \"alert3\"\n checks: [\"avg_bal_check\", \"zero_nulls\"]\n recipients: [\"@someUser\"]\n }\n {\n id: \"alert4\"\n checks: [\"top2_curr_match\", \"completeness_check\"]\n recipients: [\"#someChannel\"]\n }\n ]\n }\n }\n}\n
"},{"location":"03-job-configuration/11-FileOutputs/","title":"File Output Configuration","text":"The Checkita framework has a mechanism to save some of its results to a file in either a local or remote (HDFS, S3, etc.) file system. Thus, it is possible to save virtual sources that are built during Data Quality job execution. Saved virtual sources can later be used for various purposes, such as investigating data quality problems. Apart from that, Checkita supports saving various Data Quality job results as files. In order to do that, configure targets of the desired type. See Targets Configuration for more information.
The Checkita framework supports saving file outputs in the following formats:
Thus, in order to configure a file output, the following parameters must be supplied:
kind - Required. File format. Should be one of the following: delimited, orc, parquet, avro.
path - Required. File path to save to. The Spark DataFrame writer is used under the hood to save outputs; therefore, the provided path should point to a directory. If the directory is not empty, its content is overwritten.
Additional parameters can be defined for delimited text file output. These are:
delimiter - Optional, default is , (comma). Column delimiter.
quote - Optional, default is \" (double quote). Column enclosing character.
escape - Optional, default is \\ (backslash). Escape character.
header - Optional, default is false. Boolean parameter indicating whether the file should be written with a column header or without it.
parquet file
{\n kind: \"parquet\"\n path: \"/tmp/parquet_file_output\"\n}\n
delimited file
{\n kind: \"delimited\"\n path: \"/tmp/dataquality/results\"\n header: true\n}\n
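A delimited file output that overrides the default delimiter and quote character might look like the sketch below. The path is hypothetical.

```hocon
{
  kind: \"delimited\"
  path: \"/tmp/dq/results_semicolon\"  # hypothetical directory
  delimiter: \";\"                     # instead of the default comma
  quote: \"'\"                         # single quote instead of the default double quote
  header: true                        # write a column header
}
```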
The example below shows an abstract but fully populated Data Quality job configuration with most of the Checkita framework features configured.
jobConfig: {\n jobId: \"job_id_for_this_configuration\"\n\n connections: {\n oracle: [\n {id: \"oracle_db1\", url: \"oracle.db.com:1521/public\", username: \"db-user\", password: \"dq-password\"}\n ]\n sqlite: [\n {id: \"sqlite_db\", url: \"some/path/to/db.sqlite\"}\n ],\n kafka: [\n {id: \"kafka_broker\", servers: [\"server1:9092\", \"server2:9092\"]}\n ]\n }\n\n schemas: [\n {\n id: \"schema1\"\n kind: \"delimited\"\n schema: [\n {name: \"colA\", type: \"string\"},\n {name: \"colB\", type: \"timestamp\"},\n {name: \"colC\", type: \"decimal(10, 3)\"}\n ]\n },\n {\n id: \"schema2\"\n kind: \"fixedFull\",\n schema: [\n {name: \"col1\", type: \"integer\", width: 5},\n {name: \"col2\", type: \"double\", width: 6},\n {name: \"col3\", type: \"boolean\", width: 4}\n ]\n },\n {id: \"schema3\", kind: \"fixedShort\", schema: [\"colOne:5\", \"colTwo:7\", \"colThree:9\"]}\n {id: \"hive_schema\", kind: \"hive\", schema: \"some_schema\", table: \"some_table\"}\n {id: \"avro_schema\", kind: \"avro\", schema: \"some/path/to/avro_schema.avsc\"}\n\n ]\n\n sources: {\n table: [\n {id: \"table_source_1\", connection: \"oracle_db1\", table: \"some_table\", keyFields: [\"id\", \"name\"]}\n {id: \"table_source_2\", connection: \"sqlite_db\", table: \"other_table\"}\n ]\n hive: [\n {\n id: \"hive_source_1\", schema: \"some_schema\", table: \"some_table\",\n partitions: [{name: \"dlk_cob_date\", values: [\"2023-06-30\", \"2023-07-01\"]}],\n keyFields: [\"id\", \"name\"]\n }\n ]\n file: [\n {id: \"hdfs_avro_source\", kind: \"avro\", path: \"path/to/avro/file.avro\", schema: \"avro_schema\"},\n {id: \"hdfs_orc_source\", kind: \"orc\", path: \"path/to/orc/file.orc\"},\n {\n id: \"hdfs_delimited_source\",\n kind: \"delimited\",\n path: \"path/to/csv/file.csv\"\n schema: \"schema1\"\n },\n {id: \"hdfs_fixed_file\", kind: \"fixed\", path: \"path/to/fixed/file.txt\", schema: \"schema2\"},\n ],\n kafka: [\n {\n id: \"kafka_source\",\n connection: \"kafka_broker\",\n topics: [\"topic1.pub\", 
\"topic2.pub\"]\n format: \"json\"\n }\n ]\n }\n\n virtualSources: [\n {\n id: \"sqlVS\"\n kind: \"sql\"\n parentSources: [\"hive_source_1\"]\n persist: \"disk_only\"\n save: {\n kind: \"orc\"\n path: ${basePath}\"/sqlVs\"\n }\n query: \"select id, name, entity, description from hive_source_1 where dlk_cob_date == '2023-06-30'\"\n }\n {\n id: \"joinVS\"\n kind: \"join\"\n parentSources: [\"hdfs_avro_source\", \"hdfs_orc_source\"]\n joinBy: [\"id\"]\n joinType: \"leftouter\"\n persist: \"memory_only\"\n keyFields: [\"id\", \"order_id\"]\n }\n {\n id: \"filterVS\"\n kind: \"filter\"\n parentSources: [\"kafka_source\"]\n expr: [\"key is not null\"]\n keyFields: [\"batchId\", \"dttm\"]\n }\n {\n id: \"selectVS\"\n kind: \"select\"\n parentSources: [\"table_source_1\"]\n expr: [\n \"count(id) as id_cnt\",\n \"count(name) as name_cnt\"\n ]\n }\n {\n id: \"aggVS\"\n kind: \"aggregate\"\n parentSources: [\"hdfs_fixed_file\"]\n groupBy: [\"col1\"]\n expr: [\n \"avg(col2) as avg_col2\",\n \"sum(col3) as sum_col3\"\n ],\n keyFields: [\"col1\", \"avg_col2\", \"sum_col3\"]\n }\n ]\n\n loadChecks: {\n exactColumnNum: [\n {id: \"loadCheck1\", source: \"hdfs_delimited_source\", option: 3}\n ]\n minColumnNum: [\n {id: \"loadCheck2\", source: \"kafka_source\", option: 2}\n ]\n columnsExist: [\n {id: \"loadCheck3\", source: \"sqlVS\", columns: [\"id\", \"name\", \"entity\", \"description\"]},\n {id: \"load_check_4\", source: \"hdfs_delimited_source\", columns: [\"id\", \"name\", \"value\"]}\n ]\n schemaMatch: [\n {id: \"load_check_5\", source: \"kafka_source\", schema: \"hive_schema\"}\n ]\n }\n\n metrics: {\n regular: {\n rowCount: [\n {id: \"hive_table_row_cnt\", description: \"Row count in hive_source_1\", source: \"hive_source_1\"},\n {id: \"csv_file_row_cnt\", description: \"Row count in hdfs_delimited_source\", source: \"hdfs_delimited_source\"}\n ]\n distinctValues: [\n {\n id: \"fixed_file_dist_name\", description: \"Distinct values in hdfs_fixed_file\",\n source: 
\"hdfs_fixed_file\", columns: [\"colA\"]\n }\n ]\n nullValues: [\n {id: \"hive_table_nulls\", description: \"Null values in columns id and name\", source: \"hive_source_1\", columns: [\"id\", \"name\"]}\n ]\n completeness: [\n {id: \"orc_data_compl\", description: \"Completeness of column id\", source: \"hdfs_orc_source\", columns: [\"id\"]}\n ]\n avgNumber: [\n {id: \"avro_file1_avg_bal\", description: \"Avg number of column balance\", source: \"hdfs_avro_source\", columns: [\"balance\"]}\n ]\n regexMatch: [\n {\n id: \"table_source1_inn_regex\", description: \"Regex match for inn column\", source: \"table_source_1\",\n columns: [\"inn\"], params: {regex: \"\"\"^\\d{10}$\"\"\"}\n }\n ]\n stringInDomain: [\n {\n id: \"orc_data_segment_domain\", source: \"hdfs_orc_source\",\n columns: [\"segment\"], params: {domain: [\"FI\", \"MID\", \"SME\", \"INTL\", \"CIB\"]}\n }\n ]\n topN: [\n {\n id: \"filterVS_top3_currency\", description: \"Top 3 currency in filterVS\", source: \"filterVS\",\n columns: [\"id\"], params: {targetNumber: 3, maxCapacity: 10}\n }\n ],\n levenshteinDistance: [\n {\n id: \"lvnstDist\", source: \"table_source_2\", columns: [\"col1\", \"col2\"],\n params: {normalize: true, threshold: 0.3}\n }\n ]\n }\n composed: [\n {\n id: \"pct_of_null\", description: \"Percent of null values in hive_table1\",\n formula: \"100 * {{ hive_table_nulls }} ^ 2 / ( {{ hive_table_row_cnt }} + 1)\"\n }\n ]\n }\n\n checks: {\n trend: {\n averageBoundFull: [\n {\n id: \"avg_bal_check\",\n description: \"Check that average balance stays within +/-25% of the week average\"\n metric: \"avro_file1_avg_bal\",\n rule: \"datetime\"\n windowSize: \"8d\"\n threshold: 0.25\n }\n ]\n averageBoundUpper: [\n {id: \"avg_pct_null\", metric: \"pct_of_null\", rule: \"datetime\", windowSize: \"15d\", threshold: 0.5}\n ]\n averageBoundLower: [\n {id: \"avg_distinct\", metric: \"fixed_file_dist_name\", rule: \"record\", windowSize: 31, threshold: 0.3}\n ]\n averageBoundRange: [\n {\n id: 
\"avg_inn_match\",\n metric: \"table_source1_inn_regex\",\n rule: \"datetime\",\n windowSize: \"8d\",\n thresholdLower: 0.2\n thresholdUpper: 0.4\n }\n ]\n topNRank: [\n {id: \"top2_curr_match\", metric: \"filterVS_top3_currency\", targetNumber: 2, threshold: 0.1}\n ]\n }\n snapshot: {\n differByLT: [\n {\n id: \"row_cnt_diff\",\n description: \"Number of rows in two tables should not differ by more than 5%.\",\n metric: \"hive_table_row_cnt\"\n compareMetric: \"csv_file_row_cnt\"\n threshold: 0.05\n }\n ]\n equalTo: [\n {id: \"zero_nulls\", description: \"Hive Table1 mustn't contain nulls\", metric: \"hive_table_nulls\", threshold: 0}\n ]\n greaterThan: [\n {id: \"completeness_check\", metric: \"orc_data_compl\", threshold: 0.99}\n ]\n lessThan: [\n {id: \"null_threshold\", metric: \"pct_of_null\", threshold: 0.01}\n ]\n }\n }\n\n targets: {\n results: {\n file: {\n resultTypes: [\"checks\", \"loadChecks\"]\n save: {\n kind: \"delimited\"\n path: ${basePath}\"/results/\"${referenceDate}\n header: true\n }\n }\n hive: {\n resultTypes: [\"regularMetrics\", \"composedMetrics\", \"loadChecks\", \"checks\"],\n schema: \"DQ_SCHEMA\",\n table: \"DQ_TARGETS\"\n }\n kafka: {\n resultTypes: [\"regularMetrics\", \"composedMetrics\", \"loadChecks\", \"checks\"],\n connection: \"kafka_broker\"\n topic: \"some.topic\"\n }\n }\n errorCollection: {\n file: {\n metrics: [\"pct_of_null\", \"hive_table_row_cnt\", \"hive_table_nulls\"]\n dumpSize: 50\n save: {\n kind: \"orc\"\n path: ${basePath}\"/errors/\"${referenceDate}\n }\n }\n kafka: {\n metrics: [\"hive_table_nulls\", \"fixed_file_dist_name\", \"table_source1_inn_regex\"]\n dumpSize: 25\n connection: \"kafka_broker\"\n topic: \"some.topic\"\n options: [\"addParam=true\"]\n }\n }\n summary: {\n email: {\n attachMetricErrors: true\n metrics: [\"hive_table_nulls\", \"fixed_file_dist_name\", \"table_source1_inn_regex\"]\n dumpSize: 10\n recipients: [\"some.person@some.domain\"]\n }\n mattermost: {\n attachMetricErrors: true\n 
metrics: [\"hive_table_nulls\", \"fixed_file_dist_name\", \"table_source1_inn_regex\"]\n dumpSize: 10\n recipients: [\"@someUser\", \"#someChannel\"]\n }\n kafka: {\n connection: \"kafka_broker\"\n topic: \"dev.dq_results.topic\"\n }\n }\n checkAlerts: {\n email: [\n {\n id: \"alert1\"\n checks: [\"avg_bal_check\", \"zero_nulls\"]\n recipients: [\"some.person@some.domain\"]\n }\n {\n id: \"alert2\"\n checks: [\"top2_curr_match\", \"completeness_check\"]\n recipients: [\"another.person@some.domain\"]\n }\n ]\n mattermost: [\n {\n id: \"alert3\"\n checks: [\"avg_bal_check\", \"zero_nulls\"]\n recipients: [\"@someUser\"]\n }\n {\n id: \"alert4\"\n checks: [\"top2_curr_match\", \"completeness_check\"]\n recipients: [\"#someChannel\"]\n }\n ]\n }\n }\n}\n
"},{"location":"ru/","title":"Home","text":"Latest version: 1.4.1
Documentation in Russian is under development. Please use the English documentation.
To ensure the quality of big data, it is necessary to compute a large number of metrics and checks over huge datasets, which in turn is a difficult task.
Checkita is a Data Quality framework that solves this problem by formalizing and simplifying the process of connecting to and reading data from various sources, describing metrics and checks over the data in these sources, and sending results and notifications via various channels.
Thus, Checkita allows computing various metrics and checks over data (both structured and unstructured). The framework is able to perform distributed computations over data in a \"single pass\", using Spark as the computation core.
Hocon configuration files are used both to describe application settings and to describe the pipeline for computing metrics and checks. Computation results are saved to the framework's dedicated database and can also be sent to users via various channels: file (local FS, HDFS, S3), Email, Mattermost, Kafka.
Using Spark as the computation core allows metrics and checks to be computed at the level of \"raw\" data, without requiring any SQL abstractions over the data (such as Hive or Impala), which in turn can hide some errors in the data (for example, bad formatting or schema mismatches).
Checkita allows you to do the following:
Checkita is developed with a focus on integration into ETL pipelines and data catalog systems:
Another key feature of Checkita is the ability to process both static (batch) and streaming data sources. Accordingly, two types of applications are supported: one for batch and one for streaming quality checks of data sources.
The framework's streaming mode is currently experimental, so this part of the framework may change.
The framework is written in Scala 2.12 and uses Spark 2.4+ as its computation engine. The project has a parameterized build that allows you to build the framework for a specific Spark version, publish the project to a given repository, and build an uber-jar either with or without Spark dependencies.
License
The Checkita framework is distributed under the GNU LGPL license.
This project is a reimagining of the Data Quality framework developed by Agile Lab, Italy.
"},{"location":"ru/01-application-setup/","title":"\u041e\u0431\u0449\u0430\u044f \u0418\u043d\u0444\u043e\u0440\u043c\u0430\u0446\u0438\u044f","text":"Checkita \u0440\u0430\u0431\u043e\u0442\u0430\u0435\u0442 \u043a\u0430\u043a Spark-\u043f\u0440\u0438\u043b\u043e\u0436\u0435\u043d\u0438\u0435. \u0421\u043e\u043e\u0442\u0432\u0435\u0442\u0441\u0442\u0432\u0435\u043d\u043d\u043e, \u043f\u0440\u0438\u043b\u043e\u0436\u0435\u043d\u0438\u0435 \u043c\u043e\u0436\u0435\u0442 \u0431\u044b\u0442\u044c \u0437\u0430\u043f\u0443\u0449\u0435\u043d\u043e \u0442\u0430\u043a\u0438\u043c \u0436\u0435 \u043e\u0431\u0440\u0430\u0437\u043e\u043c, \u043a\u0430\u043a \u0438 \u043b\u044e\u0431\u043e\u0435 \u0434\u0440\u0443\u0433\u043e\u0435 Spark-\u043f\u0440\u0438\u043b\u043e\u0436\u0435\u043d\u0438\u0435:
Both Spark deploy modes are supported: client and cluster.
The framework is developed primarily for batch data processing, which is currently its main supported mode of operation. A typical architecture for working with the framework is shown in the diagram below:
The Data Quality Framework can also be used for streaming data processing, but this functionality is experimental. For more details on checking streaming data sources, see the chapter Quality Checks of Streaming Data Sources.
"},{"location":"ru/01-application-setup/01-ApplicationSettings/","title":"\u041d\u0430\u0441\u0442\u0440\u043e\u0439\u043a\u0438 \u041f\u0440\u0438\u043b\u043e\u0436\u0435\u043d\u0438\u044f","text":"\u041e\u0431\u0449\u0438\u0435 \u043d\u0430\u0441\u0442\u0440\u043e\u0439\u043a\u0438 \u043f\u0440\u0438\u043b\u043e\u0436\u0435\u043d\u0438\u044f Checkita Data Quality \u043a\u043e\u043d\u0444\u0438\u0433\u0443\u0440\u0438\u0440\u0443\u044e\u0442\u0441\u044f \u0432 Hocon \u0444\u0430\u0439\u043b\u0435 application.conf
, which is passed to the application at startup. All settings are defined within the appConfig section.
The only parameter set directly at the top level of this section is applicationName: the name of the Spark application. This parameter is optional; if it is not set, the application is launched with the default name Checkita Data Quality.
All other application settings are defined in the corresponding subsections described below:
"},{"location":"ru/01-application-setup/01-ApplicationSettings/#_2","title":"\u041d\u0430\u0441\u0442\u0440\u043e\u0439\u043a\u0438 \u0434\u0430\u0442\u044b \u0438 \u0432\u0440\u0435\u043c\u0435\u043d\u0438","text":"\u041d\u0430\u0441\u0442\u0440\u043e\u0439\u043a\u0438 \u0434\u0430\u0442\u044b \u0438 \u0432\u0440\u0435\u043c\u0435\u043d\u0438 \u043e\u043f\u0438\u0441\u044b\u0432\u0430\u044e\u0442\u0441\u044f \u0432 \u0441\u0435\u043a\u0446\u0438\u0438 dateTimeOptions
. \u0414\u043b\u044f \u0431\u043e\u043b\u0435\u0435 \u043f\u043e\u0434\u0440\u043e\u0431\u043d\u043e\u0439 \u0438\u043d\u0444\u043e\u0440\u043c\u0430\u0446\u0438\u0438 \u043e \u0440\u0430\u0431\u043e\u0442\u0435 \u0441 \u0434\u0430\u0442\u0430\u043c\u0438 \u0432 \u0444\u0440\u0435\u0439\u043c\u0432\u043e\u0440\u043a\u0435 Checkita Data Quality, \u0441\u043c. \u0433\u043b\u0430\u0432\u0443 \u0420\u0430\u0431\u043e\u0442\u0430 \u0441 \u0414\u0430\u0442\u0430\u043c\u0438.
\u041f\u0430\u0440\u0430\u043c\u0435\u0442\u0440\u044b \u0434\u043b\u044f \u0440\u0430\u0431\u043e\u0442\u044b \u0441 \u0434\u0430\u0442\u0430\u043c\u0438 \u0441\u043b\u0435\u0434\u0443\u044e\u0449\u0438\u0435:
timeZone
- Time zone in which string representations of dates are expressed. Optional, default is \"UTC\".
referenceDateFormat
- Date/time format used for the string representation of the reference date. Optional, default is \"yyyy-MM-dd'T'HH:mm:ss.SSS\".
executionDateFormat
- Date/time format used for the string representation of the application start (execution) date. Optional, default is \"yyyy-MM-dd'T'HH:mm:ss.SSS\".
If the dateTimeOptions section is not defined, default values are used for all of the parameters above.
This set of settings applies only to streaming applications and defines various aspects of running quality checks over streaming sources. For more details on running the framework in streaming mode, see the chapter Quality Checks of Streaming Data Sources.
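To make the fallback behavior concrete, here is a minimal Python sketch (the framework itself is written in Scala; Python is used only for illustration). It assumes the parsed settings arrive as a plain dictionary, and the helpers resolve_date_time_options, format_java_pattern, to_zone and render_reference_date are hypothetical. The formatter handles only the pattern letters used by the default format (yyyy, MM, dd, HH, mm, ss, SSS) plus quoted literals, a small subset of Java DateTimeFormatter patterns.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

QUOTE = chr(39)  # the single-quote character that wraps literals such as 'T'

# Defaults for dateTimeOptions, as listed above.
DATE_TIME_DEFAULTS = {
    'timeZone': 'UTC',
    'referenceDateFormat': '''yyyy-MM-dd'T'HH:mm:ss.SSS''',
    'executionDateFormat': '''yyyy-MM-dd'T'HH:mm:ss.SSS''',
}

# Subset of Java pattern letters mapped to datetime fields.
TOKENS = {
    'yyyy': lambda d: f'{d.year:04d}',
    'SSS': lambda d: f'{d.microsecond // 1000:03d}',
    'MM': lambda d: f'{d.month:02d}',
    'dd': lambda d: f'{d.day:02d}',
    'HH': lambda d: f'{d.hour:02d}',
    'mm': lambda d: f'{d.minute:02d}',
    'ss': lambda d: f'{d.second:02d}',
}


def resolve_date_time_options(app_config):
    # A missing section or missing keys fall back to the defaults.
    return {**DATE_TIME_DEFAULTS, **app_config.get('dateTimeOptions', {})}


def format_java_pattern(dt, pattern):
    out, i = [], 0
    while i < len(pattern):
        if pattern[i] == QUOTE:
            # Copy a quoted literal verbatim, without the quotes.
            end = pattern.index(QUOTE, i + 1)
            out.append(pattern[i + 1:end])
            i = end + 1
            continue
        # Try longer tokens first so that yyyy is matched as a whole.
        for tok in sorted(TOKENS, key=len, reverse=True):
            if pattern.startswith(tok, i):
                out.append(TOKENS[tok](dt))
                i += len(tok)
                break
        else:
            # Separators such as - : . are copied as-is.
            out.append(pattern[i])
            i += 1
    return ''.join(out)


def to_zone(name):
    # UTC needs no tz database; other IANA names go through zoneinfo.
    return timezone.utc if name == 'UTC' else ZoneInfo(name)


def render_reference_date(instant, app_config):
    opts = resolve_date_time_options(app_config)
    local = instant.astimezone(to_zone(opts['timeZone']))
    return format_java_pattern(local, opts['referenceDateFormat'])


if __name__ == '__main__':
    start = datetime(2024, 3, 13, 10, 5, 7, 123456, tzinfo=timezone.utc)
    print(render_reference_date(start, {}))  # prints 2024-03-13T10:05:07.123
```

Overriding only referenceDateFormat in the sketch keeps the UTC default for timeZone, mirroring the rule that unspecified parameters take their default values.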
trigger
- Trigger interval: defines the time interval over which micro-batches are collected from the streaming data source. Optional, default is 10s.
window
- Window interval: defines the time interval corresponding to the window size within which metric results are accumulated. All metrics and checks are computed for each window as soon as it is fully formed (that is, once it falls below the watermark level). Optional, default is 10m.
watermark
- Watermark level: defines the time interval after which \"late\" records are no longer taken into processing. Optional, default is 5m.
allowEmptyWindows
- Boolean flag that determines whether \"empty\" windows (windows that received no rows of data) are allowed. If a window is fully formed but no rows fell into it for one or more streaming sources, the corresponding checks are skipped only when this parameter is set to true. Otherwise, the checks are executed and fail with an error message of the form \"... metric results were not found ...\". Optional, default is false.
IMPORTANT: All time intervals must be specified as strings in the Scala Duration format.
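The streaming parameters can be illustrated with a small Python sketch. It assumes Spark-style event-time semantics: epoch-aligned tumbling windows, and a window counts as final once the watermark (the latest event time seen minus the watermark delay) reaches the window end. This matches Spark Structured Streaming but is not confirmed here for Checkita's internals. The duration parser accepts only integer lengths and a subset of the unit labels recognized by scala.concurrent.duration.Duration; all function names and outcome labels are hypothetical.

```python
# Milliseconds per unit label; a subset of the labels Scala's Duration accepts.
UNIT_MS = {
    'ms': 1, 'milli': 1, 'millis': 1, 'millisecond': 1, 'milliseconds': 1,
    's': 1_000, 'sec': 1_000, 'secs': 1_000, 'second': 1_000, 'seconds': 1_000,
    'm': 60_000, 'min': 60_000, 'mins': 60_000, 'minute': 60_000, 'minutes': 60_000,
    'h': 3_600_000, 'hr': 3_600_000, 'hrs': 3_600_000, 'hour': 3_600_000, 'hours': 3_600_000,
    'd': 86_400_000, 'day': 86_400_000, 'days': 86_400_000,
}

# Defaults for the streaming settings described above.
STREAM_DEFAULTS = {'trigger': '10s', 'window': '10m', 'watermark': '5m', 'allowEmptyWindows': False}


def parse_duration_ms(text):
    # Whitespace is ignored and the trailing letters form the unit, e.g. 10s or 5 minutes.
    compact = ''.join(text.split())
    i = len(compact)
    while i > 0 and compact[i - 1].isalpha():
        i -= 1
    number, unit = compact[:i], compact[i:]
    if unit not in UNIT_MS or not number.isdigit():
        raise ValueError(f'unsupported duration: {text!r}')
    return int(number) * UNIT_MS[unit]


def window_start(event_ms, window_ms):
    # Tumbling windows aligned to the epoch: [start, start + window).
    return event_ms - event_ms % window_ms


def closed_windows(event_times_ms, window_ms, watermark_ms):
    # Watermark = latest event time seen minus the allowed lateness.
    watermark = max(event_times_ms) - watermark_ms
    starts = {window_start(t, window_ms) for t in event_times_ms}
    # A window is final once the watermark has reached its end.
    return sorted(s for s in starts if s + window_ms <= watermark)


def evaluate_window(row_count, allow_empty_windows):
    # Empty windows are skipped only when allowEmptyWindows is true.
    if row_count == 0:
        return 'skipped' if allow_empty_windows else 'error: metric results were not found'
    return 'checked'
```

With the defaults (10m windows, 5m watermark), events at 0, 650000 and 1300000 ms push the watermark to 1000000 ms, so in this sketch only the window starting at 0 is closed.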
"},{"location":"ru/01-application-setup/01-ApplicationSettings/#_4","title":"\u0410\u043a\u0442\u0438\u0432\u0430\u0442\u043e\u0440\u044b","text":"\u0421\u0435\u043a\u0446\u0438\u044f enablers
\u0432 \u0444\u0430\u0439\u043b\u0435 \u0441 \u043d\u0430\u0441\u0442\u0440\u043e\u0439\u043a\u0430\u043c\u0438 \u043f\u0440\u0438\u043b\u043e\u0436\u0435\u043d\u0438\u044f \u043e\u043f\u0440\u0435\u0434\u0435\u043b\u044f\u0435\u0442 \u0440\u0430\u0437\u043b\u0438\u0447\u043d\u044b\u0435 \u0431\u0438\u043d\u0430\u0440\u043d\u044b\u0435 \u0430\u043a\u0442\u0438\u0432\u0430\u0442\u043e\u0440\u044b \u0438\u043b\u0438 \u0447\u0438\u0441\u043b\u043e\u0432\u044b\u0435 \u043f\u0430\u0440\u0430\u043c\u0435\u0442\u0440\u044b, \u043a\u043e\u0442\u043e\u0440\u044b\u0435 \u043a\u043e\u043d\u0442\u0440\u043e\u043b\u0438\u0440\u0443\u044e\u0442 \u0440\u0430\u0437\u043b\u0438\u0447\u043d\u044b\u0435 \u0430\u0441\u043f\u0435\u043a\u0442\u044b \u0438\u0441\u043f\u043e\u043b\u043d\u0435\u043d\u0438\u044f data quality \u043f\u0430\u0439\u043f\u043b\u0430\u0439\u043d\u0430:
allowSqlQueries
- \u0411\u0438\u043d\u0430\u0440\u043d\u044b\u0439 \u043f\u0430\u0440\u0430\u043c\u0435\u0442\u0440, \u043a\u043e\u0442\u043e\u0440\u044b\u0439 \u043a\u043e\u043d\u0442\u0440\u043e\u043b\u0438\u0440\u0443\u0435\u0442 \u0432\u043e\u0437\u043c\u043e\u0436\u043d\u043e\u0441\u0442\u044c \u0438\u0441\u043f\u043e\u043b\u044c\u0437\u043e\u0432\u0430\u043d\u0438\u044f \u043f\u0440\u043e\u0438\u0437\u0432\u043e\u043b\u044c\u043d\u044b\u0445 SQL \u0437\u0430\u043f\u0440\u043e\u0441\u043e\u0432 \u043f\u0440\u0438 \u043a\u043e\u043d\u0444\u0438\u0433\u0443\u0440\u0438\u0440\u043e\u0432\u0430\u043d\u0438\u0438 \u043f\u0430\u0439\u043f\u043b\u0430\u0439\u043d\u0430. \u041e\u043f\u0446\u0438\u043e\u043d\u0430\u043b\u044c\u043d\u043e, \u043f\u043e \u0443\u043c\u043e\u043b\u0447\u0430\u043d\u0438\u044e false
allowNotifications
- \u0411\u0438\u043d\u0430\u0440\u043d\u044b\u0439 \u043f\u0430\u0440\u0430\u043c\u0435\u0442\u0440, \u043a\u043e\u0442\u043e\u0440\u044b\u0439 \u043a\u043e\u043d\u0442\u0440\u043e\u043b\u0438\u0440\u0443\u0435\u0442 \u0432\u043e\u0437\u043c\u043e\u0436\u043d\u043e\u0441\u0442\u044c \u043e\u0442\u043f\u0440\u0430\u0432\u043a\u0438 \u0443\u0432\u0435\u0434\u043e\u043c\u043b\u0435\u043d\u0438\u0439 \u043f\u043e\u043b\u044c\u0437\u043e\u0432\u0430\u0442\u0435\u043b\u044f\u043c. \u041e\u043f\u0446\u0438\u043e\u043d\u0430\u043b\u044c\u043d\u043e, \u043f\u043e \u0443\u043c\u043e\u043b\u0447\u0430\u043d\u0438\u044e false
aggregatedKafkaOutput
- \u0411\u0438\u043d\u0430\u0440\u043d\u044b\u0439 \u043f\u0430\u0440\u0430\u043c\u0435\u0442\u0440, \u043a\u043e\u0442\u043e\u0440\u044b\u0439 \u043f\u043e\u0437\u0432\u043e\u043b\u044f\u0435\u0442 \u043e\u0442\u043f\u0440\u0430\u0432\u043b\u044f\u0442\u044c \u0430\u0433\u0440\u0435\u0433\u0438\u0440\u043e\u0432\u0430\u043d\u043d\u044b\u0435 \u0441\u043e\u043e\u0431\u0449\u0435\u043d\u0438\u044f \u0434\u043b\u044f Kafka \u0442\u0430\u0440\u0433\u0435\u0442\u043e\u0432 (\u043f\u043e \u043e\u0434\u043d\u043e\u043c\u0443 \u0441\u043e\u043e\u0431\u0449\u0435\u043d\u0438\u044e \u043d\u0430 \u043a\u0430\u0436\u0434\u044b\u0439 \u0442\u0438\u043f \u0442\u0430\u0440\u0433\u0435\u0442\u0430). \u041f\u043e \u0443\u043c\u043e\u043b\u0447\u0430\u043d\u0438\u044e \u043e\u0442\u043f\u0440\u0430\u0432\u043b\u044f\u0435\u0442\u0441\u044f \u043e\u0442\u0434\u0435\u043b\u044c\u043d\u043e\u0435 \u0441\u043e\u043e\u0431\u0449\u0435\u043d\u0438\u0435 \u0432 Kafka \u0434\u043b\u044f \u043a\u0430\u0436\u0434\u043e\u0439 \u0441\u0443\u0449\u043d\u043e\u0441\u0442\u0438. \u041e\u043f\u0446\u0438\u043e\u043d\u0430\u043b\u044c\u043d\u043e, \u043f\u043e \u0443\u043c\u043e\u043b\u0447\u0430\u043d\u0438\u044e false
enableCaseSensitivity
- \u0411\u0438\u043d\u0430\u0440\u043d\u044b\u0439 \u043f\u0430\u0440\u0430\u043c\u0435\u0442\u0440, \u043a\u043e\u0442\u043e\u0440\u044b\u0439 \u0443\u0441\u0442\u0430\u043d\u0430\u0432\u043b\u0438\u0432\u0430\u0435\u0442 \u0447\u0443\u0432\u0441\u0442\u0432\u0438\u0442\u0435\u043b\u044c\u043d\u043e\u0441\u0442\u044c \u043a \u0440\u0435\u0433\u0438\u0441\u0442\u0440\u0443 \u0432 \u0438\u043c\u0435\u043d\u0430\u0445 \u043a\u043e\u043b\u043e\u043d\u043e\u043a. \u0422\u0430\u043a\u0438\u043c \u043e\u0431\u0440\u0430\u0437\u043e\u043c \u043a\u043e\u043d\u0442\u0440\u043e\u043b\u0438\u0440\u0443\u0435\u0442\u0441\u044f, \u043a\u0430\u043a \u0438\u043c\u0435\u043d\u0430 \u043a\u043e\u043b\u043e\u043d\u043e\u043a \u0431\u0443\u0434\u0443\u0442 \u0441\u0440\u0430\u0432\u043d\u0438\u0432\u0430\u0442\u044c\u0441\u044f \u043c\u0435\u0436\u0434\u0443 \u0441\u043e\u0431\u043e\u0439 \u0438 \u043a\u0430\u043a \u0431\u0443\u0434\u0435\u0442 \u043f\u0440\u043e\u0438\u0441\u0445\u043e\u0434\u0438\u0442\u044c \u0438\u0445 \u043f\u043e\u0438\u0441\u043a \u0432 \u0438\u0441\u0442\u043e\u0447\u043d\u0438\u043a\u0435. \u041e\u043f\u0446\u0438\u043e\u043d\u0430\u043b\u044c\u043d\u043e, \u043f\u043e \u0443\u043c\u043e\u043b\u0447\u0430\u043d\u0438\u044e false
errorDumpSize
- Maximum number of errors that can be collected for a single metric. The framework can collect data from source rows for which the metric computation failed. To prevent OOM errors, however, this number must be kept within reasonable limits: the maximum allowed number of errors per metric is capped at 10000, and it can be lowered further by setting this parameter. Optional, default is 10000.
outputRepartition
- Sets the number of partitions used when writing results to files. By default, a single file is written. Optional, default is 1.
If the enablers section is not defined, default values are used for all of the parameters above.
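As an illustration, an enablers section that writes file outputs in four partitions might look like the sketch below. The value 4 is arbitrary, and parameters that are not listed keep their defaults.
appConfig: {\n  enablers: {\n    # illustrative value: write result files in 4 partitions instead of the default 1\n    outputRepartition: 4\n  }\n}\n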
Connection parameters for the database that stores Data Quality pipeline results are set in the storage section of the application settings file.
For more details on the results storage, see the Results Storage chapter.
To configure the connection to the results database, set the following parameters:
dbType
- Type of the database used to store results. Required.
url
- Database connection URL (without the protocol prefix). Required.
username
- Username for the database connection (if required). Optional.
password
- Password for the database connection (if required). Optional.
schema
- Schema containing the Data Quality results tables (if required). Optional.
IMPORTANT: If the storage section is not defined, the application runs without connecting to the results database.
To send Email notifications, configure a connection to an SMTP server in the email section of the application settings file, using the following parameters:
host
- SMTP server address. Required.
port
- SMTP server port. Required.
address
- Sender address. Required.
name
- Sender name. Required.
sslOnConnect
- Boolean flag indicating whether SSL is used for the connection. Optional, default is false.
tlsEnabled
- Boolean flag indicating whether TLS is enabled for the connection. Optional, default is false.
username
- Username for the SMTP server (if required). Optional.
password
- Password for the SMTP server (if required). Optional.
If the email section is not defined, Email notifications cannot be sent. If such notifications are nevertheless configured in the pipeline settings, a corresponding error is thrown at runtime.
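A minimal email section needs only the four required parameters. The sketch below uses placeholder values (the host, address and name are not real). It omits sslOnConnect and tlsEnabled, which therefore keep their default of false, and it omits username and password, on the assumption that the SMTP server does not require authentication.
appConfig: {\n  email: {\n    # required parameters only; all values are placeholders\n    host: \"smtp.some-company.domain\"\n    port: \"25\"\n    address: \"some.service@some-company.domain\"\n    name: \"Data Quality Service\"\n  }\n}\n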
To send notifications to Mattermost, configure a connection to the Mattermost API in the mattermost section of the application settings file, using the following parameters:
host
- Address at which the Mattermost API is available.
token
- Token for accessing the Mattermost API (bot accounts are preferred for sending notifications).
If the mattermost section is not defined, Mattermost notifications cannot be sent. If such notifications are nevertheless configured in the pipeline settings, a corresponding error is thrown at runtime.
The application settings can also specify a set of Spark parameters that are common to most of the Data Quality pipelines being run. List these parameters in defaultSparkOptions as strings of the form spark.param.name=spark.param.value.
The HOCON format supports variable substitution. The Checkita Data Quality framework also provides a mechanism for adding extra variables to configuration files at runtime. For more details, see the Using Environment Variables and Extra Variables chapter.
appConfig: {\n\n applicationName: \"Custom Data Quality Application Name\"\n\n dateTimeOptions: {\n timeZone: \"GMT+3\"\n referenceDateFormat: \"yyyy-MM-dd\"\n executionDateFormat: \"yyyy-MM-dd-HH-mm-ss\"\n }\n\n enablers: {\n allowSqlQueries: false\n allowNotifications: true\n aggregatedKafkaOutput: true\n }\n\n defaultSparkOptions: [\n \"spark.sql.orc.enabled=true\"\n \"spark.sql.parquet.compression.codec=snappy\"\n \"spark.sql.autoBroadcastJoinThreshold=-1\"\n ]\n\n storage: {\n dbType: \"postgres\"\n url: \"localhost:5432/public\"\n username: \"postgres\"\n password: \"postgres\"\n schema: \"dqdb\"\n }\n\n email: {\n host: \"smtp.some-company.domain\"\n port: \"25\"\n username: \"emailUser\"\n password: \"emailPassword\"\n address: \"some.service@some-company.domain\"\n name: \"Data Quality Service\"\n sslOnConnect: true\n }\n\n mattermost: {\n host: \"https://some-team.mattermost.com\"\n token: ${dqMattermostToken}\n }\n}\n
"},{"location":"ru/01-application-setup/02-ApplicationSubmit/","title":"Submitting Data Quality Applications","text":"Because Checkita is built on Spark, it is launched as a regular Spark application with the spark-submit command. Like any other Spark application, a Checkita application can run either locally or on a cluster (in client or cluster mode).
However, Checkita applications require the following arguments at startup:
-a
- Required. Path to the HOCON file with the application settings. This file is usually named application.conf, although a different name may be used.
-j
- Required. Comma-separated list of paths to the Data Quality pipeline configuration files. The HOCON format supports merging configurations, so different parts of a pipeline configuration can be described in separate files and reused.
-d
- Optional. Reference date for which the Data Quality pipeline is run. The date must use the format set in the referenceDateFormat field of the application settings. If no date is given, the actual start date of the pipeline is used.
-l
- Optional. Flag indicating that the application must run in local mode.
-s
- Optional. Flag indicating that the application runs with a shared Spark Context. In this mode the application uses the existing Spark Context instead of creating a new one. The application must also not stop the Spark Context when it finishes.
-m
- Optional. Flag indicating that a database migration must be performed before results are saved. The migration checks that the results storage is up to date and, if it is not, runs the scripts that bring it up to date.
-e
- Optional. Flag for passing a set of extra variables at application startup. The extra variables are added to the configuration files and become available for use. Variables are passed as key-value pairs: \"k1=v1,k2=v2,k3=v3,...\".
-v
- Optional. Sets the application log level. Default is INFO.
Two applications are available to run:
ru.raiffeisen.checkita.apps.batch.DataQualityBatchApp
ru.raiffeisen.checkita.apps.stream.DataQualityStreamApp
Below is an example of submitting a Checkita application to YARN in cluster mode. The results storage connection parameters are set in the application.conf file. The login credentials can be passed either through environment variables or as extra variables at startup. For more details, see the Using Environment Variables and Extra Variables chapter.
export DQ_APPLICATION=\"<local or remote (HDFS, S3) path to the application jar>\"\nexport DQ_DEPENDENCIES=\"<local or remote (HDFS, S3) path to the uber-jar with application dependencies>\"\nexport DQ_APP_CONFIG=\"<local or remote (HDFS, S3) path to the application settings file>\"\nexport DQ_JOB_CONFIGS=\"<local or remote (HDFS, S3) paths to the pipeline configuration files (comma-separated)>\"\n\n# Since the files listed above are first uploaded to the driver and the executors,\n# they end up in the working directory.\n# Thus, only the file names need to be passed in the application arguments:\nexport DQ_APP_CONFIG_FILE=$(basename \"$DQ_APP_CONFIG\")\nexport DQ_JOB_CONFIG_FILES=\"<job configuration files separated by commas (only file names)>\"\nexport REFERENCE_DATE=\"2023-08-01\"\n\n# Application entry point (executable class): ru.raiffeisen.checkita.apps.batch.DataQualityBatchApp\n# The --name parameter of the spark-submit command takes precedence over\n# the application name set in `application.conf`.\n\n
spark-submit \\\n --class ru.raiffeisen.checkita.apps.batch.DataQualityBatchApp \\\n --name \"Checkita Data Quality\" \\\n --master yarn \\\n --deploy-mode cluster \\\n --num-executors 1 \\\n --executor-memory 2g \\\n --executor-cores 4 \\\n --driver-memory 2g \\\n --jars $DQ_DEPENDENCIES \\\n --files \"$DQ_APP_CONFIG,$DQ_JOB_CONFIGS\" \\\n --conf \"spark.executor.memoryOverhead=2g\" \\\n --conf \"spark.driver.memoryOverhead=2g\" \\\n --conf \"spark.driver.maxResultSize=4g\" \\\n $DQ_APPLICATION \\\n -a $DQ_APP_CONFIG_FILE \\\n -j $DQ_JOB_CONFIG_FILES \\\n -d $REFERENCE_DATE \\\n -e \"storage_db_user=some_db_user,storage_db_password=some_db_password\"\n
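The -j argument expects bare file names, while DQ_JOB_CONFIGS already lists the full paths. Below is a minimal sketch of deriving one from the other. It assumes the paths contain no commas or whitespace. The example paths are hypothetical, and the sed expression is an illustration, not part of Checkita:

```bash
#!/usr/bin/env bash
# Hypothetical example value: job configs as they would be passed to spark-submit --files.
DQ_JOB_CONFIGS=hdfs:///dq/configs/job_a.conf,s3a://dq-bucket/configs/job_b.conf,/opt/dq/job_c.conf

# Keep only the file names. [^,]*/ matches the directory part of one
# comma-separated entry (everything up to its last slash). The g flag
# removes every such match, so the commas between entries survive.
DQ_JOB_CONFIG_FILES=$(sed 's#[^,]*/##g' <<< $DQ_JOB_CONFIGS)

echo $DQ_JOB_CONFIG_FILES   # job_a.conf,job_b.conf,job_c.conf
```

In the launch script, the same substitution (with export) could replace the hand-written DQ_JOB_CONFIG_FILES placeholder.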
"},{"location":"ru/01-application-setup/03-ResultsStorage/","title":"Results Storage","text":"To use all the features of the framework, a results storage must be created and configured. The Checkita framework can store results in various RDBMS. In addition, Hive can be used as a results storage, and so can a regular file storage.
The full list of supported results storage types is given below:
PostgreSQL (v.9.3 and higher) - recommended for use as the results storage.
Oracle
MySQL
Microsoft SQL Server
SQLite
H2
Hive
File (a directory in a local or remote (HDFS, S3) file system)
The Checkita framework supports schema evolution of the results storage and uses Flyway to run migrations. Thus, if one of the supported RDBMS is chosen as the results storage, its schema can be set up during the first run of a Data Quality pipeline by passing the -m argument at startup. For more details on how to run Checkita applications, see the chapter Running Data Quality Applications.
IMPORTANT: Flyway migrations are usually run either on an empty database/schema or on one that has already been initialized with Flyway. The Checkita framework also allows running migrations on a non-empty database/schema. In that case, the user must make sure that the database/schema contains no conflicting table names.
\u0415\u0441\u043b\u0438 \u0432\u044b\u0431\u0440\u0430\u043d File
\u0442\u0438\u043f \u0445\u0440\u0430\u043d\u0438\u043b\u0438\u0449\u0430 \u0440\u0435\u0437\u0443\u043b\u044c\u0442\u0430\u0442\u043e\u0432, \u0442\u043e \u0434\u043e\u0441\u0442\u0430\u0442\u043e\u0447\u043d\u043e \u043f\u0440\u0435\u0434\u043e\u0441\u0442\u0430\u0432\u0438\u0442\u044c \u043f\u0443\u0442\u044c \u0434\u043e \u0434\u0438\u0440\u0435\u043a\u0442\u043e\u0440\u0438\u0438/\u0431\u0430\u043a\u0435\u0442\u0430, \u0433\u0434\u0435 \u0431\u0443\u0434\u0443\u0442 \u0445\u0440\u0430\u043d\u0438\u0442\u044c\u0441\u044f \u0440\u0435\u0437\u0443\u043b\u044c\u0442\u0430\u0442\u044b. \u0420\u0435\u0437\u0443\u043b\u044c\u0442\u0430\u0442\u044b \u0441\u043e\u0445\u0440\u0430\u043d\u044f\u044e\u0442\u0441\u044f \u043a\u0430\u043a .parquet
\u0444\u0430\u0439\u043b\u044b \u0441 \u0442\u0430\u043a\u043e\u0439 \u0436\u0435 \u0441\u0445\u0435\u043c\u043e\u0439, \u043a\u0430\u043a \u0438 \u0432 \u0441\u043b\u0443\u0447\u0430\u0435 \u0445\u0440\u0430\u043d\u0435\u043d\u0438\u044f \u0438\u0445 \u0432 RDBMS. \u0414\u043b\u044f \u0444\u0430\u0439\u043b\u043e\u0432\u043e\u0433\u043e \u0445\u0440\u0430\u043d\u0438\u043b\u0438\u0449\u0430 \u0440\u0435\u0437\u0443\u043b\u044c\u0442\u0430\u0442\u043e\u0432 \u043d\u0435 \u043f\u0440\u0435\u0434\u0443\u0441\u043c\u043e\u0442\u0440\u0435\u043d\u044b \u043c\u0435\u0445\u0430\u043d\u0438\u0437\u043c\u044b \u044d\u0432\u043e\u043b\u044e\u0446\u0438\u0438 \u0441\u0445\u0435\u043c\u044b. \u041f\u043e\u044d\u0442\u043e\u043c\u0443, \u0435\u0441\u043b\u0438 \u0441\u0442\u0440\u0443\u043a\u0442\u0443\u0440\u0430 \u0440\u0435\u0437\u0443\u043b\u044c\u0442\u0430\u0442\u043e\u0432 \u0438\u0437\u043c\u0435\u043d\u0438\u0442\u0441\u044f \u0432 \u0431\u0443\u0434\u0443\u0449\u0435\u043c, \u0442\u043e \u043f\u043e\u043b\u044c\u0437\u043e\u0432\u0430\u0442\u0435\u043b\u044e \u0431\u0443\u0434\u0435\u0442 \u043d\u0435\u043e\u0431\u0445\u043e\u0434\u0438\u043c\u043e \u0441\u0430\u043c\u043e\u0441\u0442\u043e\u044f\u0442\u0435\u043b\u044c\u043d\u043e \u043e\u0431\u043d\u043e\u0432\u0438\u0442\u044c \u0441\u0445\u0435\u043c\u044b \u0432 \u0441\u0443\u0449\u0435\u0441\u0442\u0432\u0443\u044e\u0449\u0438\u0445 \u0440\u0435\u0437\u0443\u043b\u044c\u0442\u0430\u0442\u0430\u0445.
IMPORTANT: When the file storage is used, results are not partitioned by any field. Thus, every Data Quality pipeline reads the files in full and rewrites them when saving results. Because of this, using this storage type in production environments is not recommended.
For the Hive results storage type, schema evolution mechanisms are not available either. Therefore, the user must create the required Hive tables manually, using the DDL scripts from the chapter Scripts for Setting Up Results Storage in Hive.
IMPORTANT: Results must be partitioned by job_id. The pipeline identifier was chosen as the partition column to speed up retrieval of results when computing trend checks, which are used to detect anomalies in data. The Hive results storage works faster than the File storage, because only the partition matching the identifier of the current pipeline is read and rewritten. Nevertheless, this storage type is also not recommended for production environments where a large number of pipelines will run.
The Checkita framework writes four types of results:
Schemas for all result types are given below:
"},{"location":"ru/01-application-setup/03-ResultsStorage/#regular-metrics-results-schema","title":"Regular Metrics Results Schema","text":"(job_id, metric_id, reference_date)
source_id and column_names contain a string representation of lists in the format '[val1,val2,val3]'.
params is a JSON string.
(job_id, metric_id, reference_date)
source_id contains a string representation of a list in the format '[val1,val2,val3]'.
(job_id, check_id, reference_date)
source_id contains a string representation of a list in the format '[val1,val2,val3]'.
(job_id, check_id, reference_date)
source_id contains a string representation of a list in the format '[val1,val2,val3]'.
(job_id, reference_date)
version_info - a JSON-formatted string;
config - a JSON-formatted string.
Below are HiveQL scripts that can be used to initialize the results storage in Hive:
-- Replace <schema_name> and <schema_dir> with the actual schema name and its location.\n-- job_id is declared only in PARTITIONED BY: Hive rejects a partition column that is repeated in the column list.\nset hivevar:schema_name=<schema_name>;\nset hivevar:schema_dir=<schema_dir>;\n\nCREATE SCHEMA IF NOT EXISTS ${schema_name};\n\nDROP TABLE IF EXISTS ${schema_name}.results_metric_regular;\nCREATE EXTERNAL TABLE ${schema_name}.results_metric_regular\n(\n metric_id STRING COMMENT '',\n metric_name STRING COMMENT '',\n description STRING COMMENT '',\n metadata STRING COMMENT '',\n source_id STRING COMMENT '',\n column_names STRING COMMENT '',\n params STRING COMMENT '',\n result DOUBLE COMMENT '',\n additional_result STRING COMMENT '',\n reference_date TIMESTAMP COMMENT '',\n execution_date TIMESTAMP COMMENT ''\n)\nCOMMENT 'Data Quality Regular Metrics Results'\nPARTITIONED BY (job_id STRING)\nSTORED AS PARQUET\nLOCATION '${schema_dir}/results_metric_regular';\n\nDROP TABLE IF EXISTS ${schema_name}.results_metric_composed;\nCREATE EXTERNAL TABLE ${schema_name}.results_metric_composed\n(\n metric_id STRING COMMENT '',\n metric_name STRING COMMENT '',\n description STRING COMMENT '',\n metadata STRING COMMENT '',\n source_id STRING COMMENT '',\n formula STRING COMMENT '',\n result DOUBLE COMMENT '',\n additional_result STRING COMMENT '',\n reference_date TIMESTAMP COMMENT '',\n execution_date TIMESTAMP COMMENT ''\n)\nCOMMENT 'Data Quality Composed Metrics Results'\nPARTITIONED BY (job_id STRING)\nSTORED AS PARQUET\nLOCATION '${schema_dir}/results_metric_composed';\n\nDROP TABLE IF EXISTS ${schema_name}.results_check_load;\nCREATE EXTERNAL TABLE ${schema_name}.results_check_load\n(\n check_id STRING COMMENT '',\n check_name STRING COMMENT '',\n 
description STRING COMMENT '',\n metadata STRING COMMENT '',\n source_id STRING COMMENT '',\n expected STRING COMMENT '',\n status STRING COMMENT '',\n message STRING COMMENT '',\n reference_date TIMESTAMP COMMENT '',\n execution_date TIMESTAMP COMMENT ''\n)\nCOMMENT 'Data Quality Load Checks Results'\nPARTITIONED BY (job_id STRING)\nSTORED AS PARQUET\nLOCATION '${schema_dir}/results_check_load';\n\nDROP TABLE IF EXISTS ${schema_name}.results_check;\nCREATE EXTERNAL TABLE ${schema_name}.results_check\n(\n check_id STRING COMMENT '',\n check_name STRING COMMENT '',\n description STRING COMMENT '',\n metadata STRING COMMENT '',\n source_id STRING COMMENT '',\n base_metric STRING COMMENT '',\n compared_metric STRING COMMENT '',\n compared_threshold DOUBLE COMMENT '',\n lower_bound DOUBLE COMMENT '',\n upper_bound DOUBLE COMMENT '',\n status STRING COMMENT '',\n message STRING COMMENT '',\n reference_date TIMESTAMP COMMENT '',\n execution_date TIMESTAMP COMMENT ''\n)\nCOMMENT 'Data Quality Checks Results'\nPARTITIONED BY (job_id STRING)\nSTORED AS PARQUET\nLOCATION '${schema_dir}/results_check';\n\nDROP TABLE IF EXISTS ${schema_name}.job_state;\nCREATE EXTERNAL TABLE ${schema_name}.job_state\n(\n config STRING COMMENT '',\n version_info STRING COMMENT '',\n reference_date TIMESTAMP COMMENT '',\n execution_date TIMESTAMP COMMENT ''\n)\nCOMMENT 'Data Quality Job State'\nPARTITIONED BY (job_id STRING)\nSTORED AS PARQUET\nLOCATION '${schema_dir}/job_state';\n
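List columns such as source_id and column_names are stored as plain strings in the '[val1,val2,val3]' form, so a consumer reading results back has to split them. Below is a minimal bash sketch. It assumes the individual values contain no commas or square brackets, and the column value is hypothetical:

```bash
#!/usr/bin/env bash
# Hypothetical stored value of a list column such as column_names.
column_names='[order_id,customer_id,amount]'

inner=${column_names#?}   # drop the leading [
inner=${inner%?}          # drop the trailing ]

# Split on commas into a bash array (here-strings are not word-split).
IFS=, read -r -a columns <<< $inner

echo ${#columns[@]} columns, the second is ${columns[1]}
```

The params, config and version_info columns are JSON strings instead, so a JSON-aware tool is the safer way to read those.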
"},{"location":"ru/02-general-concepts/","title":"General Concepts","text":"This section explains various aspects of working with the Checkita Data Quality framework.
"},{"location":"ru/02-general-concepts/01-WorkingWithDateTime/","title":"Working with Dates","text":"Here and below, a date means a DateTime object that holds both the date and the time.
The Checkita framework has two main date representations, which are used to identify different runs of Data Quality pipelines:
referenceDate - the date indicating the period for which the given Data Quality pipeline is run.
executionDate - the date holding the actual start time of the Data Quality pipeline.
A typical example: we run some ETL pipeline (which also includes a task that computes data quality metrics) after the business day is closed, e.g. at midnight. In this case, referenceDate points to the previous day, for which the ETL is run, while executionDate holds the current date, i.e. the date on which the Data Quality pipeline actually started. We will likely want the string representations of these two dates to differ. The Checkita framework allows this by letting a separate format be configured for each of them in the application settings.
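The midnight example can be sketched with GNU date. The format strings here are date(1) formats chosen for illustration, not Checkita settings, and the start time is hypothetical:

```bash
#!/usr/bin/env bash
# Sketch with GNU date (the -d @epoch form is GNU-specific): a nightly run that
# starts just after midnight UTC and checks the business day that has just closed.
EXECUTION_TS=$(date -u -d '2023-08-02 00:05:00' +%s)   # hypothetical actual start time
REFERENCE_TS=$(( EXECUTION_TS - 24 * 60 * 60 ))          # one day earlier

# Two different, hypothetical output formats, one per date.
reference_date=$(date -u -d @$REFERENCE_TS +%Y-%m-%d)
execution_date=$(date -u -d @$EXECUTION_TS +%Y-%m-%dT%H:%M:%S)

echo referenceDate: $reference_date   # referenceDate: 2023-08-01
echo executionDate: $execution_date   # executionDate: 2023-08-02T00:05:00
```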
Since referenceDate may point to past dates, it can be set explicitly at application start. If it is not set explicitly, it also corresponds to the actual start date of the application. See the chapter Running Data Quality Applications for more details on the arguments used to start Data Quality applications.
Both dates are widely used within the framework. Therefore, whenever their string representation is needed, it is produced according to the format specified in the application settings.
Note also that the string representation takes into account the time zone in which the application runs. This time zone is also set in the application settings. By default, UTC is used.
Last but not least: we deliberately avoid string representations of dates when saving results to the database. Instead, dates are converted to the Timestamp type and normalized to the UTC time zone. This makes it possible to query the results storage reliably, regardless of the date representation settings. See the chapter Results Storage for more details about the results storage.
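The sketch below uses GNU date to show the difference between a time-zone-dependent string representation and a UTC value. It renders one hypothetical instant in a hypothetical application time zone three hours ahead of UTC, and then in UTC:

```bash
#!/usr/bin/env bash
# Sketch with GNU date. MSK-3 is a POSIX TZ string for a zone three hours
# ahead of UTC (the sign is inverted in POSIX TZ strings). It is used here so
# the example does not depend on an installed tz database.
INSTANT=$(date -u -d '2023-08-01 21:30:00' +%s)   # hypothetical instant, given in UTC

# String representation in the application time zone: the calendar date shifts.
app_zone_repr=$(TZ=MSK-3 date -d @$INSTANT '+%Y-%m-%d %H:%M:%S %Z')
# The same instant rendered in UTC, the zone used for stored timestamps.
utc_repr=$(date -u -d @$INSTANT '+%Y-%m-%d %H:%M:%S %Z')

echo $app_zone_repr   # 2023-08-02 00:30:00 MSK
echo $utc_repr        # 2023-08-01 21:30:00 UTC
```

Both lines describe the same moment. Only the rendering differs, which is why queries against UTC timestamps do not depend on the display settings.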
IMPORTANT: The actual string representations of referenceDate and executionDate are always added to the configuration files as extra variables. For more details on using extra variables, see the Usage of Environment Variables and Extra Variables chapter.
The Hocon format supports variable substitution. This mechanism makes it possible to manage both the application settings and the Data Quality pipeline configurations more flexibly.
Accordingly, extra variables are added to the configuration files; they are either read from environment variables or set explicitly at application startup.
For more details on setting variables explicitly at application startup, see the Running Data Quality Applications chapter.
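To illustrate the substitution idea, here is a simplified Python sketch that replaces ${name} placeholders in a configuration string with values from a dictionary of extra variables. It is only a stand-in under stated assumptions, not how Checkita resolves Hocon: real Hocon substitution is performed by a full configuration parser and also supports forms such as optional ${?name} substitutions, which this sketch ignores. The variable names and values are hypothetical; referenceDate is used because the framework adds it as an extra variable.

```python
import re
from typing import Mapping

# Matches ${name}; character classes are used instead of backslash escapes.
PLACEHOLDER = re.compile('[$][{]([A-Za-z0-9_.-]+)[}]')


def substitute(template: str, variables: Mapping[str, str]) -> str:
    # Replace every ${name} placeholder with its value; fail loudly on unknown names.
    def resolve(match) -> str:
        name = match.group(1)
        if name not in variables:
            raise KeyError('unresolved substitution: ' + name)
        return variables[name]

    return PLACEHOLDER.sub(resolve, template)


# Hypothetical extra variables, e.g. passed explicitly at application startup.
extra_vars = {'referenceDate': '2024-03-13', 'dqStoragePassword': 'secret'}

config = 'path: /data/events/${referenceDate}, password: ${dqStoragePassword}'
print(substitute(config, extra_vars))
# path: /data/events/2024-03-13, password: secret
```

Failing on an unknown name, rather than leaving the placeholder in place, mirrors the usual expectation that a missing variable is a configuration error.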
To use environment variables (JVM properties can be passed as well), their names must match the following regular expression: ^(?i)(DQ)[a-z0-9_-]+$, for example DQ_STORAGE_PASSWORD or dqMattermostToken. All environment variables that match this regular expression are read and added to the configuration files so that they can be substituted into the relevant sections. The variables are added both to the application settings file and to the Data Quality pipeline configuration file(s).
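The name filter can be sketched in Python as follows. This is an illustration, not Checkita code, and the sample environment is made up. One portability detail: Python 3.11 and later reject a global inline flag such as (?i) that does not appear at the very start of the pattern, so the sketch passes re.IGNORECASE instead of copying ^(?i)(DQ)... verbatim. It also adds re.ASCII, which matches Java's default ASCII-only case-insensitive matching.

```python
import os
import re
from typing import Dict, Mapping

# Same rule as ^(?i)(DQ)[a-z0-9_-]+$, with case-insensitivity passed as flags.
DQ_VARIABLE = re.compile('^(DQ)[a-z0-9_-]+$', re.IGNORECASE | re.ASCII)


def collect_dq_variables(env: Mapping[str, str]) -> Dict[str, str]:
    # Keep only variables whose names match the DQ naming rule.
    return {name: value for name, value in env.items() if DQ_VARIABLE.match(name)}


sample_env = {
    'DQ_STORAGE_PASSWORD': 's3cr3t',
    'dqMattermostToken': 'token-123',
    'HOME': '/home/user',  # no DQ prefix: ignored
    'MY_DQ_TOKEN': 'x',    # DQ is not at the start: ignored
    'DQ': 'y',             # nothing after the prefix: ignored
}

print(sorted(collect_dq_variables(sample_env)))
# ['DQ_STORAGE_PASSWORD', 'dqMattermostToken']

# In a real process the input would be the actual environment:
picked = collect_dq_variables(os.environ)
```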
A typical use of environment variables is to supply secrets for connecting to external systems. Storing such information in configuration files is not a good idea, so the Checkita framework provides a mechanism for supplying this data at runtime.
IMPORTANT: Variables are added to the configuration files at runtime and are not persisted in any form.
"},{"location":"ru/02-general-concepts/03-StatusModel/","title":"Results Status Model","text":"A unified status model is used for all results produced by the Checkita framework. Results of both metric calculations and checks share common status indicators, namely:
Success
- The metric or check was computed without errors, and the condition defined in the metric or check is satisfied.
Failure
- The metric or check was computed, but its results do not satisfy the condition of that metric or check. For example, the regexMatch metric received a column value that does not match the specified regular expression.
Error
- An error occurred while computing the metric or check. The error message is captured as well.
In all result types, the status is accompanied by a message that describes it. However, the ways in which the user can obtain these statuses differ:
Success
, then a calculation error for this metric is recorded for that row. You can then request a metric error report, whose configuration is described in the Error Reports chapter. For more details on collecting metric errors, see the Metric Error Collection chapter.
Metric calculation involves reading the source data row by row and incrementing the metric value for each row (provided the metric condition is met). While the metric increment for the current row is being computed, something may go wrong: there may be data issues or runtime errors.
In addition, many metrics have a logical condition that must be met for the metric value to be updated. If this condition is not met, the increment computation for the current row is also considered unsuccessful.
Thus, in the situations described above, the metric error collection mechanism is triggered, and the following error information is recorded:
Failure
, or Error
) and a message describing the error.
Since the data sources being processed can be very large and can therefore produce a significant number of errors during metric calculation, there is a risk of running out of memory while recording information about these errors.
To prevent this, the number of errors that can be collected for each metric is capped at 10000. If necessary, this number can be reduced by setting the corresponding limit in the errorDumpSize field of the application settings file. See the Activators chapter.
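The collection mechanism described above can be sketched in Python. Everything here is hypothetical. CountMetric is a made-up metric that counts rows whose id column consists of digits, loosely in the spirit of regexMatch. The error_dump_size parameter stands in for the errorDumpSize setting, with the 10000 default stated above. A row that does not meet the condition yields Failure, and an exception yields Error. In both cases an error record with the status and a message is kept, but only until the cap is reached.

```python
import re
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, List


class Status(Enum):
    SUCCESS = 'Success'
    FAILURE = 'Failure'
    ERROR = 'Error'


@dataclass
class MetricError:
    metric_id: str
    status: Status
    message: str
    row: dict


@dataclass
class CountMetric:
    # Hypothetical metric: counts rows for which `condition` holds.
    metric_id: str
    condition: Callable[[dict], bool]
    error_dump_size: int = 10000  # stand-in for errorDumpSize
    result: int = 0
    errors: List[MetricError] = field(default_factory=list)

    def _record(self, status: Status, message: str, row: dict) -> None:
        # Keep at most error_dump_size errors to bound memory usage.
        if len(self.errors) < self.error_dump_size:
            self.errors.append(MetricError(self.metric_id, status, message, row))

    def increment(self, row: dict) -> Status:
        try:
            satisfied = self.condition(row)
        except Exception as exc:  # runtime problem while processing the row
            self._record(Status.ERROR, str(exc), row)
            return Status.ERROR
        if not satisfied:  # metric condition not met for this row
            self._record(Status.FAILURE, 'metric condition is not met', row)
            return Status.FAILURE
        self.result += 1
        return Status.SUCCESS


rows = [{'id': '42'}, {'id': 'abc'}, {'id': None}, {'id': 'x1'}]
metric = CountMetric(
    'digits_only_id',
    lambda row: re.fullmatch('[0-9]+', row['id']) is not None,
    error_dump_size=2,
)
statuses = [metric.increment(row).value for row in rows]
print(statuses)            # ['Success', 'Failure', 'Error', 'Failure']
print(metric.result)       # 1
print(len(metric.errors))  # 2: the fourth row's failure exceeds the cap
```

The None id makes re.fullmatch raise a TypeError, which is why the third row is reported as Error rather than Failure.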
The collected metric errors can be used to find and fix data issues. To obtain these errors, configure the corresponding reports to be saved when the pipeline finishes. Error reports are requested in the targets section of the Data Quality pipeline configuration file, as described in the Error Reports chapter. Note also that these reports contain samples of source data, so be careful about who has access to them. For the same reason, error reports are not saved to the framework's results storage.
IMPORTANT: Functionality related to quality checks of streaming data sources is currently experimental and is subject to change.
As mentioned earlier, the Checkita framework can compute metrics and perform quality checks over streaming data sources. Since Spark is used as the computation engine, metrics over streaming sources are computed with the Spark Structured Streaming API.
The main idea behind running a data quality pipeline in streaming mode is to retain the ability to process several streaming sources simultaneously. Since computing metrics over a source is a stateful operation, all streaming sources are processed in windowed mode: time windows are formed, each tracking the state over its time interval. To process several streaming sources simultaneously, their windows must be synchronized: (1) they must be of the same size and (2) they must start at the same time. To ensure this, the window size is set at the application level and is shared by all processed data sources.
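A minimal Python sketch shows why a single shared window size keeps windows synchronized. Suppose every source assigns a record to the tumbling window whose start is the record timestamp floored to a multiple of the window size, counted from the Unix epoch. Then the windows of all sources start at the same instants. This mirrors the default alignment of Spark's window function, where startTime is an offset from 1970-01-01 00:00:00 UTC. The helper below is only an illustration, not Checkita code, and the 10-second window and timestamps are assumptions.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def window_start(ts: datetime, size: timedelta) -> datetime:
    # Floor the timestamp to a multiple of `size` counted from the Unix epoch.
    return EPOCH + ((ts - EPOCH) // size) * size


WINDOW = timedelta(seconds=10)  # one application-level size shared by all sources

kafka_record = datetime(2024, 3, 13, 12, 0, 7, tzinfo=timezone.utc)
file_record = datetime(2024, 3, 13, 12, 0, 9, tzinfo=timezone.utc)
next_record = datetime(2024, 3, 13, 12, 0, 12, tzinfo=timezone.utc)

for ts in (kafka_record, file_record, next_record):
    print(window_start(ts, WINDOW).time())
# 12:00:00
# 12:00:00
# 12:00:10
```

If one source used 15-second windows instead, the record at 12:00:12 would land in a window starting at 12:00:00 for that source but at 12:00:10 for the others, so per-window states could no longer be compared.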
Since streaming sources are processed window by window, the key requirement is that each record has a timestamp, which is used to assign the record to a particular window. Several options are supported for providing this timestamp:
Processing time
- Spark automatically generates a timestamp for each record when it enters processing, using the current_timestamp function.
Event time
- \u0412 \u0431\u043e\u043b\u044c\u0448\u0435\u0439 \u0441\u0442\u0435\u043f\u0435\u043d\u0438 \u043f\u0440\u0438\u043c\u0435\u043d\u0438\u043c\u043e \u043a \u0442\u043e\u043f\u0438\u043a\u0430\u043c Kafka: \u0432\u0440\u0435\u043c\u0435\u043d\u043d\u0430\u044f \u043c\u0435\u0442\u043a\u0430 \u0441\u0447\u0438\u0442\u044b\u0432\u0430\u0435\u0442\u0441\u044f \u0438\u0437 \u043a\u043e\u043b\u043e\u043d\u043a\u0438 timestamp
, \u0432 \u043a\u043e\u0442\u043e\u0440\u043e\u0439 \u0445\u0440\u0430\u043d\u0438\u0442\u0441\u044f \u0432\u0440\u0435\u043c\u044f \u0441\u043e\u0437\u0434\u0430\u043d\u0438\u044f \u0437\u0430\u043f\u0438\u0441\u0438 (event time).Custom time
- \u041e\u043f\u0440\u0435\u0434\u0435\u043b\u044f\u0435\u043c\u0430\u044f \u043f\u043e\u043b\u044c\u0437\u043e\u0432\u0430\u0442\u0435\u043b\u0435\u043c \u043a\u043e\u043b\u043e\u043d\u043a\u0430 \u0442\u0438\u043f\u0430 timestamp, \u0438\u0437 \u043a\u043e\u0442\u043e\u0440\u043e\u0439 \u0431\u0443\u0434\u0435\u0442 \u0441\u0447\u0438\u0442\u0430\u043d\u0430 \u0432\u0440\u0435\u043c\u0435\u043d\u043d\u0430\u044f \u043c\u0435\u0442\u043a\u0430.\u0422\u0430\u043a\u0436\u0435 \u043d\u0435\u043e\u0431\u0445\u043e\u0434\u0438\u043c\u043e \u0432\u044b\u044f\u0441\u043d\u0438\u0442\u044c \u0442\u043e, \u043a\u043e\u0433\u0434\u0430 \u043c\u043e\u0436\u043d\u043e \u0441\u0447\u0438\u0442\u0430\u0442\u044c \u043a\u0430\u043a\u043e\u0435-\u043b\u0438\u0431\u043e \u043e\u043a\u043d\u043e \u043f\u043e\u043b\u043d\u043e\u0441\u0442\u044c\u044e \u0441\u0444\u043e\u0440\u043c\u0438\u0440\u043e\u0432\u0430\u043d\u043d\u044b\u043c. \u0414\u0440\u0443\u0433\u0438\u043c\u0438 \u0441\u043b\u043e\u0432\u0430\u043c\u0438, \u043d\u0430\u0434\u043e \u0443\u0441\u0442\u0430\u043d\u043e\u0432\u0438\u0442\u044c \u043f\u0440\u0430\u0432\u0438\u043b\u0430, \u0441\u043e\u0433\u043b\u0430\u0441\u043d\u043e \u043a\u043e\u0442\u043e\u0440\u044b\u043c \u043c\u043e\u0436\u043d\u043e \u0431\u0443\u0434\u0435\u0442 \u0441\u0447\u0438\u0442\u0430\u0442\u044c \u0441\u043e\u0441\u0442\u043e\u044f\u043d\u0438\u0435 \u043e\u043a\u043d\u0430 \u043e\u043a\u043e\u043d\u0447\u0430\u0442\u0435\u043b\u044c\u043d\u044b\u043c\u0438 \u0438 \u043f\u0440\u0435\u0434\u043f\u043e\u043b\u0430\u0433\u0430\u0442\u044c, \u0447\u0442\u043e \u043d\u0438\u043a\u0430\u043a\u0438\u0435 \u0434\u0440\u0443\u0433\u0438\u0435 \u0437\u0430\u043f\u0438\u0441\u0438 \u0431\u043e\u043b\u044c\u0448\u0435 \u043d\u0435 \u043f\u043e\u043f\u0430\u0434\u0443\u0442 \u0432 \u044d\u0442\u043e \u043e\u043a\u043d\u043e. 
A common approach to this problem in stream processing is to use so-called watermarks. A watermark holds a timestamp that sets the threshold for accepting new records into processing.
If a record's timestamp is below the watermark, the record is considered late and is not accepted for processing. The watermark is defined as the maximum timestamp among the records already processed, minus a predefined delay. More details on watermarks are available in the Spark documentation: Handling Late Data and Watermarking.
To process several streaming sources in sync, the watermark delay is set at the application level and is the same for all sources.
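As a rough illustration of the watermark rule described above, here is a minimal Python sketch that computes a watermark as the maximum processed timestamp minus a fixed delay and flags late records. It models only the rule as stated here, not Spark's internal watermark tracking; the function names, timestamps and the 10-minute delay are illustrative assumptions.

```python
from datetime import datetime, timedelta

def watermark(seen_timestamps, delay):
    # Watermark = maximum timestamp among already processed records minus the delay.
    return max(seen_timestamps) - delay

def is_late(record_ts, current_watermark):
    # A record whose timestamp is below the watermark is treated as late.
    return record_ts < current_watermark

delay = timedelta(minutes=10)  # illustrative application-level delay
seen = [datetime(2024, 1, 1, 12, 0), datetime(2024, 1, 1, 12, 25)]
wm = watermark(seen, delay)
print(wm)                                          # 2024-01-01 12:15:00
print(is_late(datetime(2024, 1, 1, 12, 10), wm))   # True
print(is_late(datetime(2024, 1, 1, 12, 20), wm))   # False
```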
Finally, note that the Spark Structured Streaming engine processes streaming sources in micro-batches: records are collected over a certain (usually very short) time interval and then processed as a static dataframe. Spark allows configuring the interval over which records are collected into a micro-batch; this is done by setting the trigger interval. This interval must also be the same for all processed data sources and is set at the application level. Tuning it controls the size of the micro-batches and, consequently, the load on the executors.
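The effect of the trigger interval on micro-batch size can be shown with a simplified Python model, assuming that a record arriving during [k*T, (k+1)*T) lands in micro-batch k. Real Spark triggers also depend on batch duration and other runtime factors, so this is only an illustration; assign_micro_batches is a hypothetical helper.

```python
from collections import defaultdict

def assign_micro_batches(arrival_seconds, trigger_interval):
    # Simplified model: arrival time t falls into micro-batch floor(t / T).
    # Intervals with no arrivals produce no batch at all.
    batches = defaultdict(list)
    for t in arrival_seconds:
        batches[int(t // trigger_interval)].append(t)
    return dict(batches)

arrivals = [0.5, 3.2, 9.9, 10.0, 14.7, 31.0]
print(assign_micro_batches(arrivals, trigger_interval=10))
# {0: [0.5, 3.2, 9.9], 1: [10.0, 14.7], 3: [31.0]}
```

With trigger_interval=20 the same arrivals collapse into two batches, {0: [0.5, 3.2, 9.9, 10.0, 14.7], 1: [31.0]}: a longer interval yields fewer, larger micro-batches.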
For more details on the settings for running data quality pipelines in streaming mode, see the Stream Processing Settings chapter. To summarize, a data quality pipeline running in streaming mode goes through the following stages:
forEachBatch sink.
Processing of each micro-batch (performed once per trigger interval):
The window processor checks the buffer (also once per trigger interval) for windows that are fully formed, i.e. lie entirely below the watermark. IMPORTANT: to process streaming sources synchronously, the minimum watermark is used (computed from the current watermarks of all processed sources).
This approach guarantees that a window is fully formed for all processed sources.
Once a fully formed window is obtained, all the required procedures are run for it:
Streaming queries and the window processor keep running until the application is stopped (a sigterm signal is received) or until an error occurs.
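The window-finalization rule (a window is processed only once it lies entirely below the minimum watermark across all sources) can be sketched as follows. This is a hypothetical Python model, not Checkita's window processor: windows are keyed by start time, the buffer is a plain dict, and the source names and records are made up.

```python
from datetime import datetime, timedelta

def pop_finalized_windows(buffer, source_watermarks, window_size):
    # Synchronize sources: use the smallest (least advanced) watermark.
    sync_wm = min(source_watermarks.values())
    # A window [start, start + window_size) is complete once its end is at or below sync_wm.
    ready = sorted(start for start in buffer if start + window_size <= sync_wm)
    # Remove complete windows from the buffer and return them with their records.
    return [(start, buffer.pop(start)) for start in ready]

hour = timedelta(hours=1)
buffer = {
    datetime(2024, 1, 1, 10): ['r1', 'r2'],
    datetime(2024, 1, 1, 11): ['r3'],
    datetime(2024, 1, 1, 12): ['r4'],
}
watermarks = {
    'source_a': datetime(2024, 1, 1, 12, 30),
    'source_b': datetime(2024, 1, 1, 11, 45),
}
done = pop_finalized_windows(buffer, watermarks, hour)
print([start.strftime('%H:%M') for start, _ in done])  # ['10:00']
print(sorted(s.strftime('%H:%M') for s in buffer))      # ['11:00', '12:00']
```

The watermark of source_a alone would already allow the 11:00 window to be finalized; taking the minimum holds that window back until source_b catches up, which is what keeps the sources synchronized.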
An important note about saving results to storage: since a separate set of results is produced for each window, referenceDate and executionDate are both set to the start date and time of that window. For more details on how dates are handled in the framework, see the Working with Dates chapter.
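To make the date convention concrete, here is a minimal sketch that aligns a record timestamp to the start of a tumbling window and uses that start for both referenceDate and executionDate. It assumes epoch-aligned windows without an offset (the default alignment of Spark's window function); window_start is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def window_start(ts, window_size):
    # Tumbling windows aligned to the Unix epoch: subtract the remainder of the
    # time elapsed since the epoch modulo the window size.
    return ts - (ts - EPOCH) % window_size

ts = datetime(2024, 1, 1, 12, 37, 5, tzinfo=timezone.utc)
start = window_start(ts, timedelta(hours=1))
# Both result dates are set to the window start.
result_dates = {'referenceDate': start, 'executionDate': start}
print(start.isoformat())  # 2024-01-01T12:00:00+00:00
```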
TIP: Since data quality checks are performed for each window, the window should be large enough to deliver results at an interval that lets you monitor them and react to data quality problems.
For example, if your engineering team's reaction time is about 1 hour, the window should be roughly the same size. There is little point in running checks on a streaming source every 10 minutes if you do not have the resources to react to them.
"},{"location":"ru/03-job-configuration/","title":"Job Configuration","text":"tbd.
"},{"location":"ru/03-job-configuration/01-Connections/","title":"Connection Configuration (Connections)","text":"The Checkita framework can read data from external systems such as relational DBMSs or message brokers (Kafka). To read data from an external system, you need to configure a connection to it.
Connection configurations for external systems are described in the connections section of the Data Quality pipeline configuration file. Currently, connections to the following systems are supported:
All connections must have a unique identifier id and may also have an optional list of Spark parameters. These parameters are specified in the parameters field and supply additional connection options that Spark uses to read data from the system.
All connections share the following common parameters:
id - Connection identifier;
description - Optional connection description;
parameters - Optional list of Spark parameters that can be specified to provide additional connection configuration required by Spark;
metadata - Optional list of arbitrary user-defined parameters.
Type-specific parameters are described below separately for each supported connection type.
An example of a complete connections section is given below in the Connections Configuration Example chapter.
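Both parameters and metadata entries are plain key=value strings (the per-type lists use the forms spark.param.name=spark.param.value and param.name=param.value). The sketch below shows how such entries could be turned into a key/value map, assuming the split happens on the first = only; parse_params and the example keys are hypothetical, not part of Checkita's API.

```python
def parse_params(entries):
    # Each entry is a name=value string; split on the first '=' only,
    # so that values may themselves contain '='.
    parsed = {}
    for entry in entries:
        name, sep, value = entry.partition('=')
        if not sep or not name.strip():
            raise ValueError('expected name=value, got: ' + repr(entry))
        parsed[name.strip()] = value.strip()
    return parsed

spark_params = parse_params(['fetchsize=1000'])
metadata = parse_params(['owner=dq-team', 'note=threshold=0.95'])
print(spark_params)  # {'fetchsize': '1000'}
print(metadata)      # {'owner': 'dq-team', 'note': 'threshold=0.95'}
```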
Configuring a connection to an SQLite database is simple: only two parameters, id and url, are required.
id - Required. Connection ID;
description - Optional. Connection description;
url - Required. Path to the SQLite database file;
parameters - Optional. List of Spark parameters (if needed), where each parameter is a string of the form spark.param.name=spark.param.value;
metadata - Optional. Arbitrary list of user-defined parameters for this connection, in the form param.name=param.value.
A PostgreSQL connection can be configured with the following parameters:
id - Required. Connection ID;
description - Optional. Connection description;
url - Required. URL of the PostgreSQL server. The URL must contain the server address, port, and database name. Additional parameters may also be included in the URL if needed, as described in the PostgreSQL documentation. The URL must not contain the connection protocol;
username - Optional. User name for connecting to the database (if required);
password - Optional. Password for connecting to the database (if required);
parameters - Optional. List of Spark parameters (if needed), where each parameter is a string of the form spark.param.name=spark.param.value;
metadata - Optional. Arbitrary list of user-defined parameters for this connection, in the form param.name=param.value.
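Because the url for PostgreSQL (and the SQLite file path) must not include the connection protocol, the framework presumably adds the JDBC prefix itself. The sketch below illustrates that convention under the assumption that the standard JDBC schemes jdbc:postgresql:// and jdbc:sqlite: are used; it is not Checkita's actual code, and jdbc_url and the type keys are hypothetical.

```python
# Assumed standard JDBC URL prefixes for the PostgreSQL and SQLite drivers.
JDBC_PREFIXES = {
    'postgres': 'jdbc:postgresql://',
    'sqlite': 'jdbc:sqlite:',
}

def jdbc_url(conn_type, url):
    # The configured url must not carry a protocol; reject it if it does.
    if '://' in url or url.startswith('jdbc:'):
        raise ValueError('url must not include a protocol: ' + url)
    return JDBC_PREFIXES[conn_type] + url

print(jdbc_url('postgres', 'localhost:5432/dq_db?ssl=true'))
# jdbc:postgresql://localhost:5432/dq_db?ssl=true
print(jdbc_url('sqlite', '/data/dq/checkita.db'))
# jdbc:sqlite:/data/dq/checkita.db
```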
.\u041d\u0430\u0441\u0442\u0440\u043e\u0439\u043a\u0430 \u043f\u043e\u0434\u043a\u043b\u044e\u0447\u0435\u043d\u0438\u044f \u043a Oracle \u043e\u043f\u0438\u0441\u044b\u0432\u0430\u0435\u0442\u0441\u044f \u0442\u0430\u043a \u0436\u0435 \u043a\u0430\u043a \u0438 \u0434\u043b\u044f PostgreSQL, \u0441 \u043f\u043e\u043c\u043e\u0449\u044c\u044e \u0441\u043b\u0435\u0434\u0443\u044e\u0449\u0438\u0445 \u043f\u0430\u0440\u0430\u043c\u0435\u0442\u0440\u043e\u0432:
id
- Required. Connection ID;
description
- Optional. Connection description;
url
- Required. URL for connecting to the Oracle server. The URL must contain the server address, port and database name. Additional parameters may also be specified in the URL, if required, according to the Oracle documentation. The URL must not contain the connection protocol.
username
- Optional. Username for connecting to the database (if required).
password
- Optional. Password for connecting to the database (if required).
parameters
- Optional. List of Spark parameters (if required), where each parameter is a string in the format: spark.param.name=spark.param.value.
metadata
- Optional. Arbitrary list of user-defined parameters for this connection in the format: param.name=param.value.
To configure a connection to a Kafka cluster, specify the following parameters:
id
- Required. Connection ID;
description
- Optional. Connection description;
servers
- Required. List of servers (message brokers) to connect to.
parameters
- Optional. List of Spark parameters (if required), where each parameter is a string in the format: spark.param.name=spark.param.value. Kafka authorization settings are usually provided as a list of Spark parameters.
metadata
- Optional. Arbitrary list of user-defined parameters for this connection in the format: param.name=param.value.
If connecting to the Kafka cluster requires a JAAS configuration file, its location must be passed to the JVM through Java options (the java.security.auth.login.config system property). These options must be set before the JVM starts, so they have to be defined in the spark-submit command as shown below:
In cluster mode: --deploy-mode cluster \\\n--conf 'spark.driver.extraJavaOptions=\"-Djava.security.auth.login.config=./jaas.conf\"' \\\n--conf 'spark.executor.extraJavaOptions=\"-Djava.security.auth.login.config=./jaas.conf\"' \\\n--files /path/to/your/jaas.conf,<other files required for DQ>\n
In client mode, the JVM on the client (the driver) starts before the Spark application configuration is read. Therefore, Java options for the driver must be set with the --driver-java-options argument: --deploy-mode client \\\n--driver-java-options \"-Djava.security.auth.login.config=./jaas.conf\" \\\n--conf 'spark.executor.extraJavaOptions=\"-Djava.security.auth.login.config=./jaas.conf\"' \\\n--files file.keytab,jaas.conf,<other files required for DQ>\n
As shown in the example below, connections of the same type are grouped into subsections named after the connection type. Each subsection contains a list of connections of that type only.
jobConfig: {\n connections: {\n postgres: [\n {id: \"postgre_db1\", url: \"postgre1.db.com:5432/public\", username: \"dq-user\", password: \"dq-password\"}\n {\n id: \"postgre_db2\",\n url: \"postgre2.db.com:5432/public\",\n username: \"dq-user\",\n password: \"dq-password\",\n schema: \"dataquality\"\n }\n ]\n oracle: [\n {id: \"oracle_db1\", url: \"oracle.db.com:1521/public\", username: \"db-user\", password: \"dq-password\"}\n ]\n sqlite: [\n {id: \"sqlite_db\", url: \"some/path/to/db.sqlite\"}\n ],\n kafka: [\n {id: \"kafka_cluster_1\", servers: [\"server1:9092\", \"server2:9092\"]}\n {\n id: \"kafka_cluster_2\",\n servers: [\"kafka-broker1:9092\", \"kafka-broker2:9092\", \"kafka-broker3:9092\"]\n parameters: [\n \"security.protocol=SASL_PLAINTEXT\",\n \"sasl.mechanism=GSSAPI\",\n \"sasl.kerberos.service.name=kafka-service\"\n ]\n }\n ]\n }\n}\n
"},{"location":"ru/03-job-configuration/02-Schemas/","title":"Describing Data Schemas (Schemas)","text":"Data schemas are used in Data Quality pipelines for the following purposes:
schemaMatch
(see the Schema Match Check chapter).
Data schemas are described in the schemas section of the Data Quality pipeline configuration file. Schemas can be described in various formats. The format of a schema is specified in the kind field, which also determines which other fields must be filled in.
Apart from the kind field, schemas of all formats are described with the following common parameters:
id
- Schema ID;
description
- Optional schema description;
metadata
- Optional list of arbitrary user-defined parameters.
This schema type is mainly intended for reading delimited text files such as CSV or TSV. These schemas can also be used in schemaMatch load checks. This schema configuration type only allows flat structures to be described (nested columns are not allowed).
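Below is a minimal sketch of a delimited schema. It assumes that schemas are declared as a list in the schemas section of jobConfig. The schema ID, description, column names and metadata value are hypothetical:
jobConfig: {\n schemas: [\n {\n id: \"orders_csv\",\n kind: \"delimited\",\n description: \"Hypothetical orders CSV file\",\n schema: [\n {name: \"order_id\", type: \"integer\"},\n {name: \"customer\", type: \"string\"},\n {name: \"amount\", type: \"double\"}\n ],\n metadata: [\"owner=dq-team\"]\n }\n ]\n}\n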
Thus, a schema for delimited text files is described with the following fields:
kind: \"delimited\"
- Required. Sets the schema format for delimited text files;
id
- Required. Schema ID;
description
- Optional. Schema description;
schema
- Required. List of columns, where each column is an object with the following fields:
name
- Required. Column name;
type
- Required. Column type. The list of supported types is given in the Supported Type Names chapter.
metadata
- Optional. Arbitrary list of user-defined parameters for this schema in the format: param.name=param.value.
This schema type is used to read text files without a delimiter, that is, files with fixed-width columns. The main difference from other schema types is that the width (number of characters) of each column must be specified. The width is essential for reading the contents of such files correctly.
Despite this specialization, this schema type can also be used to read delimited files and in schemaMatch load checks. This schema configuration type only allows flat structures to be described (nested columns are not allowed).
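A fixedFull schema declares a type and a width for every column. The sketch below assumes the same list layout of the schemas section. The ID, column names and widths are hypothetical:
jobConfig: {\n schemas: [\n {\n id: \"orders_fixed_full\",\n kind: \"fixedFull\",\n schema: [\n {name: \"order_id\", type: \"integer\", width: 8},\n {name: \"customer\", type: \"string\", width: 20},\n {name: \"amount\", type: \"double\", width: 12}\n ]\n }\n ]\n}\n
With these widths, characters 1-8 of each line would map to order_id, 9-28 to customer and 29-40 to amount. Each record therefore occupies 8 + 20 + 12 = 40 characters.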
Thus, a schema for files with fixed-width columns is described with the following fields:
kind: \"fixedFull\"
- Required. Sets the schema format for text files without a delimiter;
id
- Required. Schema ID;
description
- Optional. Schema description;
schema
- Required. List of columns, where each column is an object with the following fields:
name
- Required. Column name;
type
- Required. Column type. The list of supported types is given in the Supported Type Names chapter.
width
- Required. Column width (an integer number of characters).
metadata
- Optional. Arbitrary list of user-defined parameters for this schema in the format: param.name=param.value.
This configuration type provides a simpler way to describe schemas for files without a delimiter: columns are described only by their name and width. Consequently, all columns have the string data type. This schema type can also be used to read delimited files and in schemaMatch load checks. This schema configuration type only allows flat structures to be described (nested columns are not allowed).
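Here is the same hypothetical column layout in the simplified fixedShort format. Each column is a columnName:columnWidth string, and the list layout of the schemas section is assumed as before. The ID and column names are hypothetical:
jobConfig: {\n schemas: [\n {id: \"orders_fixed_short\", kind: \"fixedShort\", schema: [\"order_id:8\", \"customer:20\", \"amount:12\"]}\n ]\n}\n
Unlike a fixedFull definition, all three columns here are read as strings, including order_id and amount.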
Thus, a simplified schema for files without a delimiter is described with the following fields:
kind: \"fixedShort\"
- Required. Sets the simplified schema format for text files without a delimiter;
id
- Required. Schema ID;
description
- Optional. Schema description;
schema
- Required. List of columns, where each column is a string in the format: columnName:columnWidth. Columns always have the string type.
metadata
- Optional. Arbitrary list of user-defined parameters for this schema in the format: param.name=param.value.
This configuration type is used to read Avro schemas from files with the .avsc extension. A schema read from such a file can be used for reading both Avro files and delimited text files. These schemas can also be used in schemaMatch load checks. Note also that Avro schemas support complex structures with nested columns.
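Below is a sketch of an Avro schema declaration. It assumes that the schema field holds the path to the .avsc file and uses the same list layout of the schemas section. The ID and path are hypothetical:
jobConfig: {\n schemas: [\n {id: \"orders_avro\", kind: \"avro\", schema: \"some/path/to/orders.avsc\"}\n ]\n}\n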
To read an Avro schema from a file, specify the following parameters:
kind: \"avro\"
- \u041e\u0431\u044f\u0437\u0430\u0442\u0435\u043b\u044c\u043d\u043e. \u0423\u0441\u0442\u0430\u043d\u0430\u0432\u043b\u0438\u0432\u0430\u0435\u0442 \u0444\u043e\u0440\u043c\u0430\u0442 \u043a\u043e\u043d\u0444\u0438\u0433\u0443\u0440\u0430\u0446\u0438\u0438 Avro \u0441\u0445\u0435\u043c\u044b.id
- \u041e\u0431\u044f\u0437\u0430\u0442\u0435\u043b\u044c\u043d\u043e. ID \u0441\u0445\u0435\u043c\u044b;description
- \u041e\u043f\u0446\u0438\u043e\u043d\u0430\u043b\u044c\u043d\u043e. \u041e\u043f\u0438\u0441\u0430\u043d\u0438\u0435 \u0441\u0445\u0435\u043c\u044b;schema
- Required. \u041f\u0443\u0442\u044c \u0434\u043e .avsc
\u0444\u0430\u0439\u043b\u0430 \u0438\u0437 \u043a\u043e\u0442\u043e\u0440\u043e\u0433\u043e \u0431\u0443\u0434\u0435\u0442 \u0441\u0447\u0438\u0442\u0430\u043d\u0430 Avro-\u0441\u0445\u0435\u043c\u0430.metadata
- \u041e\u043f\u0446\u0438\u043e\u043d\u0430\u043b\u044c\u043d\u043e. \u041f\u0440\u043e\u0438\u0437\u0432\u043e\u043b\u044c\u043d\u044b\u0439 \u0441\u043f\u0438\u0441\u043e\u043a \u043f\u0430\u0440\u0430\u043c\u0435\u0442\u0440\u043e\u0432, \u043e\u043f\u0440\u0435\u0434\u0435\u043b\u044f\u0435\u043c\u044b\u0445 \u043f\u043e\u043b\u044c\u0437\u043e\u0432\u0430\u0442\u0435\u043b\u0435\u043c \u0434\u043b\u044f \u044d\u0442\u043e\u0439 \u0441\u0445\u0435\u043c\u044b \u0432 \u0444\u043e\u0440\u043c\u0430\u0442\u0435: param.name=param.value
.\u041a\u0430\u0442\u0430\u043b\u043e\u0433 Hive \u0442\u0430\u043a\u0436\u0435 \u043c\u043e\u0436\u0435\u0442 \u0431\u044b\u0442\u044c \u0438\u0441\u043f\u043e\u043b\u044c\u0437\u043e\u0432\u0430\u043d \u043a\u0430\u043a \u0438\u0441\u0442\u043e\u0447\u043d\u0438\u043a \u0441\u0445\u0435\u043c \u0434\u0430\u043d\u043d\u044b\u0445. \u0422\u0430\u043a, Hive \u0444\u043e\u0440\u043c\u0430\u0442 \u043e\u043f\u0438\u0441\u0430\u043d\u0438\u044f \u0441\u0445\u0435\u043c \u043f\u0440\u0435\u0434\u043d\u0430\u0437\u043d\u0430\u0447\u0435\u043d \u0434\u043b\u044f \u0442\u043e\u0433\u043e, \u0447\u0442\u043e\u0431\u044b \u043f\u043e\u043b\u0443\u0447\u0430\u0442\u044c \u0441\u0445\u0435\u043c\u044b \u0434\u0430\u043d\u043d\u044b\u0445, \u043a\u043e\u0442\u043e\u0440\u044b\u0435 \u0441\u043e\u043e\u0442\u0432\u0435\u0442\u0441\u0442\u0432\u0443\u044e\u0442 \u043e\u043f\u0440\u0435\u0434\u0435\u043b\u0435\u043d\u043d\u044b\u043c Hive \u0442\u0430\u0431\u043b\u0438\u0446\u0430\u043c. \u042d\u0442\u0438 \u0441\u0445\u0435\u043c\u044b \u043c\u043e\u0433\u0443\u0442 \u0431\u044b\u0442\u044c \u0438\u0441\u043f\u043e\u043b\u044c\u0437\u043e\u0432\u0430\u043d\u044b \u043a\u0430\u043a \u0434\u043b\u044f \u0447\u0442\u0435\u043d\u0438\u044f Avro-\u0444\u0430\u0439\u043b\u043e\u0432, \u0442\u0430\u043a \u0438 \u0434\u043b\u044f \u0447\u0442\u0435\u043d\u0438\u044f \u0442\u0435\u043a\u0441\u0442\u043e\u0432\u044b\u0445 \u0444\u0430\u0439\u043b\u043e\u0432 \u0441 \u0440\u0430\u0437\u0434\u0435\u043b\u0438\u0442\u0435\u043b\u0435\u043c. \u0422\u0430\u043a\u0436\u0435, \u044d\u0442\u0438 \u0441\u0445\u0435\u043c\u044b \u043c\u043e\u0433\u0443\u0442 \u0431\u044b\u0442\u044c \u0438\u0441\u043f\u043e\u043b\u044c\u0437\u043e\u0432\u0430\u043d\u044b \u0432 \u0437\u0430\u0433\u0440\u0443\u0437\u043e\u0447\u043d\u044b\u0445 \u043f\u0440\u043e\u0432\u0435\u0440\u043a\u0430\u0445 schemaMatch
. \u0412 \u0434\u043e\u043f\u043e\u043b\u043d\u0435\u043d\u0438\u0435 \u0441\u0442\u043e\u0438\u0442 \u043e\u0442\u043c\u0435\u0442\u0438\u0442\u044c, \u0447\u0442\u043e Avro \u0441\u0445\u0435\u043c\u044b \u043f\u043e\u0434\u0434\u0435\u0440\u0436\u0438\u0432\u0430\u044e\u0442 \u0441\u043b\u043e\u0436\u043d\u044b\u0435 \u0441\u0442\u0440\u0443\u043a\u0442\u0443\u0440\u044b \u0441\u043e \u0432\u043b\u043e\u0436\u0435\u043d\u043d\u044b\u043c\u0438 \u043a\u043e\u043b\u043e\u043d\u043a\u0430\u043c\u0438.
\u0414\u043b\u044f \u0442\u043e\u0433\u043e \u0447\u0442\u043e\u0431\u044b \u043f\u043e\u043b\u0443\u0447\u0438\u0442\u044c \u0441\u0445\u0435\u043c\u0443 \u0438\u0437 Hive \u043a\u0430\u0442\u0430\u043b\u043e\u0433\u0430, \u043d\u0435\u043e\u0431\u0445\u043e\u0434\u0438\u043c\u043e \u0443\u043a\u0430\u0437\u0430\u0442\u044c \u0441\u043b\u0435\u0434\u0443\u044e\u0449\u0438\u0435 \u043f\u0430\u0440\u0430\u043c\u0435\u0442\u0440\u044b:
kind: \"hive\"
- \u041e\u0431\u044f\u0437\u0430\u0442\u0435\u043b\u044c\u043d\u043e. \u0423\u0441\u0442\u0430\u043d\u0430\u0432\u043b\u0438\u0432\u0430\u0435\u0442 Hive \u0444\u043e\u0440\u043c\u0430\u0442 \u043e\u043f\u0438\u0441\u0430\u043d\u0438\u044f \u0441\u0445\u0435\u043c\u044b.id
- \u041e\u0431\u044f\u0437\u0430\u0442\u0435\u043b\u044c\u043d\u043e. ID \u0441\u0445\u0435\u043c\u044b;description
- \u041e\u043f\u0446\u0438\u043e\u043d\u0430\u043b\u044c\u043d\u043e. \u041e\u043f\u0438\u0441\u0430\u043d\u0438\u0435 \u0441\u0445\u0435\u043c\u044b;schema
- \u041e\u0431\u044f\u0437\u0430\u0442\u0435\u043b\u044c\u043d\u043e. Hive \u0441\u0445\u0435\u043c\u0430, \u0432 \u043a\u043e\u0442\u043e\u0440\u043e\u0439 \u043d\u0430\u0445\u043e\u0434\u0438\u0442\u0441\u044f \u0446\u0435\u043b\u0435\u0432\u0430\u044f Hive \u0442\u0430\u0431\u043b\u0438\u0446\u0430.table
- \u041e\u0431\u044f\u0437\u0430\u0442\u0435\u043b\u044c\u043d\u043e. Hive \u0442\u0430\u0431\u043b\u0438\u0446\u0430 \u0438\u0437 \u043a\u043e\u0442\u043e\u0440\u043e\u0439 \u0441\u0447\u0438\u0442\u044b\u0432\u0430\u0435\u0442\u0441\u044f \u0441\u0445\u0435\u043c\u0430.excludeColumns
- \u041e\u043f\u0446\u0438\u043e\u043d\u0430\u043b\u044c\u043d\u043e. \u0421\u043f\u0438\u0441\u043e\u043a \u043a\u043e\u043b\u043e\u043d\u043e\u043a, \u043a\u043e\u0442\u043e\u0440\u044b\u0435 \u0431\u0443\u0434\u0443\u0442 \u0438\u0441\u043a\u043b\u044e\u0447\u0435\u043d\u044b \u0438\u0437 \u0441\u0445\u0435\u043c\u044b. \u041d\u0430\u043f\u0440\u0438\u043c\u0435\u0440, \u0432 \u043d\u0435\u043a\u043e\u0442\u043e\u0440\u044b\u0445 \u0441\u0438\u0442\u0443\u0430\u0446\u0438\u044f\u0445 \u043d\u0435\u043e\u0431\u0445\u043e\u0434\u0438\u043c\u043e \u0438\u0441\u043a\u043b\u044e\u0447\u0438\u0442\u044c \u043a\u043e\u043b\u043e\u043d\u043a\u0438 \u043f\u0430\u0440\u0442\u0438\u0446\u0438\u043e\u043d\u0438\u0440\u043e\u0432\u0430\u043d\u0438\u044f \u0438\u0437 \u0441\u0445\u0435\u043c\u044b.\u0421\u043b\u0435\u0434\u0443\u044e\u0449\u0438\u0435 \u043d\u0430\u0438\u043c\u0435\u043d\u043e\u0432\u0430\u043d\u0438\u044f \u0442\u0438\u043f\u043e\u0432 \u043c\u043e\u0433\u0443\u0442 \u0431\u044b\u0442\u044c \u0438\u0441\u043f\u043e\u043b\u044c\u0437\u043e\u0432\u0430\u043d\u044b \u043f\u0440\u0438 \u043e\u043f\u0438\u0441\u0430\u043d\u0438\u0438 \u0441\u0445\u0435\u043c \u0434\u0430\u043d\u043d\u044b\u0445:
string
boolean
date
timestamp
integer (32-bit integer)
long (64-bit integer)
short (16-bit integer)
byte (signed integer in a single byte)
double
float
decimal(precision, scale)
(precision <= 38; scale <= precision)
As shown in the example below, the schemas section is a list of schema configurations, each of which specifies its kind.
jobConfig: {\n schemas: [\n {\n id: \"schema1\"\n kind: \"delimited\"\n schema: [\n {name: \"colA\", type: \"string\"},\n {name: \"colB\", type: \"timestamp\"},\n {name: \"colC\", type: \"decimal(10, 3)\"}\n ]\n }\n {\n id: \"schema2\"\n kind: \"fixedFull\",\n schema: [\n {name: \"col1\", type: \"integer\", width: 5},\n {name: \"col2\", type: \"double\", width: 6},\n {name: \"col3\", type: \"boolean\", width: 4}\n ]\n }\n {id: \"schema3\", kind: \"fixedShort\", schema: [\"colOne:5\", \"colTwo:7\", \"colThree:9\"]}\n {id: \"hive_schema\", kind: \"hive\", schema: \"some_schema\", table: \"some_table\"}\n {id: \"avro_schema\", kind: \"avro\", schema: \"path/to/avro_schema.avsc\"}\n ]\n}\n
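The combined example above leaves out the optional parameters. As a sketch only, avro and hive entries that also set description, metadata and excludeColumns might look as follows. The IDs, file path, table names, metadata entries and the load_date partition column are hypothetical placeholders, and metadata is assumed to be a list of strings in the param.name=param.value format described above:

```hocon
jobConfig: {\n schemas: [\n {\n id: \"avro_users\"\n kind: \"avro\"\n description: \"Users dataset schema\"\n # hypothetical path to an Avro schema file\n schema: \"path/to/users.avsc\"\n # user-defined parameters in param.name=param.value format\n metadata: [\"data.owner=dq_team\", \"data.source=crm\"]\n }\n {\n id: \"hive_users\"\n kind: \"hive\"\n description: \"Schema of a partitioned Hive table\"\n schema: \"some_schema\"\n table: \"some_table\"\n # hypothetical partition column dropped from the resulting schema\n excludeColumns: [\"load_date\"]\n }\n ]\n}\n
```

In Hive, partition column values are stored in directory names rather than inside the data files, which is presumably why excludeColumns is useful when a Hive-derived schema is applied to the files themselves, for example in a schemaMatch load check.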
"},{"location":"ru/03-job-configuration/03-Sources/","title":"Source Configuration","text":"tbd
"},{"location":"ru/03-job-configuration/04-Streams/","title":"Streaming Sources Configurations","text":"tbd
"},{"location":"ru/03-job-configuration/05-VirtualSources/","title":"Virtual Sources Configuration","text":"tbd
"},{"location":"ru/03-job-configuration/06-VirtualStreams/","title":"Virtual Streaming Sources Configuration","text":"tbd
"},{"location":"ru/03-job-configuration/07-LoadChecks/","title":"Load Checks Configuration","text":"tbd
"},{"location":"ru/03-job-configuration/08-Metrics/","title":"Metrics Configuration","text":"tbd
"},{"location":"ru/03-job-configuration/09-Checks/","title":"Checks Configurations","text":"tbd
"},{"location":"ru/03-job-configuration/10-Targets/","title":"Targets Configuration","text":"tbd
"},{"location":"ru/03-job-configuration/11-FileOutputs/","title":"File Output Configuration","text":"tbd
"},{"location":"ru/03-job-configuration/12-JobConfigExample/","title":"Job Configuration Example","text":"tbd
"}]} \ No newline at end of file diff --git a/sitemap.xml b/sitemap.xml index 58d99445..25c58c39 100644 --- a/sitemap.xml +++ b/sitemap.xml @@ -2,392 +2,392 @@