License: Apache License 2.0
The goal of the Data Quality Dashboard (DQD) project is to design and develop an open-source tool to expose and evaluate observational data quality.
This package runs a series of data quality checks against an OMOP CDM instance (currently supporting v5.3.1 and v5.2.2). It systematically runs the checks, evaluates them against a pre-specified threshold, and then communicates what was done in a transparent and easily understandable way.
The quality checks were organized according to the Kahn Framework [1], which uses a system of categories and contexts that represent strategies for assessing data quality.
Using this framework, the Data Quality Dashboard takes a systematic approach to running data quality checks. Instead of writing thousands of individual checks, we use "data quality check types". These check types are general, parameterized data quality checks into which OMOP tables, fields, and concepts can be substituted to represent a singular data quality idea. For example, one check type might be written as:
The number and percent of records with a value in the cdmFieldName field of the cdmTableName table less than plausibleValueLow.
This would be considered an atemporal plausibility verification check because we are looking for implausibly low values in some field based on internal knowledge. We can use this check type to substitute in values for cdmFieldName, cdmTableName, and plausibleValueLow to create a unique data quality check. If we apply it to PERSON.YEAR_OF_BIRTH here is how that might look:
The number and percent of records with a value in the year_of_birth field of the PERSON table less than 1850.
And, since it is parameterized, we can similarly apply it to DRUG_EXPOSURE.days_supply:
The number and percent of records with a value in the days_supply field of the DRUG_EXPOSURE table less than 0.
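The substitution idea behind check types can be sketched in R. This is illustrative only: DQD itself renders parameterized SQL templates (via the SqlRender package), and the fill_template helper below is hypothetical, not part of the package.

```r
# Illustrative sketch only: shows how one parameterized check-type template
# yields many concrete checks. The helper is ours, not a DQD function.
fill_template <- function(template, params) {
  for (name in names(params)) {
    template <- gsub(paste0("@", name), params[[name]], template, fixed = TRUE)
  }
  template
}

template <- "The number and percent of records with a value in the @cdmFieldName field of the @cdmTableName table less than @plausibleValueLow."

fill_template(template, list(cdmFieldName = "year_of_birth",
                             cdmTableName = "PERSON",
                             plausibleValueLow = "1850"))
# "The number and percent of records with a value in the year_of_birth field of the PERSON table less than 1850."
```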
Version 1 of the tool includes 20 different check types organized into Kahn contexts and categories. Additionally, each data quality check type is considered either a table check, a field check, or a concept-level check. Table-level checks evaluate a table at a high level without reference to individual fields, or span multiple event tables; these include checks making sure required tables are present, or that at least some of the people in the PERSON table have records in the event tables. Field-level checks relate to specific fields in a table; the majority of the check types in version 1 are field-level checks, including checks evaluating primary key relationships and checks investigating whether the concepts in a field conform to the specified domain. Concept-level checks relate to individual concepts; these include checks looking for gender-specific concepts in persons of the wrong gender and plausible values for measurement-unit pairs.
After systematically applying the 20 check types to an OMOP CDM instance, approximately 3,351 individual data quality checks are resolved, run against the database, and evaluated based on a pre-specified threshold. The R package then creates a JSON object that is read into an R Shiny application to view the results.
Requires R (version 3.2.2 or higher). Requires DatabaseConnector and SqlRender.
install.packages("devtools")
devtools::install_github("OHDSI/DataQualityDashboard")
# fill out the connection details -----------------------------------------------------------------------
connectionDetails <- DatabaseConnector::createConnectionDetails(dbms = "",
                                                                user = "",
                                                                password = "",
                                                                server = "",
                                                                port = "",
                                                                extraSettings = "")

cdmDatabaseSchema <- "yourCdmSchema" # the fully qualified database schema name of the CDM
resultsDatabaseSchema <- "yourResultsSchema" # the fully qualified database schema name of the results schema (that you can write to)
cdmSourceName <- "Your CDM Source" # a human readable name for your CDM source

# determine how many threads (concurrent SQL sessions) to use ----------------------------------------
numThreads <- 1 # on Redshift, 3 seems to work well

# specify if you want to execute the queries or inspect them ------------------------------------------
sqlOnly <- FALSE # set to TRUE if you just want to get the SQL scripts and not actually run the queries

# where should the logs go? -------------------------------------------------------------------------
outputFolder <- "output"

# logging type -------------------------------------------------------------------------------------
verboseMode <- FALSE # set to TRUE if you want to see activity written to the console

# write results to table? ------------------------------------------------------------------------------
writeToTable <- TRUE # set to FALSE if you want to skip writing to a SQL table in the results schema

# if writing to table and using Redshift, bulk loading can be initialized -------------------------------
# Sys.setenv("AWS_ACCESS_KEY_ID" = "",
#            "AWS_SECRET_ACCESS_KEY" = "",
#            "AWS_DEFAULT_REGION" = "",
#            "AWS_BUCKET_NAME" = "",
#            "AWS_OBJECT_KEY" = "",
#            "AWS_SSE_TYPE" = "AES256",
#            "USE_MPP_BULK_LOAD" = TRUE)

# which DQ check levels to run -------------------------------------------------------------------
checkLevels <- c("TABLE", "FIELD", "CONCEPT")

# which DQ checks to run? ------------------------------------
checkNames <- c() # Names can be found in inst/csv/OMOP_CDM_v5.3.1_Check_Desciptions.csv

# run the job --------------------------------------------------------------------------------------
DataQualityDashboard::executeDqChecks(connectionDetails = connectionDetails,
                                      cdmDatabaseSchema = cdmDatabaseSchema,
                                      resultsDatabaseSchema = resultsDatabaseSchema,
                                      cdmSourceName = cdmSourceName,
                                      numThreads = numThreads,
                                      sqlOnly = sqlOnly,
                                      outputFolder = outputFolder,
                                      verboseMode = verboseMode,
                                      writeToTable = writeToTable,
                                      checkLevels = checkLevels,
                                      checkNames = checkNames)

# inspect logs ----------------------------------------------------------------------------
ParallelLogger::launchLogViewer(logFileName = file.path(outputFolder, cdmSourceName,
                                                        sprintf("log_DqDashboard_%s.txt", cdmSourceName)))

# (OPTIONAL) if you want to write the JSON file to the results table separately -----------------------------
jsonFilePath <- ""
DataQualityDashboard::writeJsonResultsToTable(connectionDetails = connectionDetails,
                                              resultsDatabaseSchema = resultsDatabaseSchema,
                                              jsonFilePath = jsonFilePath)
Launching Dashboard as Shiny App
DataQualityDashboard::viewDqDashboard(jsonPath = file.path(getwd(), outputFolder, cdmSourceName, sprintf("results_%s.json", cdmSourceName)))
Launching on a web server
If you have npm installed:

npm install -g http-server

Rename the JSON file to results.json and place it in inst/shinyApps/www. Then go to inst/shinyApps/www and run:

http-server
By default, a results JSON file for the Synthea synthetic dataset is shown. To view your own results, replace results.json with your file (renamed to results.json).
To see descriptions of the checks using R, execute the command below:

View(read.csv(system.file("csv", "OMOP_CDMv5.3.1_Check_Descriptions.csv", package = "DataQualityDashboard"), as.is = TRUE))
This project is supported in part through the National Science Foundation grant IIS 1251151.

[1] Kahn, M.G., et al., A Harmonized Data Quality Assessment Terminology and Framework for the Secondary Use of Electronic Health Record Data. EGEMS (Wash DC), 2016. 4(1): p. 1244.
Execute DQ checks
executeDqChecks(
  connectionDetails,
  cdmDatabaseSchema,
  resultsDatabaseSchema,
  cdmSourceName,
  numThreads = 1,
  sqlOnly = FALSE,
  outputFolder = "output",
  verboseMode = FALSE,
  writeToTable = TRUE,
  checkLevels = c("TABLE", "FIELD", "CONCEPT"),
  checkNames = c(),
  tablesToExclude = c(),
  cdmVersion = "5.3.1"
)
| Argument | Description |
|---|---|
| connectionDetails | A connectionDetails object for connecting to the CDM database |
| cdmDatabaseSchema | The fully qualified database name of the CDM schema |
| resultsDatabaseSchema | The fully qualified database name of the results schema |
| cdmSourceName | The name of the CDM data source |
| numThreads | The number of concurrent threads to use to execute the queries |
| sqlOnly | Should the SQLs be executed (FALSE) or just returned (TRUE)? |
| outputFolder | The folder to output logs and SQL files to |
| verboseMode | Boolean to determine if the console will show all execution steps. Default = FALSE |
| writeToTable | Boolean to indicate if the check results will be written to the dqdashboard_results table in the resultsDatabaseSchema. Default = TRUE |
| checkLevels | Choose which DQ check levels to execute. Default is all 3 (TABLE, FIELD, CONCEPT) |
| checkNames | (OPTIONAL) Choose which check names to execute. Names can be found in inst/csv/OMOP_CDM_v[cdmVersion]_Check_Desciptions.csv |
| tablesToExclude | (OPTIONAL) Choose which CDM tables to exclude from the execution |
| cdmVersion | The CDM version to target for the data source. Default = 5.3.1 |
Returns: if sqlOnly = FALSE, a list object of results.
View DQ Dashboard
viewDqDashboard(jsonPath)
| Argument | Description |
|---|---|
| jsonPath | The path to the JSON file produced by the execute function |
Write JSON Results to SQL Table
writeJsonResultsToTable(connectionDetails, resultsDatabaseSchema, jsonFilePath)
| Argument | Description |
|---|---|
| connectionDetails | A connectionDetails object for connecting to the CDM database |
| resultsDatabaseSchema | The fully qualified database name of the results schema |
| jsonFilePath | Path to the JSON results file generated using the execute function |
vignettes/AddNewCheck.rmd
AddNewCheck.rmd
vignettes/CheckStatusDefinitions.rmd
CheckStatusDefinitions.rmd
vignettes/CheckTypeDescriptions.rmd
CheckTypeDescriptions.rmd
vignettes/DataQualityDashboard.rmd
DataQualityDashboard.rmd
vignettes/DqdForCohorts.rmd
DqdForCohorts.rmd
vignettes/GettingStarted.rmd
GettingStarted.rmd
install.packages("devtools")
devtools::install_github("OHDSI/DataQualityDashboard")

# fill out the connection details -----------------------------------------------------------------------
connectionDetails <- DatabaseConnector::createConnectionDetails(dbms = "",
                                                                user = "",
                                                                password = "",
                                                                server = "",
                                                                port = "",
                                                                extraSettings = "")

cdmDatabaseSchema <- "yourCdmSchema" # the fully qualified database schema name of the CDM
resultsDatabaseSchema <- "yourResultsSchema" # the fully qualified database schema name of the results schema (that you can write to)
cdmSourceName <- "Your CDM Source" # a human readable name for your CDM source

# determine how many threads (concurrent SQL sessions) to use ----------------------------------------
numThreads <- 1 # on Redshift, 3 seems to work well

# specify if you want to execute the queries or inspect them ------------------------------------------
sqlOnly <- FALSE # set to TRUE if you just want to get the SQL scripts and not actually run the queries

# where should the logs go? -------------------------------------------------------------------------
outputFolder <- "output"

# logging type -------------------------------------------------------------------------------------
verboseMode <- FALSE # set to TRUE if you want to see activity written to the console

# write results to table? ------------------------------------------------------------------------------
writeToTable <- TRUE # set to FALSE if you want to skip writing to a SQL table in the results schema

# if writing to table and using Redshift, bulk loading can be initialized -------------------------------
# Sys.setenv("AWS_ACCESS_KEY_ID" = "",
#            "AWS_SECRET_ACCESS_KEY" = "",
#            "AWS_DEFAULT_REGION" = "",
#            "AWS_BUCKET_NAME" = "",
#            "AWS_OBJECT_KEY" = "",
#            "AWS_SSE_TYPE" = "AES256",
#            "USE_MPP_BULK_LOAD" = TRUE)

# which DQ check levels to run -------------------------------------------------------------------
checkLevels <- c("TABLE", "FIELD", "CONCEPT")

# which DQ checks to run? ------------------------------------
checkNames <- c() # Names can be found in inst/csv/OMOP_CDM_v5.3.1_Check_Desciptions.csv

# which CDM tables to exclude? ------------------------------------
tablesToExclude <- c()

# run the job --------------------------------------------------------------------------------------
DataQualityDashboard::executeDqChecks(connectionDetails = connectionDetails,
                                      cdmDatabaseSchema = cdmDatabaseSchema,
                                      resultsDatabaseSchema = resultsDatabaseSchema,
                                      cdmSourceName = cdmSourceName,
                                      numThreads = numThreads,
                                      sqlOnly = sqlOnly,
                                      outputFolder = outputFolder,
                                      verboseMode = verboseMode,
                                      writeToTable = writeToTable,
                                      checkLevels = checkLevels,
                                      tablesToExclude = tablesToExclude,
                                      checkNames = checkNames)

# inspect logs ----------------------------------------------------------------------------
ParallelLogger::launchLogViewer(logFileName = file.path(outputFolder, cdmSourceName,
                                                        sprintf("log_DqDashboard_%s.txt", cdmSourceName)))

# (OPTIONAL) if you want to write the JSON file to the results table separately -----------------------------
jsonFilePath <- ""
DataQualityDashboard::writeJsonResultsToTable(connectionDetails = connectionDetails,
                                              resultsDatabaseSchema = resultsDatabaseSchema,
                                              jsonFilePath = jsonFilePath)
vignettes/SqlOnly.rmd
SqlOnly.rmd
vignettes/Thresholds.rmd
Thresholds.rmd
vignettes/checkIndex.Rmd
checkIndex.Rmd
This section contains detailed descriptions of the data quality checks included in the DataQualityDashboard package. Each check is described on its own page; click on a check name in the list below to navigate to its documentation page.
N.B. This section is currently under development. A documentation page is not yet available for all checks. The links below will be updated as more pages are added. In the meantime, see the Check Type Descriptions page for a brief description of each check.
General guidance: to inspect the rows that failed a given check, run the "violated rows" portion of the SQL query displayed in the DQD results viewer for that check (the query between /*violatedRowsBegin*/ and /*violatedRowsEnd*/).

Checks:
vignettes/checks/isStandardValidConcept.Rmd
isStandardValidConcept.Rmd
The number and percent of records that do not have a standard, valid concept in the @cdmFieldName field in the @cdmTableName table.
vignettes/checks/measureConditionEraCompleteness.Rmd
measureConditionEraCompleteness.Rmd
The number and percent of persons that do not have a condition_era built successfully, for all persons in condition_occurrence.
vignettes/checks/measurePersonCompleteness.Rmd
measurePersonCompleteness.Rmd
The number and percent of persons in the CDM that do not have at least one record in the @cdmTableName table.
vignettes/checks/measureValueCompleteness.Rmd
measureValueCompleteness.Rmd
The number and percent of records with a NULL value in the @cdmFieldName field of the @cdmTableName table.
vignettes/checks/plausibleAfterBirth.Rmd
plausibleAfterBirth.Rmd
Level: Field check
Context: Verification
Category: Plausibility
Subcategory: Temporal
Severity: Characteristic
The number and percent of records with a date value in the @cdmFieldName field of the @cdmTableName table that occurs prior to birth.
This check verifies that events happen after birth. It is only run on fields where the PLAUSIBLE_AFTER_BIRTH parameter is set to Yes. The birthdate is taken from the person table: either the birth_datetime, or a date composed from year_of_birth, month_of_birth, and day_of_birth (taking the 1st month / 1st day if missing).
There might be valid reasons why a record has a date value that occurs prior to birth. For example, prenatal observations might be captured, or procedures on the mother might be added to the file of the child. Therefore, some failing records are expected, and the default threshold of 1% accounts for that.
However, if more records violate this check, there might be an issue with incorrect birthdates or events with a default date. It is recommended to investigate the records that fail this check to determine the cause of the error.

You may also use the "violated rows" SQL query to inspect the violating rows and help diagnose the potential root cause of the issue:
SELECT p.birth_datetime, cdmTable.*
FROM @cdmDatabaseSchema.@cdmTableName cdmTable
JOIN @cdmDatabaseSchema.person p ON cdmTable.person_id = p.person_id
WHERE cdmTable.@cdmFieldName < p.birth_datetime;
or, when birth_datetime is missing:
SELECT p.birth_datetime, cdmTable.*
FROM @cdmDatabaseSchema.@cdmTableName cdmTable
JOIN @cdmDatabaseSchema.person p ON cdmTable.person_id = p.person_id
WHERE cdmTable.@cdmFieldName < CAST(CONCAT(
    p.year_of_birth,
    COALESCE(RIGHT('0' + CAST(p.month_of_birth AS VARCHAR), 2), '01'),
    COALESCE(RIGHT('0' + CAST(p.day_of_birth AS VARCHAR), 2), '01')
  ) AS DATE);
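The month/day fallback used when composing the birthdate can be mirrored in R; a minimal sketch (the helper name is ours, not part of the DQD package):

```r
# Compose a birthdate the way the fallback SQL does: default a missing
# month or day of birth to January / the 1st.
# Illustrative helper, not part of DataQualityDashboard.
compose_birthdate <- function(year_of_birth, month_of_birth = NA, day_of_birth = NA) {
  as.Date(sprintf("%04d-%02d-%02d",
                  year_of_birth,
                  ifelse(is.na(month_of_birth), 1L, month_of_birth),
                  ifelse(is.na(day_of_birth), 1L, day_of_birth)))
}

compose_birthdate(1950)        # "1950-01-01"
compose_birthdate(1950, 6, 15) # "1950-06-15"
```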
vignettes/checks/plausibleBeforeDeath.Rmd
plausibleBeforeDeath.Rmd
The number and percent of records with a date value in the @cdmFieldName field of the @cdmTableName table that occurs after death.
vignettes/checks/plausibleDuringLife.Rmd
plausibleDuringLife.Rmd
If yes, the number and percent of records with a date value in the @cdmFieldName field of the @cdmTableName table that occurs after death.
vignettes/checks/plausibleGender.Rmd
plausibleGender.Rmd
Level: CONCEPT
Context: Validation
Category: Plausibility
Subcategory: Atemporal
Severity:
For a CONCEPT_ID @conceptId (@conceptName), the number and percent of records associated with patients with an implausible gender (correct gender = @plausibleGender).
vignettes/checks/plausibleStartBeforeEnd.Rmd
plausibleStartBeforeEnd.Rmd
The number and percent of records with a value in the @cdmFieldName field of the @cdmTableName table that occurs after the date in the @plausibleStartBeforeEndFieldName field.
vignettes/checks/plausibleTemporalAfter.Rmd
plausibleTemporalAfter.Rmd
The number and percent of records with a value in the @cdmFieldName field of the @cdmTableName table that occurs prior to the date in the @plausibleTemporalAfterFieldName field of the @plausibleTemporalAfterTableName table.
vignettes/checks/plausibleUnitConceptIds.Rmd
plausibleUnitConceptIds.Rmd
Level: CONCEPT
Context: Verification
Category: Plausibility
Subcategory: Atemporal
Severity:
The number and percent of records for a given CONCEPT_ID @conceptId (@conceptName) with implausible units (i.e., UNIT_CONCEPT_ID NOT IN (@plausibleUnitConceptIds)).
vignettes/checks/plausibleValueHigh.Rmd
plausibleValueHigh.Rmd
Level: FIELD
Context: Verification
Category: Plausibility
Subcategory: Atemporal
Severity:
The number and percent of records with a value in the @cdmFieldName field of the @cdmTableName table greater than @plausibleValueHigh.
vignettes/checks/plausibleValueLow.Rmd
plausibleValueLow.Rmd
Level: FIELD
Context: Verification
Category: Plausibility
Subcategory: Atemporal
Severity:
The number and percent of records with a value in the @cdmFieldName field of the @cdmTableName table less than @plausibleValueLow.
vignettes/checks/sourceConceptRecordCompleteness.Rmd
sourceConceptRecordCompleteness.Rmd
The number and percent of records with a value of 0 in the source concept field @cdmFieldName in the @cdmTableName table.
vignettes/checks/sourceValueCompleteness.Rmd
sourceValueCompleteness.Rmd
The number and percent of distinct source values in the @cdmFieldName field of the @cdmTableName table mapped to 0.
vignettes/checks/standardConceptRecordCompleteness.Rmd
standardConceptRecordCompleteness.Rmd
The number and percent of records with a value of 0 in the standard concept field @cdmFieldName in the @cdmTableName table.
vignettes/checks/withinVisitDates.Rmd
withinVisitDates.Rmd
The number and percent of records not within one week on either side of the corresponding visit occurrence start and end date.
In the DataQualityDashboard v2, new check statuses were introduced: Error and Not Applicable. These were introduced to more accurately reflect the quality of data contained in a CDM instance, addressing scenarios where pass/fail is not appropriate. The new set of mutually exclusive status states is listed below in priority order:
Is Error: if a SQL error occurred during execution
Not Applicable: if the DQ check is not applicable for reasons explained in the section below
Failed: if percent violating rows is greater than the threshold
Passed: if percent violating rows is smaller than the threshold
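The priority order above can be sketched as a simple R function. This is illustrative only; the argument names are ours, not the exact columns of the DQD results schema.

```r
# Illustrative sketch of the mutually exclusive status states,
# applied in priority order. Argument names are assumptions,
# not the exact DQD results columns.
dq_check_status <- function(is_error, not_applicable, pct_violated_rows, threshold) {
  if (is_error) {
    "Is Error"
  } else if (not_applicable) {
    "Not Applicable"
  } else if (pct_violated_rows > threshold) {
    "Failed"
  } else {
    "Passed"
  }
}

dq_check_status(FALSE, FALSE, 0.05, 0.01) # "Failed"
dq_check_status(FALSE, TRUE,  0.05, 0.01) # "Not Applicable" (takes priority over Failed)
```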
The results of a DQ check may not be applicable to a given CDM instance depending on the implementation and content of the instance. For example, the DQ check for plausible values of HbA1c lab results would pass with no violations even if there were no results for that lab test in the database. It is not uncommon to have > 1000 DQ checks that do not apply to a given CDM instance, and the results from DQ checks that are not applicable skew the overall results. Listed below are the scenarios in which a DQ check result is flagged as Not Applicable:
If the cdmTable DQ check determines that a table does not exist in the database, then all DQ checks (except cdmTable) addressing that table are flagged as Not Applicable.
If a table exists but is empty, then all field-level and concept-level checks for that table are flagged as Not Applicable, except for cdmField checks, which evaluate whether the field is defined. A cdmField check is marked as Not Applicable if the CDM table it refers to does not exist (tested by cdmTable). An empty table is detected when the measureValueCompleteness DQ check for any of the fields in the table returns a denominator count = 0 (NUM_DENOMINATOR_ROWS = 0).
If a field is not populated, then all field-level and concept-level checks except for measureValueCompleteness and isRequired are flagged as Not Applicable.
A field is not populated if the measureValueCompleteness DQ check finds a denominator count > 0 and a number of violated rows equal to the denominator count (NUM_DENOMINATOR_ROWS > 0 AND NUM_DENOMINATOR_ROWS = NUM_VIOLATED_ROWS).
The measureValueCompleteness check is marked as Not Applicable if:
The CDM table it refers to does not exist or is empty.
The CDM field it refers to does not exist.
The isRequired check is marked as Not Applicable if:
The CDM table it refers to does not exist or is empty.
The CDM field it refers to does not exist.
Flagging a CONCEPT_ID-level DQ check as Not Applicable depends on whether the DQ check logic includes a UNIT_CONCEPT_ID. There are two scenarios for DQ checks evaluating specific concept IDs.
The DQ check does not include a UNIT_CONCEPT_ID (the value is null). The DQ check is flagged as Not Applicable if there are no instances of the CONCEPT_ID in the table/field, e.g. plausibility checks for specific conditions and gender (both pregnancy and male have no UNIT_CONCEPT_IDs).
The DQ check includes a UNIT_CONCEPT_ID. The DQ check is flagged as Not Applicable if there are no instances of both the concept and unit concept IDs in the table/field. E.g., all DQ checks referencing the CONCEPT_ID for HbA1c lab results expressed in mg/dl units will be flagged as Not Applicable if there are no instances of that CONCEPT_ID in the table/field addressed by the DQ check.
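The Not Applicable detection rules described above can be summarized in a short R sketch. Function and argument names are illustrative assumptions; the counts are the ones the text attributes to measureValueCompleteness results.

```r
# Illustrative summary of the Not Applicable rules described above.
# Names are ours, not the DQD results schema.
table_is_empty <- function(num_denominator_rows) {
  num_denominator_rows == 0
}

field_not_populated <- function(num_denominator_rows, num_violated_rows) {
  num_denominator_rows > 0 && num_denominator_rows == num_violated_rows
}

concept_check_not_applicable <- function(n_concept_rows,
                                         has_unit_concept_id,
                                         n_concept_unit_rows = NA) {
  if (!has_unit_concept_id) {
    n_concept_rows == 0          # no instances of the concept at all
  } else {
    n_concept_unit_rows == 0     # no instances of the concept + unit pair
  }
}
```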