Error When Running Data Quality Check - 'names' attribute [1] must be the same length as the vector [0] #562

Open

Zachary-Higgins opened this issue Aug 29, 2024 · 6 comments

@Zachary-Higgins
Hi! We're testing this library out on top of an OMOP CDM 5.4 in Databricks (Spark), and I'm running into some issues with its usage. After running DataQualityDashboard::executeDqChecks, I receive the error below. I would like to save the outputFile JSON and display it in a Shiny app, but the file is never generated because the run stops at this error.

Error in `dplyr::mutate()`:
! Problem while computing `notApplicableReason = ifelse(...)`.
Caused by error in `` names(message) <- `*vtmp*` ``:
! 'names' attribute [1] must be the same length as the vector [0]

Here is the output of the whole run to demonstrate the connection to our DBSQL cluster is successful, and I'm actually able to complete quite a few checks:

Connecting using Spark driver

── Column specification ─────────────────────────────────────────────────────────────
cols(
  checkLevel = col_character(),
  checkName = col_character(),
  checkDescription = col_character(),
  kahnContext = col_character(),
  kahnCategory = col_character(),
  kahnSubcategory = col_character(),
  sqlFile = col_character(),
  evaluationFilter = col_character(),
  severity = col_character()
)

2024-08-29 12:14:42	CDM Tables skipped: CONCEPT, VOCABULARY, CONCEPT_ANCESTOR, CONCEPT_RELATIONSHIP, CONCEPT_CLASS, CONCEPT_SYNONYM, RELATIONSHIP, DOMAIN, VISIT_OCCURRENCE, VISIT_DETAIL
Connecting using Spark driver
2024-08-29 12:14:43	Processing check description: cdmTable
2024-08-29 12:14:56	Processing check description: measurePersonCompleteness
2024-08-29 12:15:03	Processing check description: measureConditionEraCompleteness
2024-08-29 12:15:03	Processing check description: cdmField
2024-08-29 12:18:08	Processing check description: isRequired
2024-08-29 12:19:13	Processing check description: cdmDatatype
2024-08-29 12:21:01	Processing check description: isPrimaryKey
2024-08-29 12:21:10	Processing check description: isForeignKey
2024-08-29 12:22:22	Processing check description: fkDomain
2024-08-29 12:22:43	Processing check description: fkClass
2024-08-29 12:22:44	Processing check description: isStandardValidConcept
2024-08-29 12:23:07	Processing check description: measureValueCompleteness
2024-08-29 12:25:50	Processing check description: standardConceptRecordCompleteness
2024-08-29 12:26:06	Processing check description: sourceConceptRecordCompleteness
2024-08-29 12:26:11	Processing check description: sourceValueCompleteness
2024-08-29 12:26:28	Processing check description: plausibleValueLow
2024-08-29 12:26:58	Processing check description: plausibleValueHigh
2024-08-29 12:27:20	Processing check description: plausibleTemporalAfter
2024-08-29 12:27:43	Processing check description: plausibleDuringLife
2024-08-29 12:27:54	Processing check description: withinVisitDates
2024-08-29 12:27:57	Processing check description: plausibleAfterBirth
2024-08-29 12:28:20	Processing check description: plausibleBeforeDeath
2024-08-29 12:28:36	Processing check description: plausibleStartBeforeEnd
2024-08-29 12:28:48	Processing check description: plausibleGender
2024-08-29 12:31:13	Processing check description: plausibleGenderUseDescendants
2024-08-29 12:31:16	Processing check description: plausibleUnitConceptIds
Error in `dplyr::mutate()`:
! Problem while computing `notApplicableReason = ifelse(...)`.
Caused by error in `` names(message) <- `*vtmp*` ``:
! 'names' attribute [1] must be the same length as the vector [0]
Run `rlang::last_error()` to see where the error occurred.

Can you help shed some light on this error? We think this Dashboard is going to be very helpful but can't seem to get past this problem. Thanks!
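For what it's worth, the error message itself is a generic R failure: it occurs whenever a names attribute of length 1 is assigned to a zero-length vector. A minimal sketch, independent of DataQualityDashboard, reproduces it:

```r
# Reproduce the bare R error in isolation: assigning a single name to an
# empty vector fails, because the names attribute must match the vector's length.
msg <- character(0)                  # length-0 vector, e.g. an ifelse() branch that came back empty
names(msg) <- "notApplicableReason"
# Error: 'names' attribute [1] must be the same length as the vector [0]
```

This may suggest that one of the intermediate result sets feeding the `notApplicableReason` mutate is coming back empty, though that is only a guess from the error shape.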

@Zachary-Higgins (author)

Hi - some additional info from testing: when I remove "FIELD" from checkLevels, we get the output JSON.

checkLevels <- c("TABLE", "FIELD", "CONCEPT")  # fails
checkLevels <- c("TABLE", "CONCEPT")           # works

Any advice on zeroing in on this issue?

@katy-sadowski (collaborator)

Hi there, can you please share the full script you ran that resulted in the error (i.e., values of parameters passed into executeDqChecks)? And the version of the package you're using? Thanks.

@Zachary-Higgins (author) commented Sep 9, 2024

Hi @katy-sadowski. Sorry about the late reply - here it is. Thanks for offering to help! Sorry it's a little messy; I just copied what I had without modifying it, in case there were any other clues in there.

#--------INSTALL REQUIRED PACKAGES
if (!require("drat")) install.packages("drat")
drat::addRepo("OHDSI")
if (!require("remotes")) install.packages("remotes")
remotes::install_github("OHDSI/DataQualityDashboard")
if (!require("DatabaseConnector")) install.packages("DatabaseConnector")

# Set the folder where JDBC drivers will be downloaded
Sys.setenv(DATABASECONNECTOR_JAR_FOLDER = "//home//ohdsi")

#-------IMPORT PACKAGES
library("DataQualityDashboard")
library("DatabaseConnector")

#-------CONFIG CONNECTIONS
downloadJdbcDrivers("spark")
connectionDetails <- createConnectionDetails(
  dbms="spark", 
  connectionString="jdbc:spark://...UID=token;UseNativeQuery=1",
  user="token", 
  password="...")
cdmDatabaseSchema <- "omop"
resultsDatabaseSchema <- "omop"


cdmSourceName <- "OMOP CDM v5.4 Demo Environment"
cdmVersion <- "5.4"
numThreads <- 1
sqlOnly <- FALSE
sqlOnlyIncrementalInsert <- FALSE
sqlOnlyUnionCount <- 1
outputFolder <- "output"
outputFile <- "results.json"
verboseMode <- TRUE
writeToTable <- FALSE
writeTableName <- ""
writeToCsv <- FALSE
csvFile <- "" 
checkLevels <- c("TABLE", "FIELD", "CONCEPT")
allChecks <- DataQualityDashboard::listDqChecks(cdmVersion=cdmVersion)
checkNames <- allChecks$checkDescriptions$checkName
tablesToExclude <- c("CONCEPT", "VOCABULARY", "CONCEPT_ANCESTOR", "CONCEPT_RELATIONSHIP", "CONCEPT_CLASS", "CONCEPT_SYNONYM", "RELATIONSHIP", "DOMAIN", "VISIT_OCCURRENCE", "VISIT_DETAIL")


DataQualityDashboard::executeDqChecks(connectionDetails = connectionDetails,
  cdmDatabaseSchema = cdmDatabaseSchema,
  resultsDatabaseSchema = resultsDatabaseSchema,
  cdmSourceName = cdmSourceName,
  cdmVersion = cdmVersion,
  numThreads = numThreads,
  sqlOnly = sqlOnly,
  sqlOnlyUnionCount = sqlOnlyUnionCount,
  sqlOnlyIncrementalInsert = sqlOnlyIncrementalInsert,
  outputFolder = outputFolder,
  outputFile = outputFile,
  verboseMode = verboseMode,
  writeToTable = writeToTable,
  writeToCsv = writeToCsv,
  csvFile = csvFile,
  checkLevels = checkLevels,
  tablesToExclude = tablesToExclude,
  checkNames = checkNames)


DataQualityDashboard::viewDqDashboard("~/output/results.json")

@katy-sadowski (collaborator)

Thanks for the code - can you please confirm what version of the package you're running?

@Zachary-Higgins (author)

Hi @katy-sadowski, oops - sorry about that:
DataQualityDashboard - 2.6.1
DatabaseConnector - 5.0.4
SqlRender - 1.18.1

@katy-sadowski (collaborator)

Thanks! Can you try running again with checkNames <- c()? This will run all checks. I'm wondering if it's due to some issue with listDqChecks.
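For anyone following along, the change this suggests to the script posted earlier is just replacing the two checkNames lines (this assumes, per the comment above, that an empty checkNames makes executeDqChecks run every check):

```r
# Instead of deriving checkNames from listDqChecks(), pass an empty vector
# so executeDqChecks runs all checks, taking listDqChecks() out of the picture:
checkNames <- c()
```

The rest of the executeDqChecks call stays unchanged; if the error disappears, that would point at listDqChecks rather than the checks themselves.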
