Use external concept counts #1136
Closed
16 commits
3328c0e (cebarboza) Changes in place for externalConceptCounts
a52a1e9 (cebarboza) parameter conceptCountsTable
246014a (cebarboza) updating getDbms
2b27ed8 (cebarboza) missing variable in run diagnostics
6b0d044 (cebarboza) removing stop external concept count table
32a6eaf (cebarboza) applied requested changes for PR
e1dec96 (ablack3) Merge branch 'develop' into useExternalConceptCounts
abae4f1 (ablack3) update documentation. Only download gibleed eunomia dataset if it doe…
1c13de0 (cebarboza) update sqlite test external counts table
bf925bd (ablack3) revert change in databaseFile
27a8adc (cebarboza) check conceptCountsTable #
7362864 (cebarboza) creating externalConceptCounts file
864af81 (cebarboza) first approach
339dbd0 (cebarboza) second approach temp table created when # in conceptCountsTable
ba45d63 (cebarboza) deleted assigning #concept_counts as default for temp table
8f9c35b (cebarboza) style arguments
@@ -0,0 +1,81 @@
# Copyright 2022 Observational Health Data Sciences and Informatics
#
# This file is part of CohortDiagnostics
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

#' createConceptCountsTable
#'
#' @description Create a table containing concept counts.
#' Without it, CohortDiagnostics computes concept counts on every run, which can take a significant amount of time.
#' With this function, the table can be created once beforehand and
#' saved in the work schema for later reuse.
#'
#' @inheritParams executeDiagnostics
#' @param conceptCountsDatabaseSchema Schema in which the concept counts table is created.
#' @param conceptCountsTableIsTemp Logical; if TRUE, the table is created as a temporary table.
#' @param removeCurrentTable Logical; if TRUE, an existing concept counts table is removed first.
#'
#' @export
createConceptCountsTable <- function(connectionDetails = NULL,
                                     connection = NULL,
                                     cdmDatabaseSchema,
                                     tempEmulationSchema = NULL,
                                     conceptCountsTable = "concept_counts",
                                     conceptCountsDatabaseSchema = cdmDatabaseSchema,
                                     conceptCountsTableIsTemp = FALSE,
                                     removeCurrentTable = TRUE) {
  ParallelLogger::logInfo("Creating concept counts table")
  if (is.null(connection)) {
    connection <- DatabaseConnector::connect(connectionDetails)
    on.exit(DatabaseConnector::disconnect(connection))
  }
  sql <-
    SqlRender::loadRenderTranslateSql(
      "CreateConceptCountTable.sql",
      packageName = "CohortDiagnostics",
      dbms = connection@dbms,
      tempEmulationSchema = tempEmulationSchema,
      cdm_database_schema = cdmDatabaseSchema,
      work_database_schema = conceptCountsDatabaseSchema,
      concept_counts_table = conceptCountsTable,
      table_is_temp = conceptCountsTableIsTemp,
      remove_current_table = removeCurrentTable
    )
  DatabaseConnector::executeSql(connection, sql)
}

#' getConceptCountsTableName
#'
#' @description Get a concept counts table name that is unique for the current vocabulary version.
#' This ensures the table is only used when its counts are for the current database.
#'
#' @param connection Database connection.
#' @param cdmDatabaseSchema CDM schema.
#'
#' @return The concept counts table name, prefixed with the vocabulary version when one is found.
#' @export
getConceptCountsTableName <- function(connection, cdmDatabaseSchema) {
  result <- "concept_counts"
  sql <- paste("SELECT vocabulary_version as version",
               "FROM @cdmDatabaseSchema.VOCABULARY",
               "WHERE vocabulary_id = 'None'")
  dbVersion <- DatabaseConnector::renderTranslateQuerySql(connection = connection,
                                                          sql = sql,
                                                          cdmDatabaseSchema = cdmDatabaseSchema) |>
    dplyr::pull(1)
  # Use the version only if the query returned a non-missing value
  if (length(dbVersion) > 0 && !is.na(dbVersion[1])) {
    result <- paste(gsub(" |\\.|-", "_", dbVersion[1]), result, sep = "_")
  }
  return(result)
}
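The naming rule in getConceptCountsTableName() can be tried without a database connection. The sketch below is a minimal stand-alone version of that rule, assuming the vocabulary version string has already been read from the VOCABULARY table. The helper name versionedConceptCountsTableName is hypothetical and not part of CohortDiagnostics. It checks length and NA rather than identical(dbVersion, character(0)), so it does not depend on the exact type of an empty query result.

```r
# Hypothetical stand-alone version of the naming rule in getConceptCountsTableName():
# the vocabulary version (if any) is sanitised and used as a table-name prefix.
versionedConceptCountsTableName <- function(vocabularyVersion, baseName = "concept_counts") {
  # No usable version (empty result or NA): fall back to the plain base name
  if (length(vocabularyVersion) == 0 || is.na(vocabularyVersion[1])) {
    return(baseName)
  }
  # Same substitution as the PR: spaces, dots and hyphens become underscores
  prefix <- gsub(" |\\.|-", "_", vocabularyVersion[1])
  paste(prefix, baseName, sep = "_")
}

versionedConceptCountsTableName("v5.0 22-JUN-22")
# returns "v5_0_22_JUN_22_concept_counts"
versionedConceptCountsTableName(character(0))
# returns "concept_counts"
```

Only spaces, dots and hyphens are replaced, exactly as in the PR; a version string containing other punctuation would still yield a name some database platforms reject.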
@@ -110,6 +110,8 @@ getDefaultCovariateSettings <- function() {
 #' diagnostics to.
 #' @param cohortDefinitionSet Data.frame of cohorts must include columns cohortId, cohortName, json, sql
 #' @param cohortTableNames Cohort Table names used by CohortGenerator package
+#' @param conceptCountsTable Concept counts table name. The default, "#concept_counts", creates a temporary concept counts table.
+#' If an external concept counts table is used, provide its name as a character string without a leading hash, e.g. "concept_counts".
 #' @param databaseId A short string for identifying the database (e.g. 'Synpuf').
 #' @param databaseName The full name of the database. If NULL, defaults to value in cdm_source table
 #' @param databaseDescription A short description (several sentences) of the database. If NULL, defaults to value in cdm_source table
@@ -136,6 +138,7 @@ getDefaultCovariateSettings <- function() {
 #' @param incremental Create only cohort diagnostics that haven't been created before?
 #' @param incrementalFolder If \code{incremental = TRUE}, specify a folder where records are kept
 #' of which cohort diagnostics has been executed.
+#' @param useExternalConceptCountsTable If TRUE, an existing external concept counts table (named by conceptCountsTable) is used instead of computing concept counts during the run.
 #' @param runFeatureExtractionOnSample Logical. If TRUE, the function will operate on a sample of the data.
 #' Default is FALSE, meaning the function will operate on the full data set.
 #'
@@ -205,6 +208,7 @@ executeDiagnostics <- function(cohortDefinitionSet,
 tempEmulationSchema = getOption("sqlRenderTempEmulationSchema"),
 cohortTable = "cohort",
 cohortTableNames = CohortGenerator::getCohortTableNames(cohortTable = cohortTable),
+conceptCountsTable = "#concept_counts",
 vocabularyDatabaseSchema = cdmDatabaseSchema,
 cohortIds = NULL,
 cdmVersion = 5,
@@ -223,6 +227,7 @@ executeDiagnostics <- function(cohortDefinitionSet,
 irWashoutPeriod = 0,
 incremental = FALSE,
 incrementalFolder = file.path(exportFolder, "incremental"),
+useExternalConceptCountsTable = FALSE,
 runFeatureExtractionOnSample = FALSE,
 sampleN = 1000,
 seed = 64374,
@@ -687,6 +692,37 @@ executeDiagnostics <- function(cohortDefinitionSet,
     }
   )
 }
+
+  # Defines variables and checks version of external concept counts table -----
+  if (!useExternalConceptCountsTable) {
+    conceptCountsTableIsTemp <- TRUE
+    if (conceptCountsTable != "#concept_counts") {
+      conceptCountsTable <- "#concept_counts"
+    }
+  } else {
+    if (conceptCountsTable == "#concept_counts") {
+      stop("Temporary conceptCountsTable name. Please provide a valid external conceptCountsTable name")
+    }
+    conceptCountsTableIsTemp <- FALSE
+    conceptCountsTable <- conceptCountsTable
+    dataSourceInfo <- getCdmDataSourceInformation(connection = connection, cdmDatabaseSchema = cdmDatabaseSchema)
+    vocabVersion <- dataSourceInfo$vocabularyVersion
+    vocabVersionExternalConceptCountsTable <- renderTranslateQuerySql(
+      connection = connection,
+      sql = "SELECT DISTINCT vocabulary_version FROM @work_database_schema.@concept_counts_table;",
+      work_database_schema = cohortDatabaseSchema,
+      concept_counts_table = conceptCountsTable,
+      snakeCaseToCamelCase = TRUE,
+      tempEmulationSchema = tempEmulationSchema
+    )
+    if (!identical(vocabVersion, vocabVersionExternalConceptCountsTable[1, 1])) {
+      stop(paste0("External concept counts table (",
+                  vocabVersionExternalConceptCountsTable[1, 1],
+                  ") does not match database (",
+                  vocabVersion,
+                  "). Update concept_counts with createConceptCountsTable()"))
+    }
+  }
 
 # Always export concept sets to csv
 exportConceptSets(
@@ -719,11 +755,11 @@ executeDiagnostics <- function(cohortDefinitionSet,
 exportFolder = exportFolder,
 minCellCount = minCellCount,
 conceptCountsDatabaseSchema = NULL,
-conceptCountsTable = "#concept_counts",
+conceptCountsTable = conceptCountsTable,
 conceptCountsTableIsTemp = TRUE,
 cohortDatabaseSchema = cohortDatabaseSchema,
 cohortTable = cohortTable,
-useExternalConceptCountsTable = FALSE,
+useExternalConceptCountsTable = useExternalConceptCountsTable,
 incremental = incremental,
 conceptIdTable = "#concept_ids",
 recordKeepingFile = recordKeepingFile

Review comment on the vocabulary-version check (if (!identical(vocabVersion, ...))): great check!
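The vocabulary-version guard in executeDiagnostics() can be isolated as a pure function for testing. The sketch below assumes the DISTINCT vocabulary_version values have already been read from the external table into a character vector; the helper name checkExternalConceptCountsVersion is hypothetical. Unlike the PR, which compares only the first row, it also rejects a table holding several versions or none.

```r
# Hypothetical helper mirroring the vocabulary check in executeDiagnostics():
# stop unless the external table holds exactly one vocabulary version and it
# matches the CDM's vocabulary version.
checkExternalConceptCountsVersion <- function(cdmVocabularyVersion, externalVersions) {
  # externalVersions: DISTINCT vocabulary_version values read from the external table
  externalVersions <- unique(as.character(externalVersions))
  if (length(externalVersions) != 1 || !identical(cdmVocabularyVersion, externalVersions)) {
    stop(paste0(
      "External concept counts table (",
      paste(externalVersions, collapse = ", "),
      ") does not match database (",
      cdmVocabularyVersion,
      "). Update concept_counts with createConceptCountsTable()"
    ))
  }
  invisible(TRUE)
}

# Passes silently when the versions agree
checkExternalConceptCountsVersion("v5.0 22-JUN-22", "v5.0 22-JUN-22")
# Stops with a readable message when they differ:
# checkExternalConceptCountsVersion("v5.0 22-JUN-22", "v5.0 01-JAN-21")
```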
Review comment (on the conceptCountsTable == "#concept_counts" check):
This should probably be
substr(conceptCountsTable, 1, 1) == "#"
as otherwise #my_temp_table
will render as work_database_schema.#my_temp_table,
which will crash on some systems.
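The suggested check can be folded into a small argument-resolution step. This is a minimal sketch, assuming the external table lives in a caller-supplied schema (the PR itself uses cohortDatabaseSchema); the helper resolveConceptCountsTable and its conceptCountsDatabaseSchema argument are hypothetical, not CohortDiagnostics API. Any name with a leading "#" is rejected for external use, and only permanent tables are schema-qualified, so a "#name" can never be rendered as "schema.#name".

```r
# Hypothetical helper (not part of CohortDiagnostics): resolve which concept
# counts table to use and how to reference it in SQL.
resolveConceptCountsTable <- function(conceptCountsTable = "#concept_counts",
                                      useExternalConceptCountsTable = FALSE,
                                      conceptCountsDatabaseSchema = NULL) {
  if (!useExternalConceptCountsTable) {
    # Counts are computed during the run into a temp table, referenced without a schema
    return(list(table = "#concept_counts", isTemp = TRUE, sqlRef = "#concept_counts"))
  }
  # Reviewer's check: any leading "#" marks a temp table, not only "#concept_counts"
  if (substr(conceptCountsTable, 1, 1) == "#") {
    stop("conceptCountsTable must name a permanent table (no leading '#') ",
         "when useExternalConceptCountsTable = TRUE")
  }
  if (is.null(conceptCountsDatabaseSchema)) {
    stop("conceptCountsDatabaseSchema is required for an external table")
  }
  list(
    table = conceptCountsTable,
    isTemp = FALSE,
    # Only permanent tables are schema-qualified
    sqlRef = paste(conceptCountsDatabaseSchema, conceptCountsTable, sep = ".")
  )
}

resolveConceptCountsTable("concept_counts", TRUE, "work")$sqlRef
# returns "work.concept_counts"
```

Like the PR, the sketch silently falls back to "#concept_counts" when useExternalConceptCountsTable is FALSE, whatever name was passed.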