Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Achilles Generating Incorrect Column Name in SQL Scripts - vo.discharge_to_concept_id #757

Open
Zachary-Higgins opened this issue Mar 28, 2024 · 2 comments

Comments

@Zachary-Higgins
Copy link

Hi - We're setting up Achilles to run on top of our delta lake through Databricks. I was having issues running the analysis through R, so I generated the SQL scripts and copied them over to Databricks. The generated script failed to complete when referencing the discharge_to_concept_id field because it was missing. In actuality, it looks like it changed from discharge to discharged.

https://ohdsi.github.io/CommonDataModel/cdm54.html#visit_occurrence

We're using OMOP CDM 5.4.1. It looks like this issue was reported last year in May, but has been closed and is still present. #702

Here's the r script we used to generate the queries. I think it's pretty standard....

# Install Packages
if (!require("remotes")) install.packages("remotes")
remotes::install_github("OHDSI/Achilles", force=TRUE)
install.packages("DatabaseConnector")
library("DatabaseConnector")

# Download Drivers
Sys.setenv(DATABASECONNECTOR_JAR_FOLDER = "c:\\temp")
downloadJdbcDrivers("spark")

# Set Connection
connectionDetails <- createConnectionDetails(
  dbms="spark", 
  connectionString="",
  user="token", 
  password="")

options(connectionObserver = NULL)

library(Achilles)
Achilles::achilles(
  cdmVersion = "5.4.1", 
  connectionDetails = connectionDetails,
  cdmDatabaseSchema = "REMOVED",
  resultsDatabaseSchema = "REMOVED",
  dropScratchTables = TRUE,
  createIndices = FALSE,
  createTable = TRUE,
  numThreads = 10,
  sqlOnly = TRUE
)

And an example of the incorrect column reference in the generated SQL.

SELECT
1203 AS analysis_id,
 CAST(vo.discharge_to_concept_id AS STRING) AS stratum_1,
 CAST(NULL AS STRING) AS stratum_2,
 CAST(NULL AS STRING) AS stratum_3,
 CAST(NULL AS STRING) AS stratum_4,
 CAST(NULL AS STRING) AS stratum_5,
 COUNT(*) AS count_value
FROM
hive_metastore.omop.visit_occurrence vo
JOIN 
 hive_metastore.omop.observation_period op 
ON 
 vo.person_id = op.person_id
AND 
 vo.visit_start_date >= op.observation_period_start_date
AND 
 vo.visit_start_date <= op.observation_period_end_date
WHERE 
 vo.discharge_to_concept_id != 0
GROUP BY 
 vo.discharge_to_concept_id;
OPTIMIZE hive_metastore.omop.bajhxbnws_tmpach_1203
 ZORDER BY stratum_1;
@Zachary-Higgins
Copy link
Author

Zachary-Higgins commented Mar 28, 2024

It also generated the following incorrect column reference:

[UNRESOLVED_COLUMN.WITH_SUGGESTION] A column, variable, or function parameter with name admitting_source_concept_id cannot be resolved. Did you mean one of the following? [visit_source_concept_id, admitted_from_concept_id, visit_type_concept_id, discharged_to_concept_id, visit_concept_id].; line 65 pos 197

Assuming admitting_source_concept_id should actually be admitted_from_concept_id.

Should we create a new issue for each or keep a list here?

@paulstey
Copy link

@Zachary-Higgins : Thanks for flagging! What did you end up doing as your workaround? Did you manually correct the emitted SQL and re-run?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants