[DC-3271] Use Jinja template instead of string concatenation when importing from RDR dataset #1691

Open · wants to merge 27 commits into base: develop
Changes from all commits (27 commits)
a8317d4
[DC-3271] saving WIP
Aug 2, 2023
d73c59f
[DC-3271] Add new template, use new template
Aug 3, 2023
5038819
[DC-3271] Remove testing setup
Aug 3, 2023
0fd38e3
Merge branch 'develop' into ms/dc-3271
Sep 11, 2023
a953f63
[DC-3271] Merge branch 'develop'
Nov 7, 2023
61401fd
[DC-3271] Merge develop
Nov 8, 2023
c6657e1
[DC-3271] merge in develop
Nov 9, 2023
469ffec
[DC-3271] update #! top-level statement, bring in initial Jinja template
Nov 9, 2023
9d2831e
[DC-3271] Add "AS" clause, formatting
Nov 9, 2023
2630fd2
[DC-3271] YAPF
Nov 9, 2023
c70198b
[DC-3271] merge in develop
Nov 10, 2023
48bd3e6
[DC-3271] Merge in develop
Dec 19, 2023
0e72e1f
[DC-3271] Merge in develop
Dec 19, 2023
a583bb9
[DC-3629] Add the wear_study percentage with fitbit data check (#1833)
brendagutman Dec 20, 2023
4baf50d
[DC-3635] Remove portion of query checking primary consent (#1834)
Dec 21, 2023
aedfc7d
[DC-3631] Update cope_survey to reduce false positive results due to …
Dec 21, 2023
02ca7cc
[DC-3658] Update Participant Validation QC check for excluded sites (…
ratuagga Dec 26, 2023
271a036
[DC-3659] Update check controlled tier part 2 to include standard cla…
ratuagga Dec 26, 2023
44983fc
[DC-3650] Include PII Validation in Snapshot script (#1837)
ratuagga Dec 26, 2023
1623312
[DC-3651] Fix Combined Backup script for git_version (#1835)
ratuagga Dec 27, 2023
9e9033e
[DC-3333] Python 3.11 upgrade (#1798)
hiro-mishima Jan 16, 2024
ff26387
[DC-3660] Update Combined QC notebook to include CE (#1838)
ratuagga Jan 16, 2024
d6efe80
[DC-3673] Remove notebook check regarding cause_source_concept_id for…
Jan 16, 2024
8750581
[DC-3661] Update `fitbit_qc` parameters and descriptions (#1843)
brendagutman Jan 17, 2024
dbec3b7
[DC-3668] Ignore race/ethnicity sub categories in ct notebook (#1846)
brendagutman Jan 17, 2024
9961da6
[DC-3271] Remove irrelevant bq_utils import statements (#1841)
Jan 22, 2024
70e9d56
[DC-3271] Merge in develop
Feb 7, 2024
32 changes: 17 additions & 15 deletions data_steward/tools/import_rdr_dataset.py
@@ -1,4 +1,4 @@
-#!/usr/bin/env bash
+#!/usr/bin/env python

# Imports RDR ETL results into a dataset in BigQuery.
# Assumes you have already activated a service account that is able to
@@ -17,7 +17,7 @@
# Project imports
from utils import auth, pipeline_logging
from gcloud.bq import BigQueryClient
-from common import CDR_SCOPES, AOU_DEATH, DEATH
+from common import CDR_SCOPES, AOU_DEATH, DEATH, JINJA_ENV
from resources import (replace_special_characters_for_labels,
                       validate_date_string, rdr_src_id_schemas, cdm_schemas,
                       fields_for, rdr_specific_schemas)
@@ -99,11 +99,11 @@ def create_rdr_tables(client, destination_dataset, rdr_project,

    Uses the client to load data directly from the dataset into
    a table.

    NOTE: Death records are loaded to AOU_DEATH table. We do not
    create DEATH table here because RDR's death records contain
    NULL death_date records, which violates CDM's DEATH definition.
-    We assign `aou_death_id` using UUID on the fly.
+    We assign `aou_death_id` using UUID on the fly.
    `primary_death_record` is set to FALSE here. The CR CalculatePrimaryDeathRecord
    will update it to the right values later in the RDR data stage.

@@ -155,17 +155,19 @@ def create_rdr_tables(client, destination_dataset, rdr_project,
        if table_ref.num_rows == 0:
            raise NotFound(f'`{source_table_id}` has No data To copy from')

-        sc_list = []
-        for item in schema_list:
-            if item.name == 'aou_death_id':
-                field = 'GENERATE_UUID() AS aou_death_id'
-            elif item.name == 'primary_death_record':
-                field = 'FALSE AS primary_death_record'
-            else:
-                field = f'CAST({item.name} AS {BIGQUERY_DATA_TYPES[item.field_type.lower()]}) AS {item.name}'
-            sc_list.append(field)
-
-        fields_name_str = ',\n'.join(sc_list)
+        fields_name_str = JINJA_ENV.from_string("""
+            {% for item in schema_list %}
+            {% set name = item.name %}
+            {% set field_type = item.field_type %}
+            {% if name == 'aou_death_id' %}
+                GENERATE_UUID() AS aou_death_id,
+            {% elif name == 'primary_death_record' %}
+                FALSE AS primary_death_record,
+            {% else %}
+                CAST({{ name }} AS {{BIGQUERY_DATA_TYPES[field_type.lower()]}}) AS {{ name }}{{", " if not loop.last else "" }}
+            {% endif %}
+            {% endfor %}""").render(schema_list=schema_list,
+                                    BIGQUERY_DATA_TYPES=BIGQUERY_DATA_TYPES)

        # copy contents from source dataset to destination dataset
        if table == 'cope_survey_semantic_version_map':
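For illustration, here is a minimal, self-contained sketch (not part of this PR) of how a template shaped like the one above renders the SELECT field list. The Field dataclass and the sample schema entries are hypothetical stand-ins for the BigQuery SchemaField objects the script iterates over, the BIGQUERY_DATA_TYPES mapping is only an illustrative subset, and JINJA_ENV here is a plain jinja2.Environment standing in for common.JINJA_ENV:

from dataclasses import dataclass
from jinja2 import Environment

JINJA_ENV = Environment()  # stand-in for common.JINJA_ENV used by the script

# Illustrative subset of the script's field_type -> BigQuery type mapping.
BIGQUERY_DATA_TYPES = {'integer': 'INT64', 'string': 'STRING', 'date': 'DATE'}


@dataclass
class Field:
    """Hypothetical stand-in for a bigquery.SchemaField (name + field_type)."""
    name: str
    field_type: str


schema_list = [
    Field('person_id', 'integer'),
    Field('aou_death_id', 'string'),
    Field('primary_death_record', 'string'),
    Field('death_date', 'date'),
]

# Same template shape as the diff: special-case the two synthesized columns,
# CAST everything else to its BigQuery type.
FIELDS_TMPL = JINJA_ENV.from_string("""
{% for item in schema_list %}
{% set name = item.name %}
{% set field_type = item.field_type %}
{% if name == 'aou_death_id' %}
    GENERATE_UUID() AS aou_death_id,
{% elif name == 'primary_death_record' %}
    FALSE AS primary_death_record,
{% else %}
    CAST({{ name }} AS {{ BIGQUERY_DATA_TYPES[field_type.lower()] }}) AS {{ name }}{{ ", " if not loop.last else "" }}
{% endif %}
{% endfor %}""")

print(FIELDS_TMPL.render(schema_list=schema_list,
                         BIGQUERY_DATA_TYPES=BIGQUERY_DATA_TYPES))

Rendering yields one SELECT expression per schema field: aou_death_id is generated via GENERATE_UUID(), primary_death_record is hard-coded to FALSE, and every other column is CAST to its BigQuery type. The loop.last check avoids a trailing comma after the final ordinary column, mirroring the ',\n'.join behavior of the old string-building loop.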