removed unnecessary join of observation fact onto a subquery #13

Open: 5 commits proposed for merge into master
Conversation


@namdets namdets commented May 13, 2018

Since the subquery is derived solely from observation_fact, I was able to move the addition of observation blobs into the former subquery instead of joining them in the outer query.

Jason Stedman added 2 commits May 12, 2018 21:23
…solely from observation fact by refactoring the addition of observation blobs to happen in the former subquery instead of being joined in
@lcphillips2
Contributor

Jason,
Can you show me what the final query (mainQuerySql) looks like after this change?
thanks -Lori

@namdets
Author

namdets commented May 16, 2018

Lori,

I'll pull some logs today and post the queries here for your review.

Thanks,

Jason

Jason Stedman added 3 commits May 16, 2018 11:08
…derived solely from observation fact by refactoring the addition of observation blobs to happen in the former subquery instead of being joined in"

This reverts commit 9b64588.
…solely from observation fact when no blob data is being pulled in, a more complicated refactor is required to make the same performance improvement for queries involving observation_blobs
@namdets
Author

namdets commented May 16, 2018

Hi Lori,

I have reverted the original fix because it broke queries that include observation_blob data. Our PIC-SURE API currently doesn't support observation_blobs, so no queries had been generated that exercised that code path. The new refactor limits the change to queries that do not involve observation_blobs.

If you run this XML query:


<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<ns4:request xmlns:ns2="http://www.i2b2.org/xsd/cell/crc/pdo/1.1/" xmlns:ns4="http://www.i2b2.org/xsd/hive/msg/1.1/" xmlns:ns3="http://www.i2b2.org/xsd/hive/pdo/1.1/">
    <message_header>
        <sending_application>
            <application_name>IRCT</application_name>
            <application_version>1.0</application_version>
        </sending_application>
        <sending_facility>
            <facility_name>IRCT</facility_name>
        </sending_facility>
        <security>
            <domain>i2b2demo</domain>
            <username>demo</username>
            <password>demouser</password>
        </security>
        <project_id>Demo</project_id>
    </message_header>
    <request_header>
        <result_waittime_ms>180000</result_waittime_ms>
    </request_header>
    <message_body>
        <ns2:pdoheader>
            <patient_set_limit>0</patient_set_limit>
            <estimated_time>180000</estimated_time>
            <request_type>getPDO_fromInputList</request_type>
        </ns2:pdoheader>
        <ns2:request xsi:type="ns2:GetPDOFromInputList_requestType" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
            <input_list>
                <patient_list min="0" max="100000">
                    <patient_set_coll_id>6</patient_set_coll_id>
                </patient_list>
            </input_list>
            <filter_list>
                <panel name="\\examination\examination\body measures\Head Circumference (cm)\">
                    <panel_number>0</panel_number>
                    <panel_timing>ANY</panel_timing>
                    <panel_accuracy_scale>0</panel_accuracy_scale>
                    <invert>0</invert>
                    <total_item_occurrences>1</total_item_occurrences>
                    <item>
                        <hlevel>0</hlevel>
                        <item_name>body measures/Head Circumference (cm)</item_name>
                        <item_key>\\examination\examination\body measures\Head Circumference (cm)\</item_key>
                        <item_is_synonym>false</item_is_synonym>
                    </item>
                </panel>
            </filter_list>
            <output_option>
                <observation_set onlykeys="false" blob="false" techdata="false"/>
                <patient_set onlykeys="false" select="using_filter_list"/>
                <event_set onlykeys="true" select="using_filter_list"/>
                <concept_set_using_filter_list onlykeys="false"/>
                <modifier_set_using_filter_list onlykeys="true"/>
                <pid_set onlykeys="true" select="using_filter_list"/>
                <eid_set onlykeys="true" select="using_filter_list"/>
            </output_option>
        </ns2:request>
    </message_body>
</ns4:request>

The resulting SQL queries look like this:

INSERT INTO GLOBAL_TEMP_FACT_PARAM_TABLE (char_param1)
SELECT DISTINCT
obs_modifier_cd
FROM ((
SELECT
a.*
FROM (SELECT
obs.encounter_num obs_encounter_num,
obs.patient_num obs_patient_num,
obs.concept_cd obs_concept_cd,
obs.provider_id obs_provider_id,
obs.start_date obs_start_date,
obs.modifier_cd obs_modifier_cd,
obs.instance_num obs_instance_num,
obs.valtype_cd obs_valtype_cd,
obs.tval_char obs_tval_char,
obs.nval_num obs_nval_num,
obs.valueflag_cd obs_valueflag_cd,
obs.quantity_num obs_quantity_num,
obs.units_cd obs_units_cd,
obs.end_date obs_end_date,
obs.location_cd obs_location_cd,
obs.confidence_num obs_confidence_num,
'\examination\examination\body measures\Head Circumference (cm)' panel_name
FROM (SELECT
CONCEPT_CD
FROM i2b2demodata.CONCEPT_DIMENSION
WHERE CONCEPT_PATH LIKE '\examination\body measures\Head Circumference (cm)%'
GROUP BY CONCEPT_CD) dimension,
i2b2demodata.observation_FACT obs
WHERE obs.patient_num IN (SELECT
pset.patient_num
FROM i2b2demodata.qt_patient_set_collection pset
WHERE pset.result_instance_id = ?
AND pset.set_index BETWEEN 0 AND 100000)
AND obs.CONCEPT_CD = dimension.CONCEPT_CD
ORDER BY 2, 5, 3, 7, 6) a
)) b

If you run the following XML, which includes the observation_blob:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<ns4:request xmlns:ns2="http://www.i2b2.org/xsd/cell/crc/pdo/1.1/" xmlns:ns4="http://www.i2b2.org/xsd/hive/msg/1.1/" xmlns:ns3="http://www.i2b2.org/xsd/hive/pdo/1.1/">
    <message_header>
        <sending_application>
            <application_name>IRCT</application_name>
            <application_version>1.0</application_version>
        </sending_application>
        <sending_facility>
            <facility_name>IRCT</facility_name>
        </sending_facility>
        <security>
            <domain>i2b2demo</domain>
            <username>demo</username>
            <password>demouser</password>
        </security>
        <project_id>Demo</project_id>
    </message_header>
    <request_header>
        <result_waittime_ms>180000</result_waittime_ms>
    </request_header>
    <message_body>
        <ns2:pdoheader>
            <patient_set_limit>0</patient_set_limit>
            <estimated_time>180000</estimated_time>
            <request_type>getPDO_fromInputList</request_type>
        </ns2:pdoheader>
        <ns2:request xsi:type="ns2:GetPDOFromInputList_requestType" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
            <input_list>
                <patient_list min="0" max="100000">
                    <patient_set_coll_id>6</patient_set_coll_id>
                </patient_list>
            </input_list>
            <filter_list>
                <panel name="\\examination\examination\body measures\Head Circumference (cm)\">
                    <panel_number>0</panel_number>
                    <panel_timing>ANY</panel_timing>
                    <panel_accuracy_scale>0</panel_accuracy_scale>
                    <invert>0</invert>
                    <total_item_occurrences>1</total_item_occurrences>
                    <item>
                        <hlevel>0</hlevel>
                        <item_name>body measures/Head Circumference (cm)</item_name>
                        <item_key>\\examination\examination\body measures\Head Circumference (cm)\</item_key>
                        <item_is_synonym>false</item_is_synonym>
                    </item>
                </panel>
            </filter_list>
            <output_option>
                <observation_set onlykeys="false" blob="true" techdata="false"/>
                <patient_set onlykeys="false" select="using_filter_list"/>
                <event_set onlykeys="true" select="using_filter_list"/>
                <concept_set_using_filter_list onlykeys="false"/>
                <modifier_set_using_filter_list onlykeys="true"/>
                <pid_set onlykeys="true" select="using_filter_list"/>
                <eid_set onlykeys="true" select="using_filter_list"/>
            </output_option>
        </ns2:request>
    </message_body>
</ns4:request>

The resulting SQL queries are unchanged and look like this:

INSERT INTO i2b2demodata.GLOBAL_TEMP_FACT_PARAM_TABLE (char_param1)
SELECT DISTINCT
obs_encounter_num
FROM (SELECT
a.*,
observation_blob obs_observation_blob
FROM i2b2demodata.observation_FACT obs,
(SELECT
a.*
FROM (SELECT
obs.encounter_num obs_encounter_num,
obs.patient_num obs_patient_num,
obs.concept_cd obs_concept_cd,
obs.provider_id obs_provider_id,
obs.start_date obs_start_date,
obs.modifier_cd obs_modifier_cd,
obs.instance_num obs_instance_num,
obs.valtype_cd obs_valtype_cd,
obs.tval_char obs_tval_char,
obs.nval_num obs_nval_num,
obs.valueflag_cd obs_valueflag_cd,
obs.quantity_num obs_quantity_num,
obs.units_cd obs_units_cd,
obs.end_date obs_end_date,
obs.location_cd obs_location_cd,
obs.confidence_num obs_confidence_num,
'\examination\examination\body measures\Head Circumference (cm)' panel_name
FROM (SELECT
CONCEPT_CD
FROM i2b2demodata.CONCEPT_DIMENSION
WHERE CONCEPT_PATH LIKE '\examination\body measures\Head Circumference (cm)%'
GROUP BY CONCEPT_CD) dimension,
i2b2demodata.observation_FACT obs
WHERE obs.patient_num IN (SELECT
pset.patient_num
FROM i2b2demodata.qt_patient_set_collection pset
WHERE pset.result_instance_id = ?
AND pset.set_index BETWEEN 0 AND 100000)
AND obs.CONCEPT_CD = dimension.CONCEPT_CD
ORDER BY 2, 5, 3, 7, 6) a) a
WHERE obs.encounter_num = a.obs_encounter_num
AND obs.patient_num = a.obs_patient_num
AND obs.concept_cd = a.obs_concept_cd
AND obs.provider_id = a.obs_provider_id
AND obs.start_date = a.obs_start_date
AND obs.modifier_cd = a.obs_modifier_cd
AND obs.instance_num = a.obs_instance_num) b

The performance hit comes from evaluating all the WHERE clauses for the join. The same cost can also be avoided for the code paths that include observation_blob by adjusting the innermost query, but the logic to do so is much more complicated because of the way the SQL is generated. I'd like to (in all that spare time I have ;D ) take a crack at refactoring how the total SQL is generated for this DAO to be template based, but that will have to wait for another day as it would be a substantial effort.
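
To illustrate the idea for the blob path, here is a minimal sketch in Python using an in-memory SQLite table, not the real i2b2 schema or DAO code. The sample rows, the trimmed column list, and the helper names `build_demo_db`, `joined_query`, and `inline_query` are all assumptions made for the example. It compares the current shape (self-join on the seven key columns) with selecting observation_blob directly in the inner query. The two should return the same rows as long as those key columns are non-null and identify a row uniquely, which the toy data is built to satisfy.

```python
import sqlite3

# The seven columns used in the outer join's WHERE clause above.
KEYS = ["encounter_num", "patient_num", "concept_cd", "provider_id",
        "start_date", "modifier_cd", "instance_num"]


def build_demo_db():
    """Create a toy observation_fact table (hypothetical data, trimmed columns)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE observation_fact ("
        "encounter_num INTEGER, patient_num INTEGER, concept_cd TEXT, "
        "provider_id TEXT, start_date TEXT, modifier_cd TEXT, "
        "instance_num INTEGER, nval_num REAL, observation_blob TEXT)"
    )
    rows = [
        (1, 10, "HC", "P1", "2018-01-01", "@", 1, 35.5, "blob-a"),
        (2, 10, "HC", "P1", "2018-02-01", "@", 1, 36.0, "blob-b"),
        (3, 11, "HC", "P2", "2018-01-15", "@", 1, 34.2, "blob-c"),
        (4, 11, "WT", "P2", "2018-01-15", "@", 1, 4.1, "blob-d"),
    ]
    conn.executemany(
        "INSERT INTO observation_fact VALUES (?,?,?,?,?,?,?,?,?)", rows
    )
    return conn


def joined_query():
    """Current shape: the blob is fetched by joining observation_fact back onto the subquery."""
    inner_cols = ", ".join(f"obs.{k} obs_{k}" for k in KEYS)
    join_cond = " AND ".join(f"obs.{k} = a.obs_{k}" for k in KEYS)
    return (
        "SELECT a.*, obs.observation_blob obs_observation_blob "
        "FROM observation_fact obs, "
        f"(SELECT {inner_cols}, obs.nval_num obs_nval_num "
        "FROM observation_fact obs WHERE obs.concept_cd = ?) a "
        f"WHERE {join_cond}"
    )


def inline_query():
    """Sketched alternative: select the blob in the inner query, with no self-join."""
    inner_cols = ", ".join(f"obs.{k} obs_{k}" for k in KEYS)
    return (
        f"SELECT {inner_cols}, obs.nval_num obs_nval_num, "
        "obs.observation_blob obs_observation_blob "
        "FROM observation_fact obs WHERE obs.concept_cd = ?"
    )


if __name__ == "__main__":
    conn = build_demo_db()
    joined = sorted(conn.execute(joined_query(), ("HC",)).fetchall())
    inline = sorted(conn.execute(inline_query(), ("HC",)).fetchall())
    print("same rows:", joined == inline)
```

If the key columns could be null or non-unique in real data, the join version would drop or duplicate rows, so the two shapes would no longer be interchangeable; that would need checking against the actual tables before changing the blob path.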

There is significant potential for further performance improvement, as the same SQL query seems to be run 5 times in the process of handling a single XML request. This would be a much larger architectural change.

Thanks for reviewing my pull request and making me fix it! Any feedback is always appreciated!

Jason
