Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make consistent unit concept exception in Concept Record Completeness #558

Open
MaximMoinat opened this issue Jul 12, 2024 · 2 comments
Open

Comments

@MaximMoinat
Copy link
Collaborator

This issue came up while reviewing the documentation of sourceConceptRecordCompleteness check.

In the query for both standard and source concept id, we have this exception for unit concepts:

WHERE cdmTable.@cdmFieldName = 0 {@cdmFieldName == 'UNIT_CONCEPT_ID' & (@cdmTableName == 'MEASUREMENT' | @cdmTableName == 'OBSERVATION')}?{AND cdmTable.value_as_number IS NOT NULL}

The reason we have this rule is that many databases will enter a 0 as the unit_concept_id by default, even if there is no measurement/observation value. These should be ignored to get a meaningful violating percentage. (fyi: the correct way to do this is to leave it empty, NULL, as the unit concept is not a required field).

However, the rule is inconsistent; we don't apply the exception to the unit_source_concept_id and not to the device and specimen table which also have a unit_concept_id and unit_source_concept_id. For these tables, we can assume it refers to the quantity field (instead of value_as_number).

We need to update the query to be consistent. It becomes a bit messy, but this would be the new where-clause.

WHERE cdmTable.@cdmFieldName = 0 
{@cdmTableName IN ('MEASUREMENT', 'OBSERVATION') & @cdmFieldName IN ('UNIT_CONCEPT_ID', 'UNIT_SOURCE_CONCEPT_ID')}?{AND cdmTable.value_as_number IS NOT NULL}
{@cdmTableName IN ('DEVICE_EXPOSURE', 'SPECIMEN') & @cdmFieldName IN ('UNIT_CONCEPT_ID', 'UNIT_SOURCE_CONCEPT_ID')}?{AND cdmTable.quantity IS NOT NULL}
@MaximMoinat
Copy link
Collaborator Author

MaximMoinat commented Jul 12, 2024

That said, after thinking this through for a bit, the issue is actually that the unit concept is just not properly populated. The following table covers all options:

Value ✅ Value ❌
Unit ✅ A 👍 B 🚫
Unit 0️⃣ C ‼️ D 🚫
Unit ❌ E 🚫 F 👍

✅ - non-null
❌ - null
👍 - expected
🚫 - unexpected, should not happen

B and D should not happen (unit while there is no value). What we currently correct for, is D; default unit_concept_id to 0 even if the measurement/observation has no value. We might want to implement a separate check for these cases.

What we are currently doing:
$\frac{A}{A+C+E}$

What we should do:
$\frac{A}{A+C}$

And then we can make this consistent for all non-required _concept_id fields. i.e. if not required, do not count empty _concept_id fields in numerator.

@katy-sadowski
Copy link
Collaborator

katy-sadowski commented Jul 15, 2024

Thanks so much for the thorough summary @MaximMoinat ! I agree with your assessment. I think we might also want to consider adding a separate check for cases B, D, and E. I observe inconsistencies in these fields often and it's confusing. We should be enforcing that non-required concept fields are left NULL when there is no data with which to populate a concept. This way users can distinguish no data from unmappable data.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants