MOA disease unique constraint failed #409

Closed
korikuzma opened this issue Dec 12, 2024 · 4 comments

@korikuzma
Member

In #408, I am working on adding support for MOA prognostic data. This resulted in being able to harvest/transform more MOA assertions. During this transformation, I encountered a bug with MOA disease normalization resolution.

{
  "name": "Myelodysplastic Syndromes",
  "oncotree_code": "MDS",
  "oncotree_term": "Myelodysplastic Syndromes"
}

and

{
  "name": "Myelodysplasia",
  "oncotree_code": "MDS",
  "oncotree_term": "Myelodysplasia"
}

resolve to the same NCIt ID. However, we create two separate Disease objects in MoaTransformer with the same id="moa.normalize.disease.ncit:C3247", which violates the database uniqueness constraint.

Since MOA does not have disease "records" we will have to change how we're processing diseases. One thing I'm not sure about is which value to use for the label vs alternativeLabels. Typically, we only store normalizer information in the vicc_normalizer_data field.
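The collision described above can be reproduced in isolation with a minimal sketch. The two records and the resulting id come from this issue; `STUB_NORMALIZED_IDS`, `build_disease_id`, and `find_duplicate_ids` are hypothetical names used only for illustration and do not reflect the actual MoaTransformer code or the real normalizer API.

```python
from collections import Counter

# Source-text disease records as quoted in this issue.
moa_diseases = [
    {
        "name": "Myelodysplastic Syndromes",
        "oncotree_code": "MDS",
        "oncotree_term": "Myelodysplastic Syndromes",
    },
    {
        "name": "Myelodysplasia",
        "oncotree_code": "MDS",
        "oncotree_term": "Myelodysplasia",
    },
]

# Hypothetical stand-in for the disease normalizer: both names are assumed
# to resolve to the NCIt ID reported in this issue.
STUB_NORMALIZED_IDS = {
    "Myelodysplastic Syndromes": "ncit:C3247",
    "Myelodysplasia": "ncit:C3247",
}


def build_disease_id(record):
    """Build an id purely from the normalized concept (the problematic scheme)."""
    return f"moa.normalize.disease.{STUB_NORMALIZED_IDS[record['name']]}"


def find_duplicate_ids(records):
    """Return ids that would be assigned to more than one Disease object."""
    counts = Counter(build_disease_id(record) for record in records)
    return sorted(disease_id for disease_id, n in counts.items() if n > 1)


duplicates = find_duplicate_ids(moa_diseases)
print(duplicates)
```

Any id reported by `find_duplicate_ids` would be written twice, which is what a uniqueness constraint on the id rejects.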

@korikuzma korikuzma added bug Something isn't working priority:high High priority labels Dec 12, 2024
@korikuzma korikuzma self-assigned this Dec 12, 2024
@jsstevenson
Member

This looks like a data curation error on their part. Same oncotree code.

@korikuzma
Member Author

Ya, I was going back and forth on whether we should handle this or see if MOA could. I drafted a message to Brendan asking if they did any normalization on their side, but I never sent it and just ended up creating the issue here. I can reach out to him to ask.

@korikuzma
Member Author

we just use oncotree for the normalization! the “name” is as written in the source text. we used an underscore in the upcoming release to differentiate “raw” values per a suggestion from James last year

korikuzma added a commit that referenced this issue Dec 13, 2024
close #409

* Internal digest is now created using `oncotree_code` or `oncotree_term`
* Since there may be duplicate codes or terms with different source text disease names, the first record will be used as the `Disease` label and others will be added to `alternativeLabels`
korikuzma added a commit that referenced this issue Dec 18, 2024
close #408 

Note: This work revealed a bug with uniqueness constraints (#409). This
will be addressed in a separate PR. This PR focuses on changes to the
harvester + transformer: `python3 -m pytest tests/unit/harvesters
tests/unit/transformers`

* Harvester output changed
  * Therapy fields are now nested inside `therapy` key
  * Remove `clinical_significance` and retain original values from MOA
  * `source_ids` -> `source_id` since we only store one ID
* `MoaTransformer` now supports MOA prognostic assertions
korikuzma added a commit that referenced this issue Dec 18, 2024
close #409

* Internal digest is now created using `oncotree_code` or
`oncotree_term`
* Since there may be duplicate codes or terms with different source text
disease names, the first record will be used as the `Disease` label and
others will be added to `alternativeLabels`
* Also fixed return type annotation in `_get_disease`
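
The approach described in the commit message above could look roughly like the following sketch. It assumes the digest is derived from `oncotree_code` when present and from `oncotree_term` otherwise. The use of SHA-256, the `moa.disease:` id prefix, and the helper names `disease_key` and `merge_diseases` are assumptions made for illustration, not the code merged in the referenced PR.

```python
import hashlib


def disease_key(record):
    """Prefer the OncoTree code; fall back to the OncoTree term."""
    return record.get("oncotree_code") or record.get("oncotree_term")


def merge_diseases(records):
    """Collapse records that share a key into a single Disease-like dict.

    The first record seen supplies the label; later source-text names
    are collected in alternativeLabels.
    """
    merged = {}
    for record in records:
        key = disease_key(record)
        if key is None:
            continue  # no code or term to key on; skipped in this sketch
        name = record["name"]
        if key not in merged:
            # Hypothetical digest scheme: truncated SHA-256 of the key.
            digest = hashlib.sha256(key.encode("utf-8")).hexdigest()[:16]
            merged[key] = {
                "id": f"moa.disease:{digest}",
                "label": name,
                "alternativeLabels": [],
            }
        elif name != merged[key]["label"] and name not in merged[key]["alternativeLabels"]:
            merged[key]["alternativeLabels"].append(name)
    return list(merged.values())


records = [
    {"name": "Myelodysplastic Syndromes", "oncotree_code": "MDS",
     "oncotree_term": "Myelodysplastic Syndromes"},
    {"name": "Myelodysplasia", "oncotree_code": "MDS",
     "oncotree_term": "Myelodysplasia"},
]
diseases = merge_diseases(records)
print(diseases)
```

With the two records from this issue, the sketch yields one object whose label is the first source-text name and whose `alternativeLabels` holds the second, so only one id is ever written.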

Closed by #413.
