-
Notifications
You must be signed in to change notification settings - Fork 12
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add processors for drug indications and side effects #40
Conversation
print("Mapping out of UMLS") | ||
print(tabulate(biomappings_from_umls.most_common())) | ||
print("Mapping into UMLS") | ||
print(tabulate(biomappings_to_umls.most_common())) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An alternative approach here could be to propagate these mappings into INDRA (see https://github.com/sorgerlab/indra/blob/master/indra/resources/biomappings.tsv) and add them to the BioOntology there, then the usual standardization procedure would take care of the necessary mappings. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I think this was ultimately unnecessary since I rely on the BioOntology standardization after. I guess I wonder if it's handling UMLS xrefs at the moment imported from biomappings. I thought that there were a lot of namespaces that got excluded from the importer, but after rereading https://github.com/sorgerlab/indra/blob/7ecb6be4603d54d55eaf7da8d3687294362907f5/indra/resources/update_resources.py#L699-L780, I think that's not the case.
chemical.db_id, | ||
indication.db_ns, | ||
indication.db_id, | ||
"causes", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm... the fact that these are explicitly causal makes me wonder if this should be implemented as an INDRA source producing e.g., Activation statements. It might require more work but something worth discussing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm open to either. I thought for the time being, we were trying to keep a more molecular focus in INDRA, but the indirect causal nature of this information could also fit within INDRA statements. One thing I didn't include here is the side effect frequencies, maybe the fact that these exist on a spectrum might also be important to consider
The test failure says
is a new module missing from version control? |
Nice typo catch. I fixed it in 852eb40. |
852eb40
to
4cb39e4
Compare
This is already covered by the harness
f2a195c
to
a002a17
Compare
Closes #39
This PR adds a processor over the ChEMBL database based on SQL queries executed by the
chembl_downloader
. It standardizes the ChEMBL identifiers for chemicals and the MeSH identifiers for indications using the INDRA BioOntology. The code that does this is probably generally reusable so I gave it a its own utility function.This PR also adds the processor for SIDER side effects since it shares some of the standardization code from the ChEMBL processor