Add processors for drug indications and side effects #40

Merged: @bgyori merged 13 commits into main from add-chembl-indications on Oct 30, 2021

Conversation

cthoyt
Member

@cthoyt cthoyt commented Oct 13, 2021

Closes #39

This PR adds a processor over the ChEMBL database based on SQL queries executed by the chembl_downloader. It standardizes the ChEMBL identifiers for chemicals and the MeSH identifiers for indications using the INDRA BioOntology. The code that does this is probably generally reusable, so I gave it its own utility function.

This PR also adds the processor for SIDER side effects, since it shares some of the standardization code with the ChEMBL processor.
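
For readers unfamiliar with the approach, here is a minimal sketch of the query-then-emit pattern described above. It is not the code in this PR: it runs against a tiny in-memory SQLite database instead of a ChEMBL dump obtained through chembl_downloader, and it skips the BioOntology standardization step. The table and column names are assumptions modeled on the public ChEMBL schema, the rows are illustrative, and `get_indication_relations`, `build_toy_db`, and the `has_indication` label are hypothetical names chosen for this example.

```python
import sqlite3

# Toy stand-ins for two ChEMBL tables. Column names are assumptions
# modeled on the public ChEMBL schema, not taken from this PR.
SCHEMA = """
CREATE TABLE molecule_dictionary (
    molregno INTEGER PRIMARY KEY,
    chembl_id TEXT,
    pref_name TEXT
);
CREATE TABLE drug_indication (
    molregno INTEGER,
    mesh_id TEXT,
    mesh_heading TEXT
);
"""

# Join chemicals to their indications, skipping rows without a MeSH ID.
QUERY = """
SELECT md.chembl_id, di.mesh_id
FROM molecule_dictionary AS md
JOIN drug_indication AS di ON md.molregno = di.molregno
WHERE di.mesh_id IS NOT NULL
ORDER BY md.chembl_id, di.mesh_id
"""


def build_toy_db():
    """Create an in-memory database with a few illustrative rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO molecule_dictionary VALUES (?, ?, ?)",
        [(1, "CHEMBL25", "ASPIRIN"), (2, "CHEMBL521", "IBUPROFEN")],
    )
    conn.executemany(
        "INSERT INTO drug_indication VALUES (?, ?, ?)",
        [(1, "D010146", "Pain"), (2, "D005334", "Fever"), (2, None, None)],
    )
    conn.commit()
    return conn


def get_indication_relations(conn):
    """Return (source_ns, source_id, target_ns, target_id, relation) tuples."""
    return [
        # "has_indication" is a hypothetical relation label for this sketch.
        ("CHEMBL", chembl_id, "MESH", mesh_id, "has_indication")
        for chembl_id, mesh_id in conn.execute(QUERY)
    ]


relations = get_indication_relations(build_toy_db())
for relation in relations:
    print(relation)
```

In the processor itself, each namespace and identifier pair would additionally pass through the shared standardization utility before a relation is emitted.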

@cthoyt cthoyt changed the title Add processor for ChEMBL indications Add processors for drug indications and side effects Oct 21, 2021
print("Mapping out of UMLS")
print(tabulate(biomappings_from_umls.most_common()))
print("Mapping into UMLS")
print(tabulate(biomappings_to_umls.most_common()))
Member

An alternative approach here could be to propagate these mappings into INDRA (see https://github.com/sorgerlab/indra/blob/master/indra/resources/biomappings.tsv) and add them to the BioOntology there, then the usual standardization procedure would take care of the necessary mappings. What do you think?

Member Author

Yeah, I think this was ultimately unnecessary since I rely on the BioOntology standardization afterward. I wonder whether it currently handles the UMLS xrefs imported from biomappings. I thought a lot of namespaces got excluded by the importer, but after rereading https://github.com/sorgerlab/indra/blob/7ecb6be4603d54d55eaf7da8d3687294362907f5/indra/resources/update_resources.py#L699-L780, I think that's not the case.

chemical.db_id,
indication.db_ns,
indication.db_id,
"causes",
Member

Hmm... the fact that these are explicitly causal makes me wonder if this should be implemented as an INDRA source producing, e.g., Activation statements. It might require more work, but it's worth discussing.

Member Author

I'm open to either. I thought that, for the time being, we were trying to keep a more molecular focus in INDRA, but the indirect causal nature of this information could also fit within INDRA statements. One thing I didn't include here is the side effect frequencies. The fact that these exist on a spectrum might also be important to consider.

@bgyori
Member

bgyori commented Oct 21, 2021

The test failure says

.tox/py/lib/python3.9/site-packages/indra_cogex/sources/sider/__init__.py:17: in <module>
    from indra_cogex.constants import MODULE
E   ModuleNotFoundError: No module named 'indra_cogex.constants'

Is a new module missing from version control?

@cthoyt
Member Author

cthoyt commented Oct 22, 2021

> The test failure says
>
> .tox/py/lib/python3.9/site-packages/indra_cogex/sources/sider/__init__.py:17: in <module>
>     from indra_cogex.constants import MODULE
> E   ModuleNotFoundError: No module named 'indra_cogex.constants'
>
> Is a new module missing from version control?

Nice typo catch. I fixed it in 852eb40.

@bgyori bgyori force-pushed the add-chembl-indications branch from 852eb40 to 4cb39e4 Compare October 27, 2021 18:21
@bgyori bgyori force-pushed the add-chembl-indications branch from f2a195c to a002a17 Compare October 27, 2021 19:56
@bgyori bgyori merged commit 3a52fbd into main Oct 30, 2021
@bgyori bgyori deleted the add-chembl-indications branch October 30, 2021 20:31
Development

Successfully merging this pull request may close these issues.

ChEMBL Indication Processor