Add processors for drug indications and side effects #40

cthoyt · 2021-10-13T11:59:41Z

Closes #39

This PR adds a processor over the ChEMBL database based on SQL queries executed by the chembl_downloader. It standardizes the ChEMBL identifiers for chemicals and the MeSH identifiers for indications using the INDRA BioOntology. The code that does this is probably generally reusable, so I gave it its own utility function.

This PR also adds the processor for SIDER side effects, since it shares some of the standardization code with the ChEMBL processor.
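
To illustrate the query-then-standardize flow described above, here is a minimal sketch that runs against an in-memory SQLite database rather than a real ChEMBL dump. It is not the code from this PR: the table and column names are assumed from the public ChEMBL schema, `TOY_ONTOLOGY` is a hypothetical stand-in for the INDRA BioOntology, and the function names `standardize`, `get_indication_edges`, and `build_toy_db` are invented for the example.

```python
import sqlite3

# Toy stand-in for the two ChEMBL tables the query touches. Table and column
# names are assumptions based on the public ChEMBL schema.
SCHEMA = """
CREATE TABLE molecule_dictionary (
    molregno INTEGER PRIMARY KEY, chembl_id TEXT, pref_name TEXT
);
CREATE TABLE drug_indication (
    molregno INTEGER, mesh_id TEXT, mesh_heading TEXT, max_phase_for_ind INTEGER
);
"""

QUERY = """
SELECT md.chembl_id, md.pref_name, di.mesh_id, di.mesh_heading, di.max_phase_for_ind
FROM molecule_dictionary AS md
JOIN drug_indication AS di ON md.molregno = di.molregno
ORDER BY md.chembl_id, di.mesh_id
"""

# Hypothetical lookup table standing in for the BioOntology: maps a
# (namespace, identifier) pair to a preferred (namespace, identifier, name).
TOY_ONTOLOGY = {
    ("CHEMBL", "CHEMBL25"): ("CHEBI", "CHEBI:15365", "aspirin"),
    ("MESH", "D010146"): ("MESH", "D010146", "Pain"),
}


def standardize(db_ns, db_id, name):
    """Return the preferred (namespace, id, name), or the input if unmapped."""
    return TOY_ONTOLOGY.get((db_ns, db_id), (db_ns, db_id, name))


def get_indication_edges(conn):
    """Query chemical-indication pairs and standardize both ends of each pair."""
    edges = []
    for chembl_id, chem_name, mesh_id, mesh_name, max_phase in conn.execute(QUERY):
        chemical = standardize("CHEMBL", chembl_id, chem_name)
        indication = standardize("MESH", mesh_id, mesh_name)
        edges.append((chemical, indication, max_phase))
    return edges


def build_toy_db():
    """Create an in-memory database holding a single illustrative row."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO molecule_dictionary VALUES (1, 'CHEMBL25', 'ASPIRIN')")
    conn.execute("INSERT INTO drug_indication VALUES (1, 'D010146', 'Pain', 4)")
    return conn


if __name__ == "__main__":
    for edge in get_indication_edges(build_toy_db()):
        print(edge)
```

Keeping `standardize` separate from the query loop mirrors the point made above: the same helper could serve the SIDER processor without knowing anything about ChEMBL.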

bgyori · 2021-10-21T18:29:10Z

src/indra_cogex/sources/sider/__init__.py

+        print("Mapping out of UMLS")
+        print(tabulate(biomappings_from_umls.most_common()))
+        print("Mapping into UMLS")
+        print(tabulate(biomappings_to_umls.most_common()))


An alternative approach here could be to propagate these mappings into INDRA (see https://github.com/sorgerlab/indra/blob/master/indra/resources/biomappings.tsv) and add them to the BioOntology there, then the usual standardization procedure would take care of the necessary mappings. What do you think?

Yeah, I think this was ultimately unnecessary since I rely on the BioOntology standardization afterwards. I wonder whether it currently handles UMLS xrefs imported from Biomappings. I thought that a lot of namespaces got excluded by the importer, but after rereading https://github.com/sorgerlab/indra/blob/7ecb6be4603d54d55eaf7da8d3687294362907f5/indra/resources/update_resources.py#L699-L780, I think that's not the case.
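
For context, the two printed tables in the diff report how many mappings lead out of and into UMLS, grouped by the other namespace. A rough sketch of that kind of tally is below. The rows are made-up placeholders rather than real Biomappings content, and `count_umls_mappings` is a hypothetical helper, not a function from this PR.

```python
from collections import Counter

# Placeholder rows: (source namespace, source id, target namespace, target id).
# The identifiers are invented for illustration only.
TOY_MAPPINGS = [
    ("umls", "C0000001", "mesh", "D000001"),
    ("umls", "C0000002", "mesh", "D000002"),
    ("umls", "C0000003", "doid", "DOID:0000003"),
    ("hp", "HP:0000004", "umls", "C0000004"),
]


def count_umls_mappings(mappings):
    """Tally the namespaces that UMLS maps out to and the ones that map into it."""
    from_umls = Counter()
    to_umls = Counter()
    for source_ns, _source_id, target_ns, _target_id in mappings:
        if source_ns == "umls":
            from_umls[target_ns] += 1
        if target_ns == "umls":
            to_umls[source_ns] += 1
    return from_umls, to_umls


from_umls, to_umls = count_umls_mappings(TOY_MAPPINGS)
print("Mapping out of UMLS", from_umls.most_common())
print("Mapping into UMLS", to_umls.most_common())
```

If the mappings are propagated into INDRA as suggested, a tally like this would only be needed for diagnostics, since the BioOntology standardization would apply the mappings itself.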

bgyori · 2021-10-21T18:30:59Z

src/indra_cogex/sources/sider/__init__.py

+                chemical.db_id,
+                indication.db_ns,
+                indication.db_id,
+                "causes",


Hmm... the fact that these are explicitly causal makes me wonder if this should be implemented as an INDRA source producing e.g., Activation statements. It might require more work but something worth discussing.

I'm open to either. I thought that, for the time being, we were trying to keep a more molecular focus in INDRA, but the indirect causal nature of this information could also fit within INDRA statements. One thing I didn't include here is the side effect frequencies. The fact that these exist on a spectrum might also be important to consider.

bgyori · 2021-10-21T18:32:42Z

The test failure says

.tox/py/lib/python3.9/site-packages/indra_cogex/sources/sider/__init__.py:17: in <module>
    from indra_cogex.constants import MODULE
E   ModuleNotFoundError: No module named 'indra_cogex.constants'

Is a new module missing from version control?

cthoyt · 2021-10-22T10:58:10Z

The test failure says

.tox/py/lib/python3.9/site-packages/indra_cogex/sources/sider/__init__.py:17: in <module>
    from indra_cogex.constants import MODULE
E   ModuleNotFoundError: No module named 'indra_cogex.constants'

Is a new module missing from version control?

Nice typo catch. I fixed it in 852eb40.

cthoyt changed the title ~~Add processor for ChEMBL indications~~ Add processors for drug indications and side effects Oct 21, 2021

bgyori reviewed Oct 21, 2021

cthoyt mentioned this pull request Oct 22, 2021

Update Biomappings sorgerlab/indra#1340

Merged

bgyori force-pushed the add-chembl-indications branch from 852eb40 to 4cb39e4 October 27, 2021 18:21

cthoyt and others added 11 commits October 27, 2021 15:56

Add processor for chembl indications

ef5eefa

Make standardize function generally reusable

0fd3329

Add helper initializer in Node class to cleanup similar code

351848a

Remove redundant tqdm

4743c9d

This is already covered by the harness

Add max_phase annotation

f71ae6a

Update __init__.py

0218d68

Add SIDER processor

97f8eb4

Add biomappings as a dependency

6f9fcc2

Use joint name/id standardization

e726f94

Fix import

67e3068

Fix typo

a002a17

bgyori force-pushed the add-chembl-indications branch from f2a195c to a002a17 October 27, 2021 19:56

bgyori added 2 commits October 29, 2021 20:52

Update relation types and fix PubChem IDs

01277ff

Add additional lookup for PubChem names

048f5fc

bgyori merged commit 3a52fbd into main Oct 30, 2021

bgyori deleted the add-chembl-indications branch October 30, 2021 20:31

bgyori mentioned this pull request Oct 30, 2021

Add processor for side effects and indications in SIDER #41

Closed
