CAM KP does not respond to any of ICEES KG-derived input CURIES #101

Open
karafecho opened this issue Jun 21, 2023 · 17 comments
@karafecho

This issue is to report that CAM KP does not respond to any of the ICEES KG-derived CURIEs in this sheet, which are also appended below. Is this expected behavior? Is this a normalization issue? Is this something else?

ICEES input CURIEs for MVP2 queries ExposuresProvider/cam-kp-api#81 and ExposuresProvider/cam-kp-api#82
ICEES query: What chemicals are associated with primary ciliary dyskinesia?
[Jupyter notebook](https://colab.research.google.com/drive/1CdO0XtUddVzt5bRzlcTs8tW0SFc2ggkP#scrollTo=ZVvv2zGGsr-4&uniqifier=1)

ENTITIES_OF_INTEREST = [
    "PUBCHEM.COMPOUND:5865",            # Prednisone
    "CHEMBL.COMPOUND:CHEMBL1256818",    # Dextromethorphan hydrobromide
    "PUBCHEM.COMPOUND:165363555",       # Trifacta
    "HMDB:HMDB0252416",                 # Fluticasone
    "PUBCHEM.COMPOUND:123600",          # Levalbuterol
    "HMDB:HMDB0242500",                 # Budesonide
    "CHEBI:5147",                       # Formoterol
    "CHEMBL.COMPOUND:CHEMBL158",        # Aztreonam
    "PUBCHEM.COMPOUND:145068",          # Nitric oxide
    "PUBCHEM.COMPOUND:281",             # Carbon monoxide
]

@gaurav gaurav transferred this issue from ExposuresProvider/cam-kp-api Jul 18, 2023
@gaurav
Member

gaurav commented Nov 6, 2023

Thanks for these identifiers! I've added them to the brand new Automat-CAM-KP test suite (#111), and here are the results I have:

| CURIE | Normalized to | How many unique CURIEs is this connected to in Automat-CAM-KP? |
| --- | --- | --- |
| PUBCHEM.COMPOUND:5865 | Normalized | 30 |
| CHEMBL.COMPOUND:CHEMBL1256818 | PUBCHEM.COMPOUND:5462351 | None |
| PUBCHEM.COMPOUND:165363555 | Normalized | None |
| HMDB:HMDB0252416 | PUBCHEM.COMPOUND:2462 | None |
| PUBCHEM.COMPOUND:123600 | Normalized | None |
| HMDB:HMDB0242500 | PUBCHEM.COMPOUND:2462 | None |
| CHEBI:5147 | PUBCHEM.COMPOUND:3410 | None |
| CHEMBL.COMPOUND:CHEMBL158 | PUBCHEM.COMPOUND:5742832 | 9 |
| PUBCHEM.COMPOUND:145068 | Normalized | 258 |
| PUBCHEM.COMPOUND:281 | Normalized | 64 |
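
For anyone who wants to reproduce the last column, here is a minimal Python sketch of how such a count could be derived from a TRAPI response. It is not the code from the test suite in #111: the function names `one_hop_query` and `count_connected_curies` are made up for illustration, the sample response is hand-written rather than real CAM-KP output, and the sketch assumes the standard TRAPI layout in which `message.knowledge_graph.nodes` is keyed by CURIE.

```python
def one_hop_query(curie):
    """Build a TRAPI query for anything that `curie` is the subject of.

    Assumption: only outgoing edges are followed; a fuller check would also
    run the query with subject and object swapped.
    """
    return {
        "message": {
            "query_graph": {
                "nodes": {"n0": {"ids": [curie]}, "n1": {}},
                "edges": {"e01": {"subject": "n0", "object": "n1"}},
            }
        }
    }


def count_connected_curies(response, curie):
    """Count unique knowledge-graph nodes other than `curie` itself."""
    message = response.get("message") or {}
    knowledge_graph = message.get("knowledge_graph") or {}
    nodes = knowledge_graph.get("nodes") or {}
    return len(set(nodes) - {curie})


# Hand-written stand-in for a response (placeholder CURIEs, not real data).
sample_response = {
    "message": {
        "knowledge_graph": {
            "nodes": {"EXAMPLE:1": {}, "EXAMPLE:2": {}, "EXAMPLE:3": {}},
            "edges": {},
        }
    }
}
connected = count_connected_curies(sample_response, "EXAMPLE:1")
```

A response with no knowledge graph counts as zero, which would correspond to the "None" rows in the table.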

@gaurav
Member

gaurav commented Nov 6, 2023

@balhoff Do you have thoughts on how to plug the gaps we see here in node coverage? I'm guessing we need new data sources.

@karafecho
Author

Thanks, @gaurav! While we don't have a 1:1 match between CURIEs, the matches that we do have are representative, with two drugs and two chemical exposures, and will allow us to move this effort along.

@karafecho
Author

This Swagger example query runs successfully, but it returns 0 results. If I replace the input CURIES with PUBCHEM.COMPOUND:5865 from the table above, the query also runs successfully, but it returns 0 results. I think the Automat example queries are standardized and not tailored to the underlying KGs, so perhaps you can send me an example query that returns results from CAM KP? Thanks!

{
  "message": {
    "query_graph": {
      "nodes": {
        "n0": {
          "categories": [
            "biolink:ChemicalEntity"
          ],
          "ids": [
            "CHEMBL.COMPOUND:CHEMBL3234626",
            "CHEMBL.COMPOUND:CHEMBL3234633"
          ]
        },
        "n1": {
          "categories": [
            "biolink:GeneOrGeneProduct"
          ],
          "ids": [
            "NCBIGene:2099"
          ]
        }
      },
      "edges": {
        "e01": {
          "subject": "n0",
          "object": "n1",
          "predicates": [
            "biolink:affects"
          ],
          "qualifier_constraints": [
            {
              "qualifier_set": [
                {
                  "qualifier_type_id": "biolink:object_aspect_qualifier",
                  "qualifier_value": "activity"
                },
                {
                  "qualifier_type_id": "biolink:object_direction_qualifier",
                  "qualifier_value": "increased"
                },
                {
                  "qualifier_type_id": "biolink:qualified_predicate",
                  "qualifier_value": "biolink:causes"
                }
              ]
            }
          ]
        }
      }
    }
  },
  "workflow": [
    {
      "id": "lookup"
    }
  ]
}

@gaurav
Member

gaurav commented Nov 6, 2023

Hi Kara! Sorry about the confusion: the Swagger example query can't currently be configured for individual platers, so a single Swagger example is shared by all the platers on Automat. That example isn't relevant to CAM-KP, and it has two main problems:

  1. CAM-KP doesn't currently know about CHEMBL.COMPOUND:CHEMBL3234626 or CHEMBL.COMPOUND:CHEMBL3234633. We'd have to ingest new pathways to provide information on them. We do have information on NCBIGene:2099.
  2. CAM-KP can't handle qualifiers until "Generate RO-Biolink predicate mappings based on a particular Biolink model" (#104) has been incorporated, which we're hoping to do really soon!

So the following query will work:

{
  "message": {
    "query_graph": {
      "nodes": {
        "n0": {
          "categories": [
            "biolink:ChemicalEntity"
          ]
        },
        "n1": {
          "categories": [
            "biolink:GeneOrGeneProduct"
          ],
          "ids": [
            "NCBIGene:2099"
          ]
        }
      },
      "edges": {
        "e01": {
          "subject": "n0",
          "object": "n1",
          "predicates": [
            "biolink:affects"
          ]
        }
      }
    }
  },
  "workflow": [
    {
      "id": "lookup"
    }
  ]
}
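
The two changes that turn the Swagger example into the query above (dropping the qualifier constraints and dropping the identifiers CAM-KP doesn't know about) can be sketched in Python as follows. This is an illustration only: `simplify_for_cam_kp` is a hypothetical helper, not part of CAM-KP or Automat, and the set of unknown CURIEs has to be supplied by the caller.

```python
import copy


def simplify_for_cam_kp(query, unknown_curies=()):
    """Return a copy of a TRAPI query without qualifiers or unknown ids."""
    simplified = copy.deepcopy(query)  # leave the caller's query untouched
    query_graph = simplified["message"]["query_graph"]
    for edge in query_graph["edges"].values():
        edge.pop("qualifier_constraints", None)
    for node in query_graph["nodes"].values():
        if "ids" in node:
            kept = [curie for curie in node["ids"] if curie not in unknown_curies]
            if kept:
                node["ids"] = kept
            else:
                del node["ids"]  # unpin the node instead of sending an empty list
    return simplified


# Condensed version of the Swagger example (one qualifier shown for brevity).
swagger_example = {
    "message": {"query_graph": {
        "nodes": {
            "n0": {"categories": ["biolink:ChemicalEntity"],
                   "ids": ["CHEMBL.COMPOUND:CHEMBL3234626",
                           "CHEMBL.COMPOUND:CHEMBL3234633"]},
            "n1": {"categories": ["biolink:GeneOrGeneProduct"],
                   "ids": ["NCBIGene:2099"]},
        },
        "edges": {"e01": {
            "subject": "n0", "object": "n1",
            "predicates": ["biolink:affects"],
            "qualifier_constraints": [{"qualifier_set": [
                {"qualifier_type_id": "biolink:object_aspect_qualifier",
                 "qualifier_value": "activity"}]}],
        }},
    }},
    "workflow": [{"id": "lookup"}],
}
simplified = simplify_for_cam_kp(
    swagger_example,
    unknown_curies={"CHEMBL.COMPOUND:CHEMBL3234626",
                    "CHEMBL.COMPOUND:CHEMBL3234633"},
)
```

After the call, `n0` keeps only its category, `n1` stays pinned to NCBIGene:2099, and the edge carries only the `biolink:affects` predicate.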

@karafecho
Author

karafecho commented Nov 6, 2023

No confusion, I was aware that the Swagger examples aren't really "examples" for most of the Automats, including cam-kp and icees-kg. Thanks for an actual example query!

@karafecho
Author

This query returns results when sent directly to automat-icees-kg at https://automat.renci.org/#/.

{
  "message": {
    "query_graph": {
      "nodes": {
        "n0": {
          "categories": [
            "biolink:DiseaseOrPhenotypicFeature"
          ],
          "ids": [
            "MONDO:0009061"
          ]
        },
        "n1": {
          "categories": [
            "biolink:ChemicalEntity"
          ]
        }
      },
      "edges": {
        "e01": {
          "subject": "n0",
          "object": "n1",
          "predicates": [
            "biolink:correlated_with"
          ]
        }
      }
    }
  },
  "workflow": [
    {
      "id": "lookup"
    }
  ]
}

And this query returns responses when sent directly to automat-cam-kp at https://automat.renci.org/#/.

{
  "message": {
    "query_graph": {
      "nodes": {
        "n0": {
          "categories": [
            "biolink:ChemicalEntity"
          ],
          "ids": [
            "PUBCHEM.COMPOUND:5865"
          ]
        },
        "n1": {
          "categories": [
            "biolink:GeneOrGeneProduct"
          ]
        }
      },
      "edges": {
        "e01": {
          "subject": "n0",
          "object": "n1",
          "predicates": [
            "biolink:affects"
          ]
        }
      }
    }
  },
  "workflow": [
    {
      "id": "lookup"
    }
  ]
}

But this query, while able to run successfully, returns an empty response when sent to WFR at https://translator-workflow-runner.renci.org/docs#/trapi/run_workflow_query_post.

{
    "workflow": [
        {
            "id": "lookup"
        },
        {
            "id":"score"
        }
    ],
    "message": {
        "query_graph": {
            "edges": {
                "e0": {
                    "predicates": [
                        "biolink:correlated_with"
                    ],
                    "subject": "n0",
                    "object": "n1",
                    "provided_by": {
                        "allowlist": [
                            "infores:automat-icees-kg"
                        ]
                    }
                },
                "e1": {
                    "subject": "n1",
                    "object": "n2",
                    "predicates": [
                        "biolink:affects"
                    ],
                    "provided_by": {
                        "allowlist": [
                            "infores:automat-cam-kp"
                        ]
                    }
                }
            },
            "nodes": {
                "n0": {
                    "ids": [
                        "MONDO:0009061"
                    ],
                    "is_set": false
                },
                "n1": {
                    "categories": [
                        "biolink:ChemicalEntity"
                    ],
                    "is_set": false
                },
                "n2": {
                    "categories": [
                        "biolink:GeneOrGeneProduct"
                    ],
                    "is_set": false
                }
            }
        }
    }
}

@maximusunc

This comes from going through ARAs, which have strict KP timeouts, rather than sending queries directly to KPs. I also wasn't able to get any results from the WFR, but sending the query directly to Aragorn with an extended timeout returns a 16.6 MB response with 12k results in total. ICEES-KG took 35 seconds to respond to the first hop (the normal timeout is 10 s) and returned 106 results, and then CAM-KP took 90 seconds to respond with the 12k results. If you want, I can share the entire response.
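
To make the effect concrete, here is a toy Python simulation of a lookup that enforces a per-KP timeout. It does not reflect how Aragorn or the WFR is actually implemented; `fake_kp` and `lookup_with_timeout` are invented for this sketch, and the delays are scaled down so that 0.05 s stands in for the 35-second first hop and 0.01 s for the 10-second limit.

```python
import asyncio


async def fake_kp(delay_s, results):
    """Stand-in for a KP that needs `delay_s` seconds to answer."""
    await asyncio.sleep(delay_s)
    return results


async def lookup_with_timeout(kp_call, timeout_s):
    """Return the KP's results, or an empty list if it misses the timeout."""
    try:
        return await asyncio.wait_for(kp_call, timeout=timeout_s)
    except asyncio.TimeoutError:
        return []  # a timed-out hop looks like "no results", not like an error


# Strict timeout: the slow KP is cut off and the caller sees nothing.
strict = asyncio.run(lookup_with_timeout(fake_kp(0.05, ["r1", "r2"]), timeout_s=0.01))
# Extended timeout: the same KP call now has time to finish.
extended = asyncio.run(lookup_with_timeout(fake_kp(0.05, ["r1", "r2"]), timeout_s=1.0))
print(strict, extended)  # prints: [] ['r1', 'r2']
```

In a two-hop query the empty first hop leaves nothing to feed into the second hop, which would explain a query that runs successfully yet returns nothing.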

@karafecho
Author

Thanks, Max.

Given your findings, the revised query below should run when sent to WFR and return results. However, while it runs successfully, it returns an empty KG.

{
    "workflow": [
        {
            "id": "lookup",
            "runner_parameters": {
                "allowlist": ["infores:aragorn"]
            }
        },
        {
            "id":"score"
        }
    ],
    "message": {
        "query_graph": {
            "edges": {
                "e0": {
                    "predicates": [
                        "biolink:correlated_with"
                    ],
                    "subject": "n0",
                    "object": "n1",
                    "provided_by": {
                        "allowlist": [
                            "infores:automat-icees-kg"
                        ]
                    }
                },
                "e1": {
                    "subject": "n1",
                    "object": "n2",
                    "predicates": [
                        "biolink:affects"
                    ],
                    "provided_by": {
                        "allowlist": [
                            "infores:automat-cam-kp"
                        ]
                    }
                }
            },
            "nodes": {
                "n0": {
                    "ids": [
                        "MONDO:0009061"
                    ],
                    "is_set": false
                },
                "n1": {
                    "categories": [
                        "biolink:ChemicalEntity"
                    ],
                    "is_set": false
                },
                "n2": {
                    "categories": [
                        "biolink:GeneOrGeneProduct"
                    ],
                    "is_set": false
                }
            }
        }
    }
}

@maximusunc

Your query doesn't have the extended timeout that I'm able to set directly in Aragorn, so WFR returns nothing because icees-kg times out on the first hop. This is a performance issue, and I'm only able to get results back because I can peek behind the curtain and turn some hidden knobs.
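
Since both single-hop queries return results when sent directly to each KP, one possible workaround until the performance issue is addressed is to chain the hops by hand: take the chemicals bound to `n1` in the icees-kg response and pin them as the input of a second query to cam-kp. The Python sketch below shows only the query-building step. `second_hop_query` is a hypothetical helper, the sample response is hand-written (it reuses two CURIEs from the list at the top of this issue) and is not real icees-kg output, and sending the queries is left out because the exact endpoint URLs are not given in this thread.

```python
def second_hop_query(first_hop_response, bound_node="n1"):
    """Build the cam-kp hop from the CURIEs bound to `bound_node` in hop one."""
    message = first_hop_response.get("message") or {}
    results = message.get("results") or []
    chemical_ids = sorted({
        binding["id"]
        for result in results
        for binding in result.get("node_bindings", {}).get(bound_node, [])
    })
    if not chemical_ids:
        raise ValueError("first hop returned no bindings for " + bound_node)
    return {
        "message": {
            "query_graph": {
                "nodes": {
                    "n0": {"categories": ["biolink:ChemicalEntity"],
                           "ids": chemical_ids},
                    "n1": {"categories": ["biolink:GeneOrGeneProduct"]},
                },
                "edges": {
                    "e01": {"subject": "n0", "object": "n1",
                            "predicates": ["biolink:affects"]},
                },
            }
        },
        "workflow": [{"id": "lookup"}],
    }


# Hand-written stand-in for an icees-kg response, for illustration only.
sample_first_hop = {"message": {"results": [
    {"node_bindings": {"n0": [{"id": "MONDO:0009061"}],
                       "n1": [{"id": "PUBCHEM.COMPOUND:5865"}]}},
    {"node_bindings": {"n0": [{"id": "MONDO:0009061"}],
                       "n1": [{"id": "PUBCHEM.COMPOUND:281"}]}},
    {"node_bindings": {"n0": [{"id": "MONDO:0009061"}],
                       "n1": [{"id": "PUBCHEM.COMPOUND:5865"}]}},
]}}
cam_kp_query = second_hop_query(sample_first_hop)
```

Duplicate bindings collapse into one identifier, so the second query carries each chemical once. Note that this skips the ARA's scoring and merging, so the two responses would still need to be joined on the chemical CURIEs afterwards.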

@karafecho
Author

Oh, I see. That makes sense.

In that case, perhaps you can send me the full response?

@karafecho
Author

karafecho commented Nov 9, 2023

Just so everyone is clear, the goal of this effort is three-fold:

  1. Team science - more tightly couple icees-kg and cam-kp under Exposures Provider.
  2. Scientific impact - leverage cam-kp to provide mechanistic insights into clinical observations derived from icees-kg (AOPs of scientific interest).
  3. Translator MVP2 queries - contribute to Translator MVP2 queries by leveraging the CQS and targeting the clinical KPs in the first hop and cam-kp in the second hop.

@karafecho
Author

Also see this GitHub folder and slide 9 in this slide deck.

@gaurav gaurav added this to the ICEES/CAM-KP integration pilot milestone Nov 11, 2023
@karafecho
Author

Per decision on 01.03.2024: Max will rerun the above queries with extended timeouts in ARAGORN and cache the results. Kara will then test.

@karafecho
Author

From Meisha, 01/17/2024:


Title: Peptide Oxidation Leading to Hypertension

Description from the wiki:

Here we present the supporting information on an AOP describing how vascular endothelial peptide oxidation leads to hypertension via perturbation of endothelial nitric oxide (NO) bioavailability. The molecular initiating event is oxidation of amino acid (AA) residues on critical peptides of the NO pathway, notably protein kinase B (AKT), guanosine triphosphate cyclohydrolase-1 (GTPCH-1), endothelial nitric oxide synthase (eNOS), and also the cellular ROS scavenger, glutathione. Oxidation of the enzymic components of the pathway leads to reduced expression of the phosphorylated proteins, and protein loss via proteasomal degradation. Oxidation of reduced glutathione to GSSG promotes bonding of GSSG to critical AA residues on eNOS, and the reduced expression of GTPCH-1 reduces bioavailability of tetrahydrobiopterin (BH4), both of which lead to uncoupling of eNOS (reduced NO production, increased superoxide production). The combination of these molecular events leads to reduced bioavailability of NO, which in turn reduces the potential for vasodilation and shifts the balance of vascular tone towards vasoconstriction. Repeated perturbation of this pathway via chronic exposure to toxicants ultimately increases vascular resistance and contributes towards the development of hypertension.

@karafecho
Author

karafecho commented Feb 5, 2024

gaurav added a commit that referenced this issue Feb 13, 2024
This PR adds a test suite in Python for CAM-KP-API to cam-pipeline. Some of this code has been moved over from https://github.com/ExposuresProvider/cam-kp-api and the rest has been newly written for this.

There are three tests here:
- test_api.py: test the Automat-CAM-KP API endpoints.
- test_examples.py: tests the example files in `examples/`.
- test_curies.py: test a set of CURIEs to see if Automat-CAM-KP has information or not about them. This currently includes failing tests from #101

I tried to move over the integration tests from CAM-KP-API, but I couldn't work out the easiest way to figure out how to move them over as a Scala project. I think these Python tests are easier to read and maintain, but I'm happy to be proved wrong.

Closes #94
@gaurav
Member

gaurav commented Feb 13, 2024

I took another stab at the CURIEs I couldn't figure out previously, and found three more of them in CAM-KP. Most of these are NodeNorm issues in one way or another, but at least one of them could be fixed by turning on drug conflation when processing CAM-KP. I propose we use the alternate CURIEs I listed below while I try to figure out the NodeNorm issues.

| CURIE | Normalized to | Should actually be normalized to | How many unique CURIEs is this connected to in Automat-CAM-KP? |
| --- | --- | --- | --- |
| CHEMBL.COMPOUND:CHEMBL1256818 | PUBCHEM.COMPOUND:5462351 ("Dextromethorphan hydrobromide monohydrate") | PUBCHEM.COMPOUND:5360696 ("Dextromethorphan") | None, but should exist (see Dextromethorphan on CTD) |
| PUBCHEM.COMPOUND:165363555 ("Trifacta") | Normalized | N/A | None |
| HMDB:HMDB0252416 ("Fluticasone") | PUBCHEM.COMPOUND:4659387 ("Fluticasona [Spanish]") | PUBCHEM.COMPOUND:5311101 ("Fluticasone") | 88 |
| PUBCHEM.COMPOUND:123600 ("Levalbuterol") | Normalized | N/A | None |
| HMDB:HMDB0242500 ("Budesonide") | PUBCHEM.COMPOUND:5281004 | N/A | 167 |
| CHEBI:5147 ("Formoterol") | PUBCHEM.COMPOUND:3410 ("Formoterol") | PUBCHEM.COMPOUND:45358055 ("Foradil Certihaler"), but cliques to 3410 with drug_conflation turned on | 53 |
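
One way to apply the proposal in a test or notebook is a small lookup that swaps an ICEES CURIE for its alternate before querying. The Python sketch below is only one reading of the table: it assumes that the alternate is the "Should actually be normalized to" CURIE where one is given and the "Normalized to" CURIE otherwise, and it covers only the three rows that report a connection count. The names `ALTERNATE_CURIES` and `curie_for_query` are hypothetical.

```python
# Alternates for the three rows with a non-zero count in the table above.
# This mapping is an interpretation of the table, not a confirmed fix.
ALTERNATE_CURIES = {
    "HMDB:HMDB0252416": "PUBCHEM.COMPOUND:5311101",  # Fluticasone
    "HMDB:HMDB0242500": "PUBCHEM.COMPOUND:5281004",  # Budesonide
    "CHEBI:5147": "PUBCHEM.COMPOUND:45358055",       # Formoterol
}


def curie_for_query(curie):
    """Return the alternate CURIE if one is listed, else the CURIE unchanged."""
    return ALTERNATE_CURIES.get(curie, curie)


queried = [curie_for_query(c) for c in ("CHEBI:5147", "PUBCHEM.COMPOUND:5865")]
```

CURIEs without an alternate, such as PUBCHEM.COMPOUND:5865, pass through untouched, so the same helper can wrap the full input list.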

gaurav added a commit that referenced this issue Mar 15, 2024
…ated-pubchem-compound

Updated PubChem identifiers to the correct ones for the test as per #101 (comment)