This repository has been archived by the owner on May 23, 2024. It is now read-only.

Replace predicate mapping with a local JSON file #635

Draft: wants to merge 59 commits into base: master
b33af5a
Added an INNER_LIMIT to prevent queries from stalling.
gaurav Feb 7, 2023
7785a9f
Increased the inner limit based on the outer limit value.
gaurav Feb 7, 2023
9cf65d9
Fixed inner limit.
gaurav Feb 7, 2023
c872342
Reduced the inner limit multiplier to 1000.
gaurav Feb 7, 2023
045d509
Improved SPARQL code.
gaurav Feb 7, 2023
f04f9eb
Documented what this new directory is for.
gaurav Feb 28, 2023
70834c4
GenerateBiolinkPredicateMappings now runs a SPARQL query.
gaurav Feb 28, 2023
d0672ab
Added code for retrieving predicate mappings.
gaurav Feb 28, 2023
31586e7
Added transformation code.
gaurav Feb 28, 2023
afdb9bb
Fixed some syntax issues.
gaurav Feb 28, 2023
853b944
Added code to write out the PredicateJsonFilePath.
gaurav Feb 28, 2023
d5ae7c4
Added predicates.json to GitHub.
gaurav Feb 28, 2023
ea49bf7
Moved predicate mapping code into its own class.
gaurav Feb 28, 2023
aa420ea
First stab at a mapQueryEdgePredicates() method.
gaurav Feb 28, 2023
20385b7
First stab at complete transformation.
gaurav Feb 28, 2023
b685c29
Added reverse operation or something.
gaurav Feb 28, 2023
157a58d
Merge branch 'fix-query-stalling' into replace-predicate-mapping
gaurav Feb 28, 2023
d0428d2
Added hacky checks to make sure predicates.json doesn't exist.
gaurav Feb 28, 2023
b0385ea
Added Biolink3Test, changed some behavior, noted a needed fix.
gaurav Feb 28, 2023
f66ee8a
Attempt to improve the Biolink3Test.
gaurav Mar 7, 2023
9799fe0
Replaced Biolink predicates code to return all predicates.
gaurav Mar 7, 2023
13bbd54
Tests pass, woo.
gaurav Mar 7, 2023
c962c9e
Updated version for deployment.
gaurav Mar 14, 2023
05b97b3
Updated two tests to use Biolink3 syntax.
gaurav Mar 14, 2023
75a8f9e
Updated Biolink 3 example queries with results we can understand.
gaurav Mar 14, 2023
92dc566
Added PRCD-increases-chemical example.
gaurav Mar 14, 2023
6f4ecca
Fixed resource loading in predicate mappings.
gaurav Mar 14, 2023
b80ee85
Added timeout to LookupServiceTest.
gaurav Mar 15, 2023
d9141a8
Removed tests that are timing out (#633).
gaurav Mar 15, 2023
e9dd651
Reformatted files with scalafmtAll.
gaurav Mar 15, 2023
821c7bf
Updated LimitTest with updated counts.
gaurav Mar 17, 2023
0db643d
Clarified debugging message and marked it debug().
gaurav Mar 17, 2023
7e51c86
Fixed TRAPITest tests (by checking for unmapped predicates).
gaurav Mar 21, 2023
cd73b08
Added manual predicates to match the Biolink3 tests.
gaurav Mar 21, 2023
96d247e
Fixed Biolink3 tests.
gaurav Mar 21, 2023
08f4127
Manually added downregulates.
gaurav Mar 21, 2023
7f97dfa
Added a test for the KG-result edge name mismatch.
gaurav Mar 21, 2023
a3e33c6
Fixed KG-results mismatch issue.
gaurav Mar 21, 2023
c852b75
Various attempts at figuring out why the query_id isn't working.
gaurav Mar 22, 2023
81f9d2c
Incremented Biolink model version to v3.2.2.
gaurav Mar 28, 2023
e08403a
Improved documentation.
gaurav Mar 28, 2023
c48c4bd
Upgraded to Biolink v3.2.3.
gaurav Mar 28, 2023
d46edbc
Added a QualifiedBiolinkPredicate() type.
gaurav Mar 28, 2023
1fb9d44
First stab at including QualifiedBiolinkPredicates in Lookup.
gaurav Mar 28, 2023
42d8b45
Expanded use of QualifiedBiolinkPredicate.
gaurav Mar 28, 2023
d8fee0a
Updated tests to use QualifiedBiolinkPredicate.
gaurav Mar 28, 2023
7796c0f
Standardized to qualifiedBiolinkPredicate.
gaurav Mar 28, 2023
0796eba
Standardized to biolinkQualifiedPredicates.
gaurav Mar 28, 2023
b8cf4a2
Improved debugging.
gaurav Mar 28, 2023
85150b5
Incremented version number.
gaurav Mar 28, 2023
6cd40a0
Incremented version.
gaurav Mar 28, 2023
71d0db6
Incremented version number in build.sbt.
gaurav Mar 28, 2023
2fedd6b
Fixed expectations for simple.json.
gaurav Mar 28, 2023
2df9dd3
Added two Biolink 3 tests to IT.
gaurav Mar 28, 2023
fc3ada1
Fixed test in ImplicitsTest.
gaurav Mar 28, 2023
4e9c6b3
Fixed one of the QueryServiceTests by changing the target.
gaurav Apr 1, 2023
5331039
Fixed identifier for glucose.
gaurav Apr 1, 2023
a2fd714
Made debugging log explicit.
gaurav Apr 1, 2023
a9dd2e8
Removed unneeded TODO.
gaurav Apr 1, 2023
2 changes: 1 addition & 1 deletion build.sbt
@@ -8,7 +8,7 @@ organization := "org.renci"

name := "cam-kp-api"

-version := "0.3-pre1"
+version := "0.3-pre4"

licenses := Seq("MIT license" -> url("https://opensource.org/licenses/MIT"))

@@ -0,0 +1,47 @@
{
"description": "Example creative mode query (https://github.com/NCATSTranslator/TranslatorArchitecture/issues/80) using NCBIGene:598 (BCL2L1)",
"message": {
"query_graph": {
"nodes": {
"gene": {
"categories": [
"biolink:Gene"
],
"ids": [
"NCBIGene:598"
]
},
"chemical": {
"categories": [
"biolink:ChemicalEntity"
]
}
},
"edges": {
"t_edge": {
"object": "gene",
"subject": "chemical",
"predicates": [
"biolink:affects"
],
"knowledge_type": "inferred",
"qualifier_constraints": [
{
"qualifier_set": [
{
"qualifier_type_id": "biolink:object_aspect_qualifier",
"qualifier_value": "activity_or_abundance"
},
{
"qualifier_type_id": "biolink:object_direction_qualifier",
"qualifier_value": "decreased"
}
]
}
]
}
}
}
},
"minExpectedResults": 6
}
47 changes: 47 additions & 0 deletions src/it/resources/examples/PRCD-increases-chemical.json
@@ -0,0 +1,47 @@
{
"description": "PRCD increases chemical",
"message": {
"query_graph": {
"nodes": {
"gene": {
"categories": [
"biolink:Gene"
],
"ids": [
"NCBIGene:768206"
]
},
"chemical": {
"categories": [
"biolink:ChemicalEntity"
]
}
},
"edges": {
"t_edge": {
"object": "gene",
"subject": "chemical",
"predicates": [
"biolink:affects"
],
"knowledge_type": "inferred",
"qualifier_constraints": [
{
"qualifier_set": [
{
"qualifier_type_id": "biolink:object_aspect_qualifier",
"qualifier_value": "activity_or_abundance"
},
{
"qualifier_type_id": "biolink:object_direction_qualifier",
"qualifier_value": "increased"
}
]
}
]
}
}
}
},
"minExpectedResults": 1000
}
@@ -0,0 +1,47 @@
{
"description": "Example creative mode query (https://github.com/NCATSTranslator/TranslatorArchitecture/issues/79) using NCBIGene:340061 (STING1)",
"message": {
"query_graph": {
"nodes": {
"gene": {
"categories": [
"biolink:Gene"
],
"ids": [
"NCBIGene:340061"
]
},
"chemical": {
"categories": [
"biolink:ChemicalEntity"
]
}
},
"edges": {
"t_edge": {
"object": "gene",
"subject": "chemical",
"predicates": [
"biolink:affects"
],
"knowledge_type": "inferred",
"qualifier_constraints": [
{
"qualifier_set": [
{
"qualifier_type_id": "biolink:object_aspect_qualifier",
"qualifier_value": "activity_or_abundance"
},
{
"qualifier_type_id": "biolink:object_direction_qualifier",
"qualifier_value": "increased"
}
]
}
]
}
}
}
},
"minExpectedResults": 6
}
12 changes: 11 additions & 1 deletion src/it/resources/examples/genes-upstream-of-GPR35.json
@@ -19,7 +19,17 @@
"e0": {
"subject": "n0",
"object": "n1",
-"predicates": ["biolink:affects_activity_of"]
+"predicates": ["biolink:affects"],
+"qualifier_constraints": [
+{
+"qualifier_set": [
+{
+"qualifier_type_id": "biolink:object_aspect_qualifier",
+"qualifier_value": "activity"
+}
+]
+}
+]
},
"e1": {
"subject": "n1",
2 changes: 1 addition & 1 deletion src/it/resources/examples/simple.json
@@ -21,5 +21,5 @@
}
},
"minExpectedResults": 160,
-"maxExpectedResults": 160
+"maxExpectedResults": 1000
}
12 changes: 11 additions & 1 deletion src/it/resources/examples/swagger-example.json
@@ -17,7 +17,17 @@
"edges": {
"e0": {
"predicates": [
-"biolink:positively_regulates"
+"biolink:regulates"
+],
+"qualifier_constraints": [
+{
+"qualifier_set": [
+{
+"qualifier_type_id": "biolink:object_direction_qualifier",
+"qualifier_value": "upregulated"
+}
+]
+}
],
"subject": "n0",
"object": "n1"
28 changes: 18 additions & 10 deletions src/it/scala/org/renci/cam/it/ImplicitsTest.scala
@@ -4,7 +4,7 @@ import com.typesafe.scalalogging.LazyLogging
import io.circe.generic.auto._
import io.circe.parser._
import io.circe.syntax._
-import io.circe.{Decoder, Encoder, KeyDecoder, KeyEncoder}
+import io.circe.{KeyDecoder, KeyEncoder}
import org.renci.cam.Biolink.BiolinkData
import org.renci.cam.domain._
import org.renci.cam.{AppConfig, Biolink, HttpClient, Implicits}
@@ -39,15 +39,23 @@ object ImplicitsTest extends DefaultRunnableSpec with LazyLogging {
testM("test Implicits.predicateOrPredicateListDecoder") {
for {
biolinkData <- Biolink.biolinkData
-} yield {
-val dataAsList = """["biolink:participates_in","biolink:related_to"]"""
-val data = """"biolink:related_to""""
-import biolinkData.implicits._
-val ret = decode[List[BiolinkPredicate]](data)
-val retWithListData = decode[List[BiolinkPredicate]](dataAsList)
-assert(ret.toOption.get)(contains(BiolinkPredicate("related_to"))) && assert(retWithListData.toOption.get)(
-contains(BiolinkPredicate("related_to")))
-}
+data = """"biolink:related_to""""
+dataAsJson = {
+import biolinkData.implicits._
+decode[BiolinkPredicate](data)
+}
+
+dataAsList = """["biolink:participates_in","biolink:related_to"]"""
+dataAsListAsJson = {
+import biolinkData.implicits._
+decode[List[BiolinkPredicate]](dataAsList)
+}
+} yield assert(dataAsJson)(Assertion.isRight(Assertion.equalTo(BiolinkPredicate("related_to")))) &&
+assert(dataAsListAsJson)(
+Assertion.isRight(
+Assertion.contains(BiolinkPredicate("participates_in")) &&
+Assertion.contains(BiolinkPredicate("related_to"))
+))
}
)

3 changes: 2 additions & 1 deletion src/main/resources/application.conf
@@ -1,10 +1,11 @@
{
-version = "0.3-pre1"
+version = "0.3-pre4"
host = 0.0.0.0
port = 8080
port = ${?PORT}
trapi-version = "1.3.0"
trapi-version = ${?TRAPI_VERSION}
+biolink-version = "v3.2.3"
location = "http://localhost:8080"
location = ${?LOCATION}
sparql-endpoint = "https://cam-kp-sparql.apps.renci.org/sparql"
38 changes: 38 additions & 0 deletions src/main/resources/biolink/README.md
@@ -0,0 +1,38 @@
# Biolink mappings

CAM-KP needs Biolink information in both the triplestore backend and the
frontend. This is frustrating -- if only one of them needed to know about the
Biolink model, that would greatly simplify what we need to do here.

We previously generated `predicates.csv` and `mkg-nodes.csv` from the
[CAM Pipeline](https://github.com/ExposuresProvider/cam-pipeline) and incorporated
them in here (via the parent directory). This was relatively easy to do when
Biolink predicates could be mapped directly to relations in the triplestore, but
since Biolink 3 includes predicates modified with qualifiers, this mapping is more
complicated.

This directory takes all of our predicate mapping logic and puts it into one place:
the [`predicates.json`](./predicates.json) file in this directory is intended to provide
the definitive set of mappings between Biolink predicate/qualifier combinations and
triplestore relations. A single relation may be mapped to several predicate/qualifier
combinations and vice versa.

Two pieces of code interact with this file:
1. `org.renci.cam.util.GenerateBiolinkPredicateMappings` is a standalone program that
can be used to regenerate this file. It currently uses the
[Biolink predicate_mapping.yaml](https://github.com/biolink/biolink-model/blob/master/predicate_mapping.yaml),
mapping information from the triplestore and a list of manual mappings added during
development, but in the future it may be expanded to include additional files.
2. `org.renci.cam.domain.PredicateMappings` is a module in the code that provides a
programmatic interface to the contents of the `predicates.json` file, and provides
methods to map from Biolink predicate/qualifier combinations to relations.
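
As a rough illustration of the many-to-many lookup described above, here is a
minimal Scala sketch. Every name in it (`Qualifier`, `QualifiedPredicate`,
`PredicateMapping`, `PredicateMappingSketch`) and every `example.org` relation IRI
is an assumption made up for this example; it does not reflect the actual layout of
`predicates.json` or the API of `org.renci.cam.domain.PredicateMappings`.

```scala
// Hypothetical model of one entry in a predicates.json-style mapping.
// These names are illustrative and are not taken from the CAM-KP code.
final case class Qualifier(typeId: String, value: String)
final case class QualifiedPredicate(predicate: String, qualifiers: Set[Qualifier])
final case class PredicateMapping(qualified: QualifiedPredicate, relation: String)

object PredicateMappingSketch {
  private val activity  = Qualifier("biolink:object_aspect_qualifier", "activity")
  private val increased = Qualifier("biolink:object_direction_qualifier", "increased")

  // Placeholder data: the example.org relation IRIs are invented for this sketch.
  val mappings: List[PredicateMapping] = List(
    PredicateMapping(QualifiedPredicate("biolink:affects", Set(activity, increased)), "http://example.org/relation/R1"),
    PredicateMapping(QualifiedPredicate("biolink:affects", Set(activity, increased)), "http://example.org/relation/R2"),
    PredicateMapping(QualifiedPredicate("biolink:regulates", Set.empty), "http://example.org/relation/R2")
  )

  /** All relations mapped to exactly this predicate/qualifier combination. */
  def relationsFor(qualified: QualifiedPredicate): Set[String] =
    mappings.filter(_.qualified == qualified).map(_.relation).toSet

  /** All predicate/qualifier combinations mapped to this relation. */
  def predicatesFor(relation: String): Set[QualifiedPredicate] =
    mappings.filter(_.relation == relation).map(_.qualified).toSet
}
```

In this sketch a qualified predicate matches only when its qualifier set is exactly
equal to the one in the mapping. Whether the real code matches exactly or also
accepts subsets of qualifiers is not stated here, so treat that as a simplification.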

The `predicates.json` file can be regenerated by updating the
`biolink-version` setting in `application.conf` and then running:

```shell
$ SPARQL_ENDPOINT=https://cam-kp-sparql-dev.apps.renci.org/sparql sbt "runMain org.renci.cam.util.GenerateBiolinkPredicateMappings"
```

The latest `predicates.json` file should be checked into the GitHub repository so that
changes to it can be tracked.