This repository has been archived by the owner on May 23, 2024. It is now read-only.

Generate test data with qualifiers #640

Draft: wants to merge 80 commits into base: master
b33af5a
Added an INNER_LIMIT to prevent queries from stalling.
gaurav Feb 7, 2023
7785a9f
Increased the inner limit based on the outer limit value.
gaurav Feb 7, 2023
9cf65d9
Fixed inner limit.
gaurav Feb 7, 2023
c872342
Reduced the inner limit multiplier to 1000.
gaurav Feb 7, 2023
045d509
Improved SPARQL code.
gaurav Feb 7, 2023
f04f9eb
Documented what this new directory is for.
gaurav Feb 28, 2023
70834c4
GenerateBiolinkPredicateMappings now runs a SPARQL query.
gaurav Feb 28, 2023
d0672ab
Added code for retrieving predicate mappings.
gaurav Feb 28, 2023
31586e7
Added transformation code.
gaurav Feb 28, 2023
afdb9bb
Fixed some syntax issues.
gaurav Feb 28, 2023
853b944
Added code to write out the PredicateJsonFilePath.
gaurav Feb 28, 2023
d5ae7c4
Added predicates.json to GitHub.
gaurav Feb 28, 2023
ea49bf7
Moved predicate mapping code into its own class.
gaurav Feb 28, 2023
aa420ea
First stab at a mapQueryEdgePredicates() method.
gaurav Feb 28, 2023
20385b7
First stab at complete transformation.
gaurav Feb 28, 2023
b685c29
Added reverse operation or something.
gaurav Feb 28, 2023
157a58d
Merge branch 'fix-query-stalling' into replace-predicate-mapping
gaurav Feb 28, 2023
d0428d2
Added hacky checks to make sure predicates.json doesn't exist.
gaurav Feb 28, 2023
b0385ea
Added Biolink3Test, changed some behavior, noted a needed fix.
gaurav Feb 28, 2023
f66ee8a
Attempt to improve the Biolink3Test.
gaurav Mar 7, 2023
9799fe0
Replaced Biolink predicates code to return all predicates.
gaurav Mar 7, 2023
13bbd54
Tests pass, woo.
gaurav Mar 7, 2023
c962c9e
Updated version for deployment.
gaurav Mar 14, 2023
05b97b3
Updated two tests to use Biolink3 syntax.
gaurav Mar 14, 2023
75a8f9e
Updated Biolink 3 example queries with results we can understand.
gaurav Mar 14, 2023
92dc566
Added PRCD-increases-chemical example.
gaurav Mar 14, 2023
6f4ecca
Fixed resource loading in predicate mappings.
gaurav Mar 14, 2023
b80ee85
Added timeout to LookupServiceTest.
gaurav Mar 15, 2023
d9141a8
Removed tests that are timing out (#633).
gaurav Mar 15, 2023
e9dd651
Reformatted files with scalafmtAll.
gaurav Mar 15, 2023
821c7bf
Updated LimitTest with updated counts.
gaurav Mar 17, 2023
0db643d
Clarified debugging message and marked it debug().
gaurav Mar 17, 2023
7e51c86
Fixed TRAPITest tests (by checking for unmapped predicates).
gaurav Mar 21, 2023
cd73b08
Added manual predicates to match the Biolink3 tests.
gaurav Mar 21, 2023
96d247e
Fixed Biolink3 tests.
gaurav Mar 21, 2023
08f4127
Manually added downregulates.
gaurav Mar 21, 2023
7f97dfa
Added a test for the KG-result edge name mismatch.
gaurav Mar 21, 2023
a3e33c6
Fixed KG-results mismatch issue.
gaurav Mar 21, 2023
c852b75
Various attempts at figuring out why the query_id isn't working.
gaurav Mar 22, 2023
81f9d2c
Incremented Biolink model version to v3.2.2.
gaurav Mar 28, 2023
e08403a
Improved documentation.
gaurav Mar 28, 2023
c48c4bd
Upgraded to Biolink v3.2.3.
gaurav Mar 28, 2023
d46edbc
Added a QualifiedBiolinkPredicate() type.
gaurav Mar 28, 2023
1fb9d44
First stab at including QualifiedBiolinkPredicates in Lookup.
gaurav Mar 28, 2023
42d8b45
Expanded use of QualifiedBiolinkPredicate.
gaurav Mar 28, 2023
d8fee0a
Updated tests to use QualifiedBiolinkPredicate.
gaurav Mar 28, 2023
7796c0f
Standardized to qualifiedBiolinkPredicate.
gaurav Mar 28, 2023
0796eba
Standardized to biolinkQualifiedPredicates.
gaurav Mar 28, 2023
b8cf4a2
Improved debugging.
gaurav Mar 28, 2023
85150b5
Incremented version number.
gaurav Mar 28, 2023
fab9797
First stab at a GenerateTestData utility.
gaurav Jan 24, 2023
9728cab
Making progress.
gaurav Jan 24, 2023
d0a67ef
Okay, so now we're querying test data.
gaurav Jan 24, 2023
8a78b1c
First stab at a TestEdge generator.
gaurav Jan 25, 2023
af92e8e
Fixed bug.
gaurav Jan 25, 2023
f35fd0a
Turned off sorting.
gaurav Jan 25, 2023
b865854
First stab at a test-edges file.
gaurav Jan 26, 2023
9ab7883
Updated test edges; trying to catch error didn't work.
gaurav Jan 26, 2023
abefd15
Completed running the test-edges.jsonl file.
gaurav Feb 7, 2023
7c2262f
Larger test edges output.
gaurav Feb 7, 2023
71b12a7
Regenerated test-edges.jsonl.
gaurav Feb 7, 2023
0290e1d
Reformatted code with scalafmtAll.
gaurav Feb 7, 2023
d14a060
Re-ran test edge generation.
gaurav Mar 28, 2023
f6ba607
First stab at an SRI testing file.
gaurav Mar 28, 2023
60e9421
Updated GenerateTestData to generate an SRI Testing file.
gaurav Mar 28, 2023
c800447
Uniquified the test data.
gaurav Mar 28, 2023
80d963f
Added test_data_location to Server metadata.
gaurav Mar 28, 2023
1cefc2c
First stab at writing out test data with qualifiers.
gaurav Apr 1, 2023
17a132d
Filtered out non-leaf Biolink classes.
gaurav Apr 25, 2023
767f0a5
Fixed code for generating test data.
gaurav May 2, 2023
8d09461
Reorganized and improved code.
gaurav May 2, 2023
89a3c37
Added code for translating relations into biolink predicates.
gaurav May 2, 2023
4acf229
Improved error output.
gaurav May 2, 2023
33ad8bc
Fixed bug and output.
gaurav May 2, 2023
042479c
Fixed bugs in logic and filtering.
gaurav May 2, 2023
6c24694
Uniquified test edges (up to a point).
gaurav May 2, 2023
bf59b37
Consistently use shorthand IRIs for everything.
gaurav May 2, 2023
c8fb59c
Added a check to prevent tests with the same subject and object.
gaurav May 2, 2023
f7b28d0
Took out the tip-only filter.
gaurav May 2, 2023
ee2eecc
Improved imports.
gaurav May 2, 2023
2 changes: 1 addition & 1 deletion build.sbt
@@ -8,7 +8,7 @@ organization := "org.renci"

name := "cam-kp-api"

-version := "0.3-pre1"
+version := "0.3-pre2"

licenses := Seq("MIT license" -> url("https://opensource.org/licenses/MIT"))

47 changes: 47 additions & 0 deletions src/it/resources/examples/PRCD-increases-chemical.json
@@ -0,0 +1,47 @@
{
  "description": "PRCD increases chemical",
  "message": {
    "query_graph": {
      "nodes": {
        "gene": {
          "categories": [
            "biolink:Gene"
          ],
          "ids": [
            "NCBIGene:768206"
          ]
        },
        "chemical": {
          "categories": [
            "biolink:ChemicalEntity"
          ]
        }
      },
      "edges": {
        "t_edge": {
          "object": "gene",
          "subject": "chemical",
          "predicates": [
            "biolink:affects"
          ],
          "knowledge_type": "inferred",
          "qualifier_constraints": [
            {
              "qualifier_set": [
                {
                  "qualifier_type_id": "biolink:object_aspect_qualifier",
                  "qualifier_value": "activity_or_abundance"
                },
                {
                  "qualifier_type_id": "biolink:object_direction_qualifier",
                  "qualifier_value": "increased"
                }
              ]
            }
          ]
        }
      }
    }
  },
  "minExpectedResults": 1000
}
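For readers unfamiliar with qualifier constraints, the following is a minimal sketch of how a `qualifier_constraints` block like the one above could be checked against an edge's qualifiers. It assumes the usual TRAPI reading (any one `qualifier_set` may match, and every qualifier inside that set must be present) and uses exact string matching only. All type and method names are hypothetical and are not taken from this PR.

```scala
object QualifierMatching {
  // Hypothetical types for illustration; not the classes used in CAM-KP.
  final case class Qualifier(typeId: String, value: String)
  final case class QualifierConstraint(qualifierSet: Set[Qualifier])

  // An edge satisfies the constraints if there are none, or if at least one
  // qualifier_set is fully contained in the edge's own qualifiers.
  // Simplification: values are compared exactly, with no Biolink hierarchy lookup.
  def satisfies(edgeQualifiers: Set[Qualifier], constraints: List[QualifierConstraint]): Boolean =
    constraints.isEmpty || constraints.exists(_.qualifierSet.subsetOf(edgeQualifiers))

  // The single qualifier_set from PRCD-increases-chemical.json.
  val increasedActivityOrAbundance: QualifierConstraint = QualifierConstraint(
    Set(
      Qualifier("biolink:object_aspect_qualifier", "activity_or_abundance"),
      Qualifier("biolink:object_direction_qualifier", "increased")
    )
  )
}
```

Under this sketch, an edge carrying only the aspect qualifier would not satisfy the constraint, because the direction qualifier in the same set is missing.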
12 changes: 11 additions & 1 deletion src/it/resources/examples/genes-upstream-of-GPR35.json
@@ -19,7 +19,17 @@
 "e0": {
   "subject": "n0",
   "object": "n1",
-  "predicates": ["biolink:affects_activity_of"]
+  "predicates": ["biolink:affects"],
+  "qualifier_constraints": [
+    {
+      "qualifier_set": [
+        {
+          "qualifier_type_id": "biolink:object_aspect_qualifier",
+          "qualifier_value": "activity"
+        }
+      ]
+    }
+  ]
 },
 "e1": {
   "subject": "n1",
12 changes: 11 additions & 1 deletion src/it/resources/examples/swagger-example.json
@@ -17,7 +17,17 @@
 "edges": {
   "e0": {
     "predicates": [
-      "biolink:positively_regulates"
+      "biolink:regulates"
     ],
+    "qualifier_constraints": [
+      {
+        "qualifier_set": [
+          {
+            "qualifier_type_id": "biolink:object_direction_qualifier",
+            "qualifier_value": "upregulated"
+          }
+        ]
+      }
+    ],
     "subject": "n0",
     "object": "n1"
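The two example-file changes above follow the same pattern: a Biolink 2 predicate is replaced by a more general Biolink 3 predicate plus one qualifier. A minimal sketch of that rewrite is shown below. It covers only the two rewrites visible in these diffs, and the type and method names are hypothetical rather than the `QualifiedBiolinkPredicate` type added in this PR.

```scala
object LegacyPredicates {
  // Hypothetical types for illustration only.
  final case class Qualifier(typeId: String, value: String)
  final case class QualifiedPredicate(predicate: String, qualifiers: List[Qualifier])

  // Only the two rewrites visible in the example files; not a complete table.
  val rewrites: Map[String, QualifiedPredicate] = Map(
    "biolink:affects_activity_of" -> QualifiedPredicate(
      "biolink:affects",
      List(Qualifier("biolink:object_aspect_qualifier", "activity"))
    ),
    "biolink:positively_regulates" -> QualifiedPredicate(
      "biolink:regulates",
      List(Qualifier("biolink:object_direction_qualifier", "upregulated"))
    )
  )

  // Predicates without a known rewrite pass through unchanged, with no qualifiers.
  def upgrade(predicate: String): QualifiedPredicate =
    rewrites.getOrElse(predicate, QualifiedPredicate(predicate, Nil))
}
```

For example, `LegacyPredicates.upgrade("biolink:positively_regulates")` yields `biolink:regulates` with a single `biolink:object_direction_qualifier` of `upregulated`, matching the change to `swagger-example.json`.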
3 changes: 2 additions & 1 deletion src/main/resources/application.conf
@@ -1,10 +1,11 @@
{
-version = "0.3-pre1"
+version = "0.3-pre3"
host = 0.0.0.0
port = 8080
port = ${?PORT}
trapi-version = "1.3.0"
trapi-version = ${?TRAPI_VERSION}
+biolink-version = "v3.2.3"
location = "http://localhost:8080"
location = ${?LOCATION}
sparql-endpoint = "https://cam-kp-sparql.apps.renci.org/sparql"
38 changes: 38 additions & 0 deletions src/main/resources/biolink/README.md
@@ -0,0 +1,38 @@
# Biolink mappings

CAM-KP needs Biolink information in both the triplestore backend and the
frontend. This is frustrating -- if only one of them needed to know about the
Biolink model, that would greatly simplify what we need to do here.

We previously generated `predicates.csv` and `mkg-nodes.csv` from the
[CAM Pipeline](https://github.com/ExposuresProvider/cam-pipeline) and incorporated
them here (via the parent directory). This was relatively easy to do when
Biolink predicates could be mapped directly to relations in the triplestore, but
since Biolink 3 includes predicates modified with qualifiers, the mapping is now more
complicated.

This directory takes all of our predicate mapping logic and puts it into one place:
the [`predicates.json`](./predicates.json) file in this directory is intended to provide
the definitive set of mappings between Biolink predicate/qualifier combinations and
triplestore relations. A single relation may be mapped to several predicate/qualifier
combinations and vice versa.
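
As a rough illustration of the many-to-many relationship described above, the sketch below looks up mappings in both directions. The type names, field names, and relation identifiers are hypothetical. It does not reflect the actual layout of `predicates.json` or the real `PredicateMappings` API.

```scala
object PredicateMappingSketch {
  // Hypothetical types; not the actual CAM-KP classes.
  final case class Qualifier(typeId: String, value: String)
  final case class QualifiedPredicate(predicate: String, qualifiers: Set[Qualifier])
  final case class Mapping(qualified: QualifiedPredicate, relation: String)

  val affects: QualifiedPredicate = QualifiedPredicate("biolink:affects", Set.empty)
  val affectsActivity: QualifiedPredicate = QualifiedPredicate(
    "biolink:affects",
    Set(Qualifier("biolink:object_aspect_qualifier", "activity"))
  )

  // "EX:relation_a" and "EX:relation_b" are made-up placeholders,
  // not real triplestore relations.
  val mappings: List[Mapping] = List(
    Mapping(affectsActivity, "EX:relation_a"),
    Mapping(affectsActivity, "EX:relation_b"),
    Mapping(affects, "EX:relation_a")
  )

  // One predicate/qualifier combination may map to several relations...
  def relationsFor(qualified: QualifiedPredicate): Set[String] =
    mappings.filter(_.qualified == qualified).map(_.relation).toSet

  // ...and one relation may map back to several combinations.
  def predicatesFor(relation: String): Set[QualifiedPredicate] =
    mappings.filter(_.relation == relation).map(_.qualified).toSet
}
```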

Two pieces of code interact with this file:

1. `org.renci.cam.util.GenerateBiolinkPredicateMappings` is a standalone program that
   can be used to regenerate this file. It currently uses the
   [Biolink predicate_mapping.yaml](https://github.com/biolink/biolink-model/blob/master/predicate_mapping.yaml),
   mapping information from the triplestore, and a list of manual mappings added during
   development, but in the future it may be expanded to include additional files.
2. `org.renci.cam.domain.PredicateMappings` is a module that provides a
   programmatic interface to the contents of the `predicates.json` file, including
   methods to map from Biolink predicate/qualifier combinations to relations.

The `predicates.json` file can be regenerated by updating the
Biolink version (`biolink-version`) in `application.conf` and then running:

```shell
$ SPARQL_ENDPOINT=https://cam-kp-sparql-dev.apps.renci.org/sparql sbt "runMain org.renci.cam.util.GenerateBiolinkPredicateMappings"
```

The latest `predicates.json` file should be checked into the GitHub repository so that
changes to it can be tracked.