Skip to content

Commit

Permalink
CMR-7312: As a CMR client user, I would like to see the API doc on gr…
Browse files Browse the repository at this point in the history
…aph db related endpoints and features.
  • Loading branch information
ygliuvt committed May 28, 2021
1 parent 269bc78 commit a80e152
Showing 1 changed file with 60 additions and 0 deletions.
60 changes: 60 additions & 0 deletions graph-db/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -56,3 +56,63 @@ There are two serverless applications that interact with the graph database:
* Indexer

Indexer is a serverless application that is connected to a SQS queue that is associated with the live CMR collection ingest/update events. It will index new CMR collection ingest/update into the graph database.

## Explore Indexed Data
CMR graph database is a Neptune database hosted on AWS. Currently, we only index collections and their documentation related urls as vertices in the graph database with edges (named `documents`) from the related url vertices to the collection vertices that reference them.

The collection vertex has the following properties:

* concept-id - Collection concept id
* name - Url to the collection landing page
* title - Entry title of the collection
* doi - DOI link of the collection

The documentation vertex has the following properties:

* name - documentation url
* title - description of the documentation url

### Access via CMR graphdb endpoint

CMR graphdb access endpoint is at: https://cmr.sit.earthdata.nasa.gov/graphdb. Users can use the [Gremlin API](https://tinkerpop.apache.org/gremlin.html) to explore the relationships that are indexed in the graph database. Here are some examples:

To see the total number of vertices in the graph db:
```
curl -XPOST https://cmr.sit.earthdata.nasa.gov/graphdb -d '{"gremlin":"g.V().count()"}'
```

To see the content of the first 10 vertices in the graph db:
```
curl -XPOST https://cmr.sit.earthdata.nasa.gov/graphdb -d '{"gremlin":"g.V().limit(10)"}'
```

To see all collections that share the same documentation URL with the collection (C1233352242-GHRC):
```
curl -XPOST https://cmr.sit.earthdata.nasa.gov/graphdb -d '{"gremlin":"g.V().hasLabel(\"dataset\").has(\"concept-id\", \"C1233352242-GHRC\").inE(\"documents\").outV().hasLabel(\"documentation\").outE(\"documents\").inV().hasLabel(\"dataset\").valueMap()"}'
```

For users have write access to graphdb, they can also add vertices and edges between vertices. For example:

To create a collection vertex:
```
curl -XPOST https://cmr.sit.earthdata.nasa.gov/graphdb -d '{"gremlin":"g.addV(\"dataset\").property(\"name\", \"https://dx.doi.org/undefined\").property(\"title\", \"GPM Ground Validation Precipitation Imaging Package (PIP) ICE POP V1\").property(\"concept-id\", \"C1233352242-GHRC\").property(\"doi\", \"10.5067/GPMGV/ICEPOP/PIP/DATA101\")"}'
```

To create a documentation vertex:
```
curl -XPOST https://cmr.sit.earthdata.nasa.gov/graphdb -d '{"gremlin":"g.addV(\"documentation\").property(\"name\", \"https://ntrs.nasa.gov/archive/nasa/casi.ntrs.nasa.gov/20180003615.pdf\").property(\"title\", \"NASA Participation in the International Collaborative Experiments for Pyeongchang 2018 Olympic and Paralympic Winter Games (ICE-POP 2018)\")"}'
```

To create an edge from the above documentation vertex to the collection vertex:
```
curl -XPOST https://cmr.sit.earthdata.nasa.gov/graphdb -d '{"gremlin":"g.V().hasLabel(\"documentation\").has(\"name\", \"https://ntrs.nasa.gov/archive/nasa/casi.ntrs.nasa.gov/20180003615.pdf\").addE(\"documents\").to(g.V().hasLabel(\"dataset\").has(\"concept-id\", \"C1233352242-GHRC\"))"}'
```

### Access via SSH tunnel and Gremlin Console locally
For users who have access to AWS Neptune endpoint via an internal jumpbox, they can set up SSH tunnel to the Neptune endpoint and start Gremlin Console locally to connect to the Neptune endpoint. Then, they can use Gremlin console to explore the graph database as if it is local.

Prerequisites: User must have ssh access to the internal jumpbox that has access to Neptune endpoint.

See [this AWS document](https://docs.aws.amazon.com/neptune/latest/userguide/access-graph-gremlin-console.html) on how to set up the Gremlin Console to connect to a Neptune DB instance.

Happy exploring!

0 comments on commit a80e152

Please sign in to comment.