Rexster Format
InputFormat: com.thinkaurelius.faunus.formats.rexster.RexsterInputFormat
Rexster is a graph server that exposes any Blueprints graph (e.g. TinkerGraph, Neo4j, OrientDB, DEX, Titan, and Sail RDF Stores) through several mechanisms with a general focus on REST. (See the Benefits of Rexster).
Faunus can be configured to be used with Rexster (version 2.1.0+, see the Rexster Operations section of Gotchas and Limitations) through a Faunus Rexster Kibble (also known as an Extension).
The easiest way to get started is with one of the standard “toy” graphs that come with Rexster: the Grateful Dead Graph. To deploy the Faunus Kibble, copy the Faunus jar file into the $rexster/ext directory (see Deploying an Extension). Next, edit the following segment of the rexster.xml file to tell Rexster to expose the Faunus Kibble on gratefulgraph:
<graph>
  <graph-name>gratefulgraph</graph-name>
  <graph-type>com.tinkerpop.rexster.config.TinkerGraphGraphConfiguration</graph-type>
  <graph-location>data/graph-example-2</graph-location>
  <extensions>
    <allows>
      <allow>tp:gremlin</allow>
      <allow>faunus:rexsterinputformat</allow>
    </allows>
  </extensions>
</graph>
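Before starting Rexster, it can be useful to confirm that the edited segment really lists the Faunus namespace for the intended graph. The following is a minimal sketch in Python that assumes the graph layout shown above. The helper name allowed_extensions and the SAMPLE string are hypothetical and are not part of Rexster or Faunus.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample mirroring the <graph> segment shown above.
SAMPLE = """
<graph>
  <graph-name>gratefulgraph</graph-name>
  <extensions>
    <allows>
      <allow>tp:gremlin</allow>
      <allow>faunus:rexsterinputformat</allow>
    </allows>
  </extensions>
</graph>
"""

def allowed_extensions(xml_text, graph_name):
    """Return the <allow> entries configured for the named graph.

    Hypothetical helper: it only inspects the XML text and does not
    talk to a running Rexster server.
    """
    root = ET.fromstring(xml_text)
    # iter() also yields the root itself, so this works whether the text
    # is a single <graph> element or a document that contains several.
    for graph in root.iter("graph"):
        if graph.findtext("graph-name") == graph_name:
            allows = graph.findall("./extensions/allows/allow")
            return [a.text.strip() for a in allows if a.text]
    return []

if __name__ == "__main__":
    entries = allowed_extensions(SAMPLE, "gratefulgraph")
    print("faunus:rexsterinputformat" in entries)
```

To check a real file, read rexster.xml from disk and pass its contents as xml_text.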
Start Rexster (see Getting Started with Rexster) and note the inclusion of the Faunus Kibble on gratefulgraph in the console output:
rexster$ bin/rexster.sh -s -c ./bin/rexster.xml
[INFO] WebServer - .:Welcome to Rexster:.
...
[INFO] RexsterApplicationGraph - Graph [gratefulgraph] - configured with allowable namespace [tp:gremlin]
[INFO] RexsterApplicationGraph - Graph [gratefulgraph] - configured with allowable namespace [faunus:rexsterinputformat]
...
[INFO] WebServer - Rexster Server running on: [http://localhost:8182]
[INFO] WebServer - Rexster configured with no security.
[INFO] WebServer - RexPro serving on port: [8184]
[INFO] ShutdownManager$ShutdownSocketListener - Bound shutdown socket to /127.0.0.1:8183. Starting listener thread for shutdown requests.
Next, create a rexster.properties file with the following properties. Note that bin/rexster.properties is provided with Faunus.
faunus.graph.input.format=com.thinkaurelius.faunus.formats.rexster.RexsterInputFormat
rexster.input.address=127.0.0.1
rexster.input.port=8182
rexster.input.ssl=false
rexster.input.graph=gratefulgraph
rexster.input.v.estimate=800
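As a sanity check on these values, the sketch below derives the REST address of the graph that they point at. It is written in Python and rests on two assumptions: that the graph is served under the /graphs/ path of the Rexster REST API, and that rexster.input.ssl selects between http and https. The function names are hypothetical, and this is not a description of how Faunus builds its requests internally.

```python
# Hypothetical copy of the properties shown above.
SAMPLE_PROPERTIES = """
rexster.input.address=127.0.0.1
rexster.input.port=8182
rexster.input.ssl=false
rexster.input.graph=gratefulgraph
rexster.input.v.estimate=800
"""

def parse_properties(text):
    """Parse simple key=value lines, skipping blanks and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props

def rexster_graph_url(props):
    """Build the REST URL of the configured graph (illustrative only)."""
    use_ssl = props.get("rexster.input.ssl", "false").lower() == "true"
    scheme = "https" if use_ssl else "http"
    return "%s://%s:%s/graphs/%s" % (
        scheme,
        props["rexster.input.address"],
        props["rexster.input.port"],
        props["rexster.input.graph"],
    )

if __name__ == "__main__":
    print(rexster_graph_url(parse_properties(SAMPLE_PROPERTIES)))
    # prints http://127.0.0.1:8182/graphs/gratefulgraph
```

Opening the printed address in a browser while Rexster is running is a quick way to confirm that the address, port and graph name are correct.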
From here, any Faunus job can be run with Rexster as the source of its input data. Start a Gremlin terminal from Faunus:
faunus$ gremlin.sh
Then compute a simple distribution that groups the vertices reached over incoming followed_by edges by name and counts them:
gremlin> g = FaunusFactory.open('bin/rexster.properties')
==>faunusgraph[rexsterinputformat]
gremlin> g.V.in('followed_by').name.groupCount()
12/09/18 19:01:36 INFO mapreduce.FaunusCompiler: Compiled to 2 MapReduce job(s)
...
==>A MIND TO GIVE UP LIVIN 1
==>ADDAMS FAMILY 2
==>AINT SUPERSTITIOUS 9
==>ALABAMA GETAWAY 14
==>ALL ALONG THE WATCHTOWER 26
==>ALTHEA 58
==>ARE YOU LONELY FOR ME
==>...
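To make the meaning of the traversal above concrete, here is a small in-memory analogue in Python. It is only a sketch: the vertex names and edges are made-up toy data, not taken from the Grateful Dead graph, and the function name in_name_group_count is hypothetical. It assumes the usual Gremlin semantics, in which the in step moves from a vertex to the vertices at the other end of its incoming edges.

```python
from collections import Counter

# Made-up toy data (NOT from the Grateful Dead graph).
NAMES = {1: "SONG A", 2: "SONG B", 3: "SONG C"}
# Each pair is (out vertex, in vertex) of one followed_by edge.
FOLLOWED_BY = [(1, 2), (1, 3), (2, 3)]

def in_name_group_count(names, edges):
    """Mimic g.V.in('followed_by').name.groupCount() on an edge list.

    Starting from every vertex and stepping over its incoming edges
    emits the out vertex of each edge exactly once, so the result
    counts, per name, how many followed_by edges leave that vertex.
    """
    counts = Counter()
    for out_vertex, _in_vertex in edges:
        counts[names[out_vertex]] += 1
    return counts

if __name__ == "__main__":
    for name, count in sorted(in_name_group_count(NAMES, FOLLOWED_BY).items()):
        print(name, count)
```

In Faunus the same counting is compiled to MapReduce jobs rather than run in memory, as the log line in the console output above indicates.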
All Rexster-specific configuration settings in rexster.properties are prefixed with rexster.input.
Setting | Description |
---|---|
address | The IP address or hostname of the Rexster server. |
port | The port that Rexster is serving the REST API from. |
ssl | Tells Faunus if it should connect to Rexster with http or https. |
graph | The name of the graph as configured in Rexster. |
v.estimate | The estimated number of vertices in the target graph. Helps Faunus understand how to split the job into even bits for better efficiency in processing. |
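The role of v.estimate can be pictured with a short Python sketch that cuts an estimated vertex count into near-equal ranges. This is an illustration only: the function name split_ranges and the choice of three splits are assumptions, and the actual splitting logic inside Faunus is not described here.

```python
def split_ranges(vertex_estimate, num_splits):
    """Divide [0, vertex_estimate) into num_splits near-equal ranges.

    Returns a list of (start, end) pairs with end exclusive. Any
    remainder is spread over the first ranges, so sizes differ by at
    most one. Hypothetical helper, not the Faunus implementation.
    """
    if num_splits <= 0:
        raise ValueError("num_splits must be positive")
    base, extra = divmod(vertex_estimate, num_splits)
    ranges = []
    start = 0
    for i in range(num_splits):
        size = base + (1 if i < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges

if __name__ == "__main__":
    print(split_ranges(800, 3))
    # prints [(0, 267), (267, 534), (534, 800)]
```

A far-off estimate would leave some ranges nearly empty and others overloaded, which is why a reasonable value helps even out the work.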
Rexster can expose any Blueprints graph to Faunus. It is important to note that Faunus is only capable of working with graph vertex and edge identifiers that are of the long data type. Using a Blueprints graph that does not have identifiers that resolve to long will produce errors.
OrientDB does not use long identifiers and instead has a compound identifier consisting of a cluster id and a unique identifier for the item within the cluster. The Faunus Kibble is capable of converting this compound identifier to something Faunus can operate with. To tell the Faunus Kibble to convert the compound identifier to long, add the following configuration to rexster.xml for any OrientDB database:
<graph>
  <graph-name>orientdbsample</graph-name>
  <graph-type>orientgraph</graph-type>
  <graph-location>local:/tmp/orientdb</graph-location>
  <properties>
    <username>admin</username>
    <password>admin</password>
  </properties>
  <extensions>
    <allows>
      <allow>tp:gremlin</allow>
      <allow>faunus:rexsterinputformat</allow>
    </allows>
    <extension>
      <namespace>faunus</namespace>
      <name>rexsterinputformat</name>
      <configuration>
        <id-handler>orientdb</id-handler>
      </configuration>
    </extension>
  </extensions>
</graph>
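The idea of turning a compound OrientDB identifier into a single long can be illustrated with simple bit packing. The Python sketch below is an assumption made purely for illustration: how the Faunus Kibble performs the conversion is not stated here, and the bit layout (cluster id in the high bits, position in the low 48 bits) and the function names are hypothetical.

```python
POSITION_BITS = 48                      # assumed width of the position part
MAX_CLUSTER = 2 ** 15                   # keeps the result within a signed 64-bit long
MAX_POSITION = 2 ** POSITION_BITS

def rid_to_long(rid):
    """Pack a record id such as '#9:12' into one non-negative integer.

    Illustrative scheme only, not the Faunus Kibble's actual handler.
    """
    cluster_text, _, position_text = rid.lstrip("#").partition(":")
    cluster, position = int(cluster_text), int(position_text)
    if not 0 <= cluster < MAX_CLUSTER:
        raise ValueError("cluster id out of range: %d" % cluster)
    if not 0 <= position < MAX_POSITION:
        raise ValueError("position out of range: %d" % position)
    return (cluster << POSITION_BITS) | position

def long_to_rid(value):
    """Invert rid_to_long."""
    return "#%d:%d" % (value >> POSITION_BITS, value & (MAX_POSITION - 1))

if __name__ == "__main__":
    packed = rid_to_long("#9:12")
    print(packed < 2 ** 63, long_to_rid(packed))
```

Whatever scheme is used, the property that matters is the one shown by the round trip: distinct compound identifiers map to distinct long values.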
Titan with the BerkeleyDB backend can be exposed to Faunus via Rexster. It does not expose long identifiers for edges through its Blueprints interface. The Faunus Kibble is capable of converting the identifier it does use to something Faunus can operate with. To tell the Faunus Kibble to convert the identifier to long, add the following configuration to rexster.xml for any Titan BerkeleyDB database:
<graph>
  <graph-name>titanexample</graph-name>
  <graph-type>com.thinkaurelius.titan.tinkerpop.rexster.TitanGraphConfiguration</graph-type>
  <graph-location>/tmp/titan</graph-location>
  <graph-read-only>false</graph-read-only>
  <properties>
    <storage.backend>local</storage.backend>
    <buffer-size>100</buffer-size>
  </properties>
  <extensions>
    <allows>
      <allow>tp:gremlin</allow>
      <allow>faunus:rexsterinputformat</allow>
    </allows>
    <extension>
      <namespace>faunus</namespace>
      <name>rexsterinputformat</name>
      <configuration>
        <id-handler>titan-berkley</id-handler>
      </configuration>
    </extension>
  </extensions>
</graph>
It is important to note that this configuration applies to the BerkeleyDB backend only. For all other modes of Titan operation (e.g. Cassandra), use the appropriate Titan Formats.
It’s not difficult to get Faunus running with Rexster in Amazon EC2. There are just a few EC2 and Rexster configuration steps to consider in addition to the instructions for Running Faunus on Amazon EC2.
After downloading Rexster to the EC2 instance, ensure that the base-uri and rexster-server-host configuration properties of rexster.xml are set to the private IP address of the EC2 instance. The configuration should look something like the following:
<rexster>
  ...
  <rexster-server-host>10.118.95.50</rexster-server-host>
  <base-uri>http://10.118.95.50</base-uri>
  ...
</rexster>
By default, Faunus creates an EC2 security group called jclouds#faunuscluster, in which all the Hadoop nodes are created. To allow the nodes in this cluster to talk to the Rexster instance, the security group that Rexster is in must allow inbound access from that security group.
To provide this access, first use Whirr to establish the Hadoop cluster (as described here) and find the jclouds#faunuscluster security group in the Amazon EC2 Console.
Take note of the security group identifier; in this example it is sg-02e38b6a. Edit the Inbound settings for the security group that Rexster is in. In this example, Rexster is in a security group aptly named “Rexster Group”.
Add a rule that allows jclouds#faunuscluster to access the Rexster Group over port 8182 (the default port established in rexster.xml). Use the security group identifier noted above as the source of this rule. It is now possible to run a Faunus job against Rexster.
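For those who prefer scripting the rule over using the console, the Python sketch below builds the rule as data. It is an assumption-laden illustration: the function name rexster_ingress_rule is hypothetical, the dictionary follows the IpPermissions shape accepted by the boto3 EC2 client, and the identifier of the Rexster Group itself is not given here, so it appears only as a placeholder in a comment.

```python
def rexster_ingress_rule(source_group_id, port=8182):
    """Describe an inbound TCP rule that admits one security group.

    Hypothetical helper. source_group_id is the group being let in
    (the jclouds#faunuscluster group), and port is the Rexster REST port.
    """
    if not source_group_id.startswith("sg-"):
        raise ValueError("expected a security group id such as sg-...")
    return {
        "IpProtocol": "tcp",
        "FromPort": port,
        "ToPort": port,
        "UserIdGroupPairs": [{"GroupId": source_group_id}],
    }

if __name__ == "__main__":
    rule = rexster_ingress_rule("sg-02e38b6a")
    print(rule)
    # With boto3 installed and AWS credentials configured, the rule could
    # be applied to the group that Rexster is in, for example:
    #   boto3.client("ec2").authorize_security_group_ingress(
    #       GroupId="<id of the Rexster Group>", IpPermissions=[rule])
```

Granting access by security group rather than by IP address means that Hadoop nodes added to the cluster later are covered by the same rule.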