
Rexster Format

spmallette edited this page Sep 18, 2012 · 38 revisions

Rexster is a graph server that exposes any Blueprints graph (e.g. TinkerGraph, Neo4j, OrientDB, DEX, Titan, and Sail RDF Stores) through several mechanisms, with a general focus on REST (see the Benefits of Rexster).

Rexster and Faunus

Faunus can be configured to be used with Rexster (version 2.1.0+) through a Faunus Rexster Kibble (also known as an Extension).

The easiest way to get started is with one of the standard “toy” graphs that come with Rexster: the Grateful Dead Graph. To deploy the Faunus Kibble, copy the Faunus jar file into the $rexster/ext directory (see Deploying an Extension). Next, edit the following segment of the rexster.xml file to tell Rexster to expose the Faunus Kibble on the gratefulgraph:

<graph>
    <graph-name>gratefulgraph</graph-name>
    <graph-type>com.tinkerpop.rexster.config.TinkerGraphGraphConfiguration</graph-type>
    <graph-location>data/graph-example-2</graph-location>
    <extensions>
        <allows>
            <allow>tp:gremlin</allow>
            <allow>faunus:inputformat</allow>
        </allows>
    </extensions>
</graph>
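
Because a single mistyped tag in rexster.xml is easy to miss, it can help to check the segment before starting Rexster. The following is a minimal sketch in Python, not part of Rexster or Faunus: it parses a copy of the segment above and lists the extension namespaces the graph allows. The function name allowed_extensions is hypothetical.

```python
import xml.etree.ElementTree as ET

# Copy of the <graph> segment from rexster.xml, embedded for illustration.
GRAPH_SEGMENT = """
<graph>
    <graph-name>gratefulgraph</graph-name>
    <graph-type>com.tinkerpop.rexster.config.TinkerGraphGraphConfiguration</graph-type>
    <graph-location>data/graph-example-2</graph-location>
    <extensions>
        <allows>
            <allow>tp:gremlin</allow>
            <allow>faunus:inputformat</allow>
        </allows>
    </extensions>
</graph>
"""


def allowed_extensions(graph_xml):
    """Return (graph name, list of allowed extension namespaces).

    ET.fromstring raises xml.etree.ElementTree.ParseError if the segment
    is not well-formed, for example when a closing tag is mistyped.
    """
    graph = ET.fromstring(graph_xml)
    name = graph.findtext("graph-name")
    allows = [(a.text or "").strip()
              for a in graph.findall("./extensions/allows/allow")]
    return name, allows


name, allows = allowed_extensions(GRAPH_SEGMENT)
print(name, allows)
```

If faunus:inputformat is missing from the printed list, Rexster will not report the Faunus namespace for that graph at startup.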

Start Rexster (see Getting Started with Rexster) and note that the console output lists the Faunus Kibble for gratefulgraph:

rexster$ bin/rexster.sh -s -c ./bin/rexster.xml
[INFO] WebServer - .:Welcome to Rexster:.
...
[INFO] RexsterApplicationGraph - Graph [gratefulgraph] - configured with allowable namespace [tp:gremlin]
[INFO] RexsterApplicationGraph - Graph [gratefulgraph] - configured with allowable namespace [faunus:inputformat]
...
[INFO] WebServer - Rexster Server running on: [http://localhost:8182]
[INFO] WebServer - Rexster configured with no security.
[INFO] WebServer - RexPro serving on port: [8184]
[INFO] ShutdownManager$ShutdownSocketListener - Bound shutdown socket to /127.0.0.1:8183. Starting listener thread for shutdown requests.

Next, create a faunus-rexster.properties file with the following properties. Note that bin/faunus-rexster.properties is provided with Faunus.

faunus.graph.input.format.class=com.thinkaurelius.faunus.formats.rexster.RexsterInputFormat
rexster.input.address=127.0.0.1
rexster.input.port=8182
rexster.input.ssl=false
rexster.input.graph=gratefulgraph
rexster.input.v.estimate=800
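
To see how these properties fit together, the sketch below reads them and assembles the address of the graph on the Rexster server. It is illustrative only: the helper names parse_properties and rexster_graph_url are hypothetical, and the /graphs/<name> path reflects Rexster's general REST layout rather than the exact URL that Faunus requests internally, which this page does not show.

```python
# Copy of the properties listed above, embedded for illustration.
PROPERTIES = """\
faunus.graph.input.format.class=com.thinkaurelius.faunus.formats.rexster.RexsterInputFormat
rexster.input.address=127.0.0.1
rexster.input.port=8182
rexster.input.ssl=false
rexster.input.graph=gratefulgraph
rexster.input.v.estimate=800
"""


def parse_properties(text):
    """Parse simple key=value lines, skipping blanks and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def rexster_graph_url(props):
    """Build the graph URL from the rexster.input.* settings."""
    use_ssl = props.get("rexster.input.ssl", "false").lower() == "true"
    scheme = "https" if use_ssl else "http"
    return "%s://%s:%s/graphs/%s" % (
        scheme,
        props["rexster.input.address"],
        props["rexster.input.port"],
        props["rexster.input.graph"],
    )


props = parse_properties(PROPERTIES)
print(rexster_graph_url(props))
```

Opening the printed address in a browser is a quick way to confirm that the address, port, and graph name match the running Rexster server before launching a job.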

From here, any Faunus job can be run with its source data pulled from Rexster.

faunus$ gremlin.sh -i bin/faunus-rexster.properties 'g.V.in("followed_by").name.groupCount'
...
12/08/27 08:18:04 INFO mapreduce.FaunusRunner: Generating job chain: g.V().in('followed_by').property('name').groupCount()
12/08/27 08:18:04 INFO mapreduce.FaunusRunner: Compiled to 2 MapReduce job(s)
...
faunus$ hadoop fs -getmerge output/job-1 target/output
faunus$ bunzip2 target/output
faunus$ more target/output.out
A MIND TO GIVE UP LIVIN 1
ADDAMS FAMILY   2
AINT SUPERSTITIOUS      9
ALABAMA GETAWAY 14
ALL ALONG THE WATCHTOWER        26
...
WOMEN ARE SMARTER       13
YOU AINT WOMAN ENOUGH   12
YOU WIN AGAIN   9
YOU WONT FIND ME        1
YOUR LOVE AT HOME       1

Configuration Cheatsheet

All settings are prefixed with rexster.input.

| Setting | Description |
| --- | --- |
| address | The IP address or hostname of the Rexster server. |
| port | The port from which Rexster serves the REST API. |
| ssl | Whether Faunus connects to Rexster over https (true) or http (false). |
| graph | The name of the graph as configured in Rexster. |
| v.estimate | The estimated number of vertices in the target graph. Faunus uses it to split the job into evenly sized pieces for more efficient processing. |
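
To give a feel for why the vertex estimate matters, the sketch below divides an estimated vertex count into contiguous, nearly equal ranges, one per worker. This is an illustration of the general idea only; it is not Faunus's actual splitting logic, and both the function name split_ranges and the choice of three splits are assumptions.

```python
def split_ranges(v_estimate, num_splits):
    """Divide [0, v_estimate) into num_splits contiguous (start, end) ranges.

    Range sizes differ by at most one, so each worker gets a similar share.
    """
    if v_estimate < 0 or num_splits < 1:
        raise ValueError("v_estimate must be >= 0 and num_splits >= 1")
    base, extra = divmod(v_estimate, num_splits)
    ranges = []
    start = 0
    for i in range(num_splits):
        # The first `extra` ranges absorb the remainder, one vertex each.
        size = base + (1 if i < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges


# Using the estimate of 800 from faunus-rexster.properties.
print(split_ranges(800, 3))  # [(0, 267), (267, 534), (534, 800)]
```

An estimate far below the real vertex count would leave some vertices outside every range in a scheme like this, and one far above it would produce mostly empty ranges, which is why a reasonably accurate value is worth supplying.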

Blueprints Graph Implementations

Rexster can expose any Blueprints graph to Faunus. It is important to note that Faunus is only capable of working with graph vertex and edge identifiers that are of the long data type. Using a Blueprints graph that does not have identifiers that resolve to long will produce errors.

OrientDB Configuration

OrientDB does not use long identifiers and instead has a compound identifier which consists of a cluster id and unique identifier for the item in the cluster. The Faunus Kibble is capable of converting this compound identifier to something Faunus can operate with. To tell the Faunus Kibble to convert the compound identifier to long, add the following configuration to rexster.xml for any OrientDB database:

<graph>
    <graph-name>orientdbsample</graph-name>
    <graph-type>orientgraph</graph-type>
    <graph-location>local:/tmp/orientdb</graph-location>
    <properties>
        <username>admin</username>
        <password>admin</password>
    </properties>
    <extensions>
        <allows>
            <allow>tp:gremlin</allow>
            <allow>faunus:inputformat</allow>
        </allows>
        <extension>
            <namespace>faunus</namespace>
            <name>inputformat</name>
            <configuration>
                <id-handler>orientdb</id-handler>
            </configuration>
        </extension>
    </extensions>
</graph>
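
This page does not describe how the orientdb id-handler maps a compound identifier to a long. Purely as an illustration of how such a mapping can work, the sketch below packs an OrientDB record id of the form #cluster:position into a single 64-bit value and unpacks it again. The bit layout (15 bits for the cluster, 48 for the position) and the function names are assumptions, not the Faunus Kibble's actual scheme.

```python
# Hypothetical layout: cluster id in the high bits, position in the low 48.
POSITION_BITS = 48
MAX_CLUSTER = (1 << 15) - 1          # keeps the result within a signed long
MAX_POSITION = (1 << POSITION_BITS) - 1


def rid_to_long(rid):
    """Pack an OrientDB record id such as '#9:42' into one integer."""
    cluster_text, position_text = rid.lstrip("#").split(":")
    cluster, position = int(cluster_text), int(position_text)
    if not 0 <= cluster <= MAX_CLUSTER:
        raise ValueError("cluster id out of range: %d" % cluster)
    if not 0 <= position <= MAX_POSITION:
        raise ValueError("cluster position out of range: %d" % position)
    return (cluster << POSITION_BITS) | position


def long_to_rid(value):
    """Recover the '#cluster:position' form from a packed integer."""
    return "#%d:%d" % (value >> POSITION_BITS, value & MAX_POSITION)


packed = rid_to_long("#9:42")
print(packed, long_to_rid(packed))
```

Whatever scheme is used, the important property is the one shown here: the conversion is reversible, so each vertex and edge keeps a distinct long identifier that Faunus can work with.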

Usage in EC2

It’s not difficult to get Faunus running with Rexster in Amazon EC2. There are just a few EC2 and Rexster configuration steps to consider in addition to the instructions for Running Faunus on Amazon EC2.

After downloading Rexster to the EC2 instance, ensure that the base-uri and rexster-server-host configuration properties of rexster.xml are set to the private IP address of the EC2 instance. The configuration should look something like the following:

<rexster>
  ...
  <rexster-server-host>10.118.95.50</rexster-server-host>
  <base-uri>http://10.118.95.50</base-uri>
  ...
</rexster>

By default, Faunus creates an EC2 security group called jclouds#faunuscluster, in which all the Hadoop nodes are created. To allow the nodes in this cluster to talk to the Rexster instance, the security group that Rexster is in must allow inbound access from that security group.

To provide this access, first utilize Whirr to establish the Hadoop cluster (as described here), then find the jclouds#faunuscluster security group in the Amazon EC2 Console.

Take note of the security group identifier; in this example, it is sg-02e38b6a. Then edit the Inbound settings for the security group that Rexster is in. In this example, Rexster is in a security group aptly named “Rexster Group”.

Add an inbound rule to the Rexster Group that allows the jclouds#faunuscluster group to reach it over port 8182 (the default port established in rexster.xml), using the security group identifier noted above as the source of the rule. It is now possible to run a Faunus job against Rexster.
