
Rexster Format

spmallette edited this page Sep 27, 2012 · 38 revisions

  • InputFormat: com.thinkaurelius.faunus.formats.rexster.RexsterInputFormat

Rexster is a graph server that exposes any Blueprints graph (e.g. TinkerGraph, Neo4j, OrientDB, DEX, Titan, and Sail RDF Stores) through several mechanisms, with a general focus on REST (see the Benefits of Rexster).

Rexster and Faunus

Faunus can read graph data from Rexster (version 2.1.0+; see the Rexster Operations section of Gotchas and Limitations) through a Faunus Rexster Kibble (also known as an Extension).

The easiest way to get started is with one of the standard “toy” graphs that come with Rexster: the Grateful Dead Graph. To deploy the Faunus Kibble, copy the Faunus jar file into the $rexster/ext directory (see Deploying an Extension). Next, edit the following segment of the rexster.xml file so that Rexster exposes the Faunus Kibble on gratefulgraph:

<graph>
    <graph-name>gratefulgraph</graph-name>
    <graph-type>com.tinkerpop.rexster.config.TinkerGraphGraphConfiguration</graph-type>
    <graph-location>data/graph-example-2</graph-location>
    <extensions>
        <allows>
            <allow>tp:gremlin</allow>
            <allow>faunus:rexsterinputformat</allow>
        </allows>
    </extensions>
</graph>
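Before restarting Rexster, it can help to confirm that the edited rexster.xml actually allows the Faunus namespace on the intended graph. The following is a minimal Python sketch using only the standard library's xml.etree.ElementTree; the function name `graph_allows` is hypothetical and not part of Rexster or Faunus, and it simply searches for `<graph>` elements anywhere in the file (in a full rexster.xml they sit inside `<graphs>`).

```python
import xml.etree.ElementTree as ET


def graph_allows(rexster_xml: str, graph_name: str, namespace: str) -> bool:
    """Return True if the named <graph> lists `namespace` in an <allow> element.

    Hypothetical helper: it only inspects the XML text and does not contact Rexster.
    """
    root = ET.fromstring(rexster_xml)
    # iter() also visits the root itself, so a bare <graph> fragment works too.
    for graph in root.iter("graph"):
        if graph.findtext("graph-name", default="").strip() != graph_name:
            continue
        allowed = {a.text.strip() for a in graph.iter("allow") if a.text}
        return namespace in allowed
    return False  # no graph with that name was found
```

For example, `graph_allows(open("bin/rexster.xml").read(), "gratefulgraph", "faunus:rexsterinputformat")` should return True once the segment above is in place.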

Start Rexster (see Getting Started with Rexster) and check the console output for the line confirming that the faunus:rexsterinputformat namespace is allowed on gratefulgraph:

rexster$ bin/rexster.sh -s -c ./bin/rexster.xml
[INFO] WebServer - .:Welcome to Rexster:.
...
[INFO] RexsterApplicationGraph - Graph [gratefulgraph] - configured with allowable namespace [tp:gremlin]
[INFO] RexsterApplicationGraph - Graph [gratefulgraph] - configured with allowable namespace [faunus:rexsterinputformat]
...
[INFO] WebServer - Rexster Server running on: [http://localhost:8182]
[INFO] WebServer - Rexster configured with no security.
[INFO] WebServer - RexPro serving on port: [8184]
[INFO] ShutdownManager$ShutdownSocketListener - Bound shutdown socket to /127.0.0.1:8183. Starting listener thread for shutdown requests.
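To check this from a script rather than by eye (for example, as part of a deployment script), the relevant console lines can be matched with a regular expression. This is a minimal sketch that assumes the log line format shown above; `allowed_namespaces` is a hypothetical helper name.

```python
import re

# Matches the Rexster startup line shown above, e.g.
# "... Graph [gratefulgraph] - configured with allowable namespace [faunus:rexsterinputformat]"
NAMESPACE_LINE = re.compile(
    r"Graph \[(?P<graph>[^\]]+)\] - configured with allowable namespace \[(?P<ns>[^\]]+)\]"
)


def allowed_namespaces(log_lines):
    """Collect {graph name: set of allowed namespaces} from Rexster console output."""
    result = {}
    for line in log_lines:
        match = NAMESPACE_LINE.search(line)
        if match:
            result.setdefault(match.group("graph"), set()).add(match.group("ns"))
    return result
```

With the console output saved to a file, `"faunus:rexsterinputformat" in allowed_namespaces(lines).get("gratefulgraph", set())` is True when the Kibble is enabled on that graph.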

Next, create a rexster.properties file with the following properties (a ready-made bin/rexster.properties is provided with Faunus):

faunus.graph.input.format=com.thinkaurelius.faunus.formats.rexster.RexsterInputFormat
rexster.input.address=127.0.0.1
rexster.input.port=8182
rexster.input.ssl=false
rexster.input.graph=gratefulgraph
rexster.input.v.estimate=800
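Because Rexster exposes each graph over REST, these settings are enough to locate the graph directly. The sketch below parses a simple key=value properties file and builds the graph's REST URL from the rexster.input.* values, using Rexster's /graphs/{graph-name} resource. The helper names are hypothetical, and the parser deliberately ignores Java properties features such as line continuations and ':' separators.

```python
def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and # or ! comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#!":
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def rexster_graph_url(props):
    """Build the REST URL of the configured graph from the rexster.input.* settings."""
    use_ssl = props.get("rexster.input.ssl", "false").lower() == "true"
    scheme = "https" if use_ssl else "http"
    host = props["rexster.input.address"]
    port = int(props["rexster.input.port"])
    graph = props["rexster.input.graph"]
    return f"{scheme}://{host}:{port}/graphs/{graph}"
```

For the values above this yields http://127.0.0.1:8182/graphs/gratefulgraph. Opening that URL (in a browser, with curl, or with urllib.request.urlopen) while Rexster is running should return a JSON description of the graph, which is a quick way to confirm that Faunus will be able to reach it.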

From here, any Faunus job can be run with its source data pulled from Rexster. Start a Gremlin terminal from the Faunus directory:

faunus$ bin/gremlin.sh

Then compute a simple distribution of song names with groupCount():

gremlin> g = FaunusFactory.open('bin/rexster.properties')
==>faunusgraph[rexsterinputformat]
gremlin> g.V.in('followed_by').name.groupCount()
12/09/18 19:01:36 INFO mapreduce.FaunusCompiler: Compiled to 2 MapReduce job(s)
...
==>A MIND TO GIVE UP LIVIN	1
==>ADDAMS FAMILY	2
==>AINT SUPERSTITIOUS	9
==>ALABAMA GETAWAY	14
==>ALL ALONG THE WATCHTOWER	26
==>ALTHEA	58
==>ARE YOU LONELY FOR ME	
==>...
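To make the result easier to interpret: starting from every vertex, in('followed_by') steps back along each followed_by edge to that edge's out-vertex, so each song name is counted once per outgoing followed_by edge of that song. The plain-Python sketch below reproduces that counting on a tiny, made-up edge list; it is not Faunus code, and the vertex ids and names are illustrative only.

```python
from collections import Counter


def in_name_group_count(edges, names, label):
    """Mimic g.V.in(label).name.groupCount() over an in-memory edge list.

    edges: iterable of (out_vertex, edge_label, in_vertex) triples.
    names: dict mapping vertex id -> value of its name property.
    Every edge with the given label is traversed once (from its in-vertex back
    to its out-vertex), so the result counts matching edges per out-vertex name.
    """
    counts = Counter()
    for out_v, edge_label, _in_v in edges:
        if edge_label == label:
            counts[names[out_v]] += 1
    return counts
```

On the Grateful Dead Graph, Faunus performs the equivalent counting as MapReduce jobs over the vertices pulled from Rexster.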

Configuration Cheatsheet

All Rexster-specific configuration settings in rexster.properties are prefixed with rexster.input.

Setting      Description
address      The IP address or hostname of the Rexster server.
port         The port on which Rexster serves its REST API.
ssl          Whether Faunus connects to Rexster over https (true) or http (false).
graph        The name of the graph as configured in Rexster.
v.estimate   The estimated number of vertices in the target graph. Faunus uses it to divide the input into evenly sized splits so that the work is balanced across tasks.
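As a rough illustration of why the estimate matters, the sketch below divides an estimated vertex count into contiguous, near-equal ranges, one per split. This is an assumption about the general approach rather than the RexsterInputFormat source: the function name, the use of index ranges, and the open-ended last range are all choices made for this example.

```python
def vertex_ranges(v_estimate, num_splits):
    """Divide [0, v_estimate) into num_splits contiguous ranges of near-equal size.

    Hypothetical helper. The last range is open-ended (end=None) so that
    vertices beyond the estimate still fall into some split.
    """
    if num_splits < 1:
        raise ValueError("num_splits must be at least 1")
    chunk = max(1, v_estimate // num_splits)
    ranges = []
    for i in range(num_splits):
        start = i * chunk
        end = None if i == num_splits - 1 else start + chunk
        ranges.append((start, end))
    return ranges
```

With v.estimate=800 and three splits this gives (0, 266), (266, 532), and (532, open). If the estimate is far below the real vertex count, the open-ended last range absorbs the remainder and that one split does most of the work, which is why a reasonably accurate estimate helps.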

Blueprints Graph Implementations

Rexster can expose any Blueprints graph to Faunus. However, Faunus can only work with vertex and edge identifiers of type long. Using a Blueprints graph whose identifiers do not resolve to long will produce errors.
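Before pointing Faunus at a graph, a quick sanity check on a sample of its identifiers (for example, ids returned by Rexster's REST API) can show whether they resolve to long. A minimal sketch; `resolves_to_long` is a hypothetical helper, and it only checks that a value parses as an integer within the signed 64-bit range of a JVM long.

```python
LONG_MIN, LONG_MAX = -(2 ** 63), 2 ** 63 - 1  # range of a Java long


def resolves_to_long(identifier):
    """Return True if `identifier` can be read as a signed 64-bit integer."""
    if isinstance(identifier, bool):
        return False  # bool is an int subclass in Python; not a meaningful id
    try:
        value = int(str(identifier).strip())
    except ValueError:
        return False  # e.g. OrientDB-style "#9:42" or other non-numeric ids
    return LONG_MIN <= value <= LONG_MAX
```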

OrientDB Configuration

OrientDB does not use long identifiers. Instead, it uses a compound identifier consisting of a cluster id and an identifier for the record within that cluster. The Faunus Kibble can convert this compound identifier into something Faunus can operate on. To tell the Faunus Kibble to convert the compound identifier to long, add the following configuration to rexster.xml for any OrientDB database:

<graph>
    <graph-name>orientdbsample</graph-name>
    <graph-type>orientgraph</graph-type>
    <graph-location>local:/tmp/orientdb</graph-location>
    <properties>
        <username>admin</username>
        <password>admin</password>
    </properties>
    <extensions>
        <allows>
            <allow>tp:gremlin</allow>
            <allow>faunus:rexsterinputformat</allow>
        </allows>
        <extension>
            <namespace>faunus</namespace>
            <name>rexsterinputformat</name>
            <configuration>
                <id-handler>orientdb</id-handler>
            </configuration>
        </extension>
    </extensions>
</graph>
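To illustrate what converting the compound identifier to long involves, the sketch below packs an OrientDB record id of the form #cluster:position into a single 64-bit value and back. The packing is a hypothetical scheme chosen for this example (cluster id in the high bits, a 48-bit position in the low bits); it is not necessarily how the orientdb id-handler in the Faunus Kibble encodes ids.

```python
import re

RID = re.compile(r"^#(\d+):(\d+)$")  # OrientDB record id text form, e.g. "#9:42"
POSITION_BITS = 48  # assumption: positions fit in 48 bits; clusters get 15 bits below the sign bit


def rid_to_long(rid):
    """Pack "#cluster:position" into one non-negative int that fits in a Java long."""
    match = RID.match(rid)
    if not match:
        raise ValueError(f"not an OrientDB record id: {rid!r}")
    cluster, position = int(match.group(1)), int(match.group(2))
    # Reject values that would overflow the signed 64-bit range under this scheme.
    if cluster >= 2 ** 15 or position >= 2 ** POSITION_BITS:
        raise ValueError(f"record id out of range for this packing: {rid!r}")
    return (cluster << POSITION_BITS) | position


def long_to_rid(value):
    """Invert rid_to_long."""
    cluster = value >> POSITION_BITS
    position = value & ((1 << POSITION_BITS) - 1)
    return f"#{cluster}:{position}"
```

Any reversible scheme works as long as every packed value stays within the long range and no two record ids map to the same number.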

Titan BerkeleyDB Configuration

Titan with the BerkeleyDB backend can be exposed to Faunus via Rexster, but it does not expose long identifiers for edges through its Blueprints interface. The Faunus Kibble can convert the identifier it does use into something Faunus can operate on. To tell the Faunus Kibble to convert the identifier to long, add the following configuration to rexster.xml for any Titan BerkeleyDB database:

<graph>
  <graph-name>titanexample</graph-name>
  <graph-type>com.thinkaurelius.titan.tinkerpop.rexster.TitanGraphConfiguration</graph-type>
  <graph-location>/tmp/titan</graph-location>
  <graph-read-only>false</graph-read-only>
  <properties>
    <storage.backend>local</storage.backend>
    <buffer-size>100</buffer-size>
  </properties>
  <extensions>
    <allows>
      <allow>tp:gremlin</allow>
      <allow>faunus:rexsterinputformat</allow>
    </allows>
    <extension>
      <namespace>faunus</namespace>
      <name>rexsterinputformat</name>
      <configuration>
        <id-handler>titan-berkley</id-handler>
      </configuration>
    </extension>
  </extensions>
</graph>

Note that this configuration applies to Titan with BerkeleyDB only. For all other Titan storage backends (e.g. Cassandra), use the appropriate Titan Formats.

Usage in EC2

Getting Faunus running with Rexster on Amazon EC2 is not difficult. There are just a few EC2 and Rexster configuration steps to consider in addition to the instructions in Running Faunus on Amazon EC2.

After downloading Rexster to the EC2 instance, ensure that the base-uri and rexster-server-host configuration properties in rexster.xml are set to the private IP address of the EC2 instance. The configuration should look something like the following:

<rexster>
  ...
  <rexster-server-host>10.118.95.50</rexster-server-host>
  <base-uri>http://10.118.95.50</base-uri>
  ...
</rexster>
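A small pre-flight check can catch the common mistake of leaving these at their defaults. The Python sketch below (a hypothetical helper, standard library only) reads rexster.xml, confirms that rexster-server-host is a private IP address, and checks that base-uri points at the same host. It searches for both elements anywhere in the file, since their exact nesting may differ between Rexster versions.

```python
import ipaddress
import xml.etree.ElementTree as ET
from urllib.parse import urlparse


def check_ec2_hosts(rexster_xml):
    """Return a list of problems with rexster-server-host and base-uri (empty if none)."""
    root = ET.fromstring(rexster_xml)
    host = (root.findtext(".//rexster-server-host") or "").strip()
    base_uri = (root.findtext(".//base-uri") or "").strip()
    base_host = urlparse(base_uri).hostname or ""
    problems = []
    try:
        if not ipaddress.ip_address(host).is_private:
            problems.append(f"rexster-server-host {host!r} is not a private address")
    except ValueError:
        problems.append(f"rexster-server-host {host!r} is not an IP address")
    if base_host != host:
        problems.append(f"base-uri host {base_host!r} does not match {host!r}")
    return problems
```

An empty list means both settings point at the same private address, as in the example configuration above.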

By default, Faunus creates an EC2 security group called jclouds#faunuscluster, in which all the Hadoop nodes are created. For the nodes in this cluster to reach the Rexster instance, the security group that Rexster belongs to must allow inbound access from that security group.

To provide this access, first use Whirr to launch the Hadoop cluster (as described in Running Faunus on Amazon EC2), then find the jclouds#faunuscluster security group in the Amazon EC2 Console.

Take note of the security group identifier; in this example it is sg-02e38b6a. Then edit the Inbound settings of the security group that Rexster is in. In this example, Rexster runs in a security group aptly named “Rexster Group”.

Add a rule that allows jclouds#faunuscluster to access Rexster Group over port 8182 (the default port set in rexster.xml), using the jclouds#faunuscluster security group identifier noted above as the rule's source. It is now possible to run a Faunus job against Rexster.
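The same rule can also be scripted instead of created in the console. This is a minimal sketch using boto3, the AWS SDK for Python (a third-party package, not part of Faunus or Whirr). The helper names `ingress_permission` and `allow_faunus_cluster` are hypothetical, the Rexster group's identifier is whatever the console shows for “Rexster Group”, and the sketch assumes AWS credentials are configured and both groups can be referenced by group ID.

```python
REXSTER_PORT = 8182  # default Rexster REST port from rexster.xml


def ingress_permission(source_group_id, port=REXSTER_PORT):
    """Build an IpPermissions entry allowing TCP `port` from another security group."""
    return {
        "IpProtocol": "tcp",
        "FromPort": port,
        "ToPort": port,
        "UserIdGroupPairs": [{"GroupId": source_group_id}],
    }


def allow_faunus_cluster(rexster_group_id, faunus_group_id, region_name=None):
    """Add the inbound rule to the Rexster security group (requires AWS credentials)."""
    import boto3  # imported lazily so ingress_permission works without boto3 installed

    ec2 = boto3.client("ec2", region_name=region_name)
    return ec2.authorize_security_group_ingress(
        GroupId=rexster_group_id,
        IpPermissions=[ingress_permission(faunus_group_id)],
    )
```

For the example above, the call would be `allow_faunus_cluster(rexster_group_id, "sg-02e38b6a")`, where rexster_group_id is the identifier of Rexster Group.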
