Skip to content

Poseidon core - storage and transaction management for property graphs

License

Notifications You must be signed in to change notification settings

dbis-ilm/poseidon_core

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Poseidon Graph Database

Poseidon is a graph database system originally designed for persistent memory, but supports disk-based data, too. The data model used in Poseidon is the property graph model where nodes and relationships have labels (type names) and properties (key-value pairs). This module poseidon_core provides the implementation for the storage and transaction manager as well as the query engine consisting of a parser for a simple algebraic query language, a query interpreter, and a LLVM-based query compiler.

Installation

Simply, clone the repository, create a build directory and run the build tools cmake and make:

mkdir build; cd build
cmake ..
make

Docker

Alternatively, you can build a docker container with all the necessary tools and a compiled version of Poseidon. Simply, use the provided Dockerfile:

docker build . -t poseidon-dev
docker run -it poseidon-dev

Using Poseidon

Poseidon provides a command line interface called pcli for managing databases and executing queries. The following options are supported:

Option Description
-v [ --verbose ] verbose - show debug output
-d [ --db ] arg specifies the database name (required)
-p [ --pool ] arg the path to the PMem pool (for PMDK) or the directory containing all databases
-o [ --output ] arg dumps the graph to the given file (in DOT format)
--strict strict mode - assumes that all columns contain values of the same type
--delimiter arg specifies the delimiter character
-f [ --format ] arg specifies CSV format: n4j
--import-path arg specifies the directory containing import files
--import arg imports the files in CSV format either nodes:<node-type>:<filename> or relationships:<rship-type>:<filename>
-q [ --query ] arg executes the query from the given file
-s [ --shell ] starts the interactive shell
--qmode arg specifies the query compile mode: interp (default), llvm, or adapt

In the following example, we create a new database and load the graph data from CSV files in Neo4j file format. Note, that all files representing a database have to imported with a single invocation of pcli as in the following example:

./build/pcli --pool demo --db testdb  -f n4j --delimiter , \
     --import nodes:Movie:./test/movies.csv \
     --import nodes:Actor:./test/actors.csv \
     --import relationships:./test/roles.csv 

Alternatively, CVS files in a plain CSV format can be also imported.

After creating the database, we can execute queries, e.g.

./build/pcli --pool demo --db testdb
poseidon> NodeScan()
poseidon> NodeScan('Movie')
poseidon> Filter($0.title == 'Inception (2010)', NodeScan('Movie'))
poseidon> Expand(IN, 'Actor', ForeachRelationship(TO, 'PLAYED_IN', NodeScan('Movie')))

Poseidon supports a simple algebraic query notation in the following form of a query expression:

query-op(args, query-expr)

A query is evaluated from the inside out: query-expr is first evaluated and produces the input for query-op. Each operator in a query expression adds its result (node, relationship, value(s)) to the result of its input expression. For example, an expression such as

ForeachRelationship(FROM, 'isLocatedIn', 
   NodeScan('Person'))

produces a result with two columns: column #0 contains the Person node, column #1, the outgoing relationships of type isLocatedIn. In expressions such as Filter or Project one can refer to these columns by $0, $1 etc., e.g.:

Project([$0.firstName:string, $0.lastName:string, $2.id:uint64, $2.name:string],
    Match((p1:Person {id: 933})-[:isLocatedIn]->(p2:Place))

The following query operators are supported:

Operator Parameter Example Description
NodeScan node type (optional) NodeScan()
NodeScan('Person')
Scans the node table and returns all nodes of the optionally given type.
IndexScan node type, property, key IndexScan('Person', 'id', 933) Performs an index lookup and returns all nodes of the given type that satisfy the predicate condition
Filter filter expression, input expression Filter($0.id == 42, NodeScan('Person')) Processes the input list of nodes and rships produced by input expression query-expr and returns all tuples satisfying the given condition. In the expressions, the input columns are denoted by $0, $1, $2 etc.
ForeachRelationship TO or FROM or ALL, RelationshipType, input ForeachRelationship(FROM, 'isLocatedIn', NodeScan('Person')) Traverses all incoming or outgoing or both relationships of the given type
Expand IN or OUT, node type, input expression Expand(OUT, 'Place', ForeachRelationship(FROM, 'isLocatedIn', NodeScan('Person'))) Gets all the source or destination nodes of the given type. Used after ForeachRelationship operator.
Match path pattern, input expression Match((p1:Person {id: 933})-[:isLocatedIn]->(p2:Place)) Evaluates the given path pattern (in Cypher's ASCII art notation) and returns the matching nodes/relationships. In fact, this operator is internally translated to a sequence of corresponding ForeachRelationship-Expand expressions.
Project [ projection list ], input expr Project([$0.firstName:string, $0.lastName:string, $0.id:uint64], NodeScan('Person')) Projects query results based on the given projection list
Limit number of tuples, input expr Limit(10, NodeScan('Person')) Limits the input list to the given number of tuples
Join condition expression, input1, input2 Computes a join from the given inputs
LeftOuterJoin condition expression, input1, input2 Computes a left outer join from the given inputs
Sort sort function, input Orders tuples according to the sorting function
Aggregate [ AggregateType list ], input
GroupBy [ GroupKey list ], [ AggregateType list ], input Groups all tuples based on grouping key(s) and apply aggregate functions
Union [ query list ], input expr Combines the tuples of multiple queries
Distinct input_expr Distinct(Project([pb::label($0:qresult)], NodeScan())) Eliminates duplicates from the result
Create (n:NodeType { key: val, ...} ), input Creates a new node from the literals or the input
Create ($1)-[r:RelationshipType { key: val, ...} ]->($2), input Creates a new relationship from the literals or the input
RemoveNode input expression Deletes the nodes returned from the input expression. If a node is still connected by a relationship then the query is aborted.
RemoveRelationship input expression Deletes the relationships returned from the input expression.
DetachNode input expression Deletes the nodes returned from the input expression together with all their relationships.
Algorithm algorithm name, TUPLE or SET, param list Algorithm([NumLinks, TUPLE], Filter($0.id < 100000, NodeScan('Person'))) Runs the given algorithm either in per-tuple mode (TUPLE) or on the whole input (SET) with the given parameters.

In addition to executing queries, pcli provides several utility commands:

poseidon> help
Available commands:
	help                             show this help
	string s                         display the dictionary code of the string s
	code c                           display the string of the dictionary code c
	load <library>                   load the given shared library
	stats                            print database statistics
	sync                             ensure that all pages are written to disk
	create index <label> <property>  create an index for the given label/property
	drop index <label> <property>    delete the index for the given label/property
	@file                            execute the query stored in the given file
	explain <query-expr>             execute the given query and print the plan
	<query-expr>                     execute the given query
	print node|rship <id>            print the raw data of the node/relationship with given id

Query implementation

Queries - or better query plans - can be also directly implemented in C++ by using the query class. This class provides methods to construct a query plan from a set of separate operators. Poseidon provides a push-based query engine, i.e. the query plan starts with scans. The following example shows an implementation of LDBC interactive short query #1:

namespace pj = builtin;

auto q = query(gdb)
          .nodes_where("Person", "id", [&](auto &p) { return p.equal(933); })
          .from_relationships(":isLocatedIn")
          .to_node("Place")
          .project(
              {PExpr_(0, pj::string_property(res, "firstName")),
               PExpr_(0, pj::string_property(res, "lastName")),
               PExpr_(0, pj::int_to_datestring(pj::int_property(res, "birthday"))),
               PExpr_(0, pj::string_property(res, "locationIP")),
               PExpr_(0, pj::string_property(res, "browser")),
               PExpr_(2, pj::int_property(res, "id")),
               PExpr_(0, pj::string_property(res, "gender")),
               PExpr_(0, pj::int_to_datetimestring(
                   pj::int_property(res, "creationDate"))) })
          .print();
  q.start();

Poseidon Python API

In addition, there is a thin wrapper library for providing a Python API. This API allows creating and opening graph databases, importing CVS data, and specifying query execution plans. The latter reflects the C++ query class (see above).

In the following Python example the database imdb is created by loading the IMDB data from three files.

import poseidon

# create a new graph database
graph = poseidon.graph_db()
graph.create("imdb")

# we need a mapping table for mapping logical node ids to ids used for
# creating relationships
m = graph.mapping()

# import nodes and relationships for the IMDB database
n = graph.import_nodes("Movies", "imdb-data/movies.csv", m)
print(n, "movies imported.")
n = graph.import_nodes("Actor", "imdb-data/actors.csv", m)
print(n, "actors imported.")
n = graph.import_relationships("imdb-data/roles.csv", m)
print(n, "roles imported.")

After creating the graph database, it can be queried:

q = poseidon.query(graph) \
    .all_nodes("Movies") \
    .print()

q.start()

Storage structure

Poseidon stores nodes and relationships in separate vectors where each vector is implemented as a chunked vector stored on disk pages, i.e. a linked list of array of fixed size. Furthermore, properties are stored separately in a third vector. Whereas for nodes and relationships each object is represented by its own record, properties belonging to the same node or relationship are stored in batches of five properties per record. Note, that strings a compressed via dictionary compression and neither stored directly in nodes, relationships, or properties.

Transaction Processing

For transaction processing Poseidon implements a multiversion timestamp ordering (MVTO) protocol. Here, the most recent committed version is always stored persistently while dirty versions (nodes/relationships which are currently inserted or updated) as well as outdated versions are stored in a dirty list in volatile memory.

Publications

The architecture of the persistent memory-based version is described here. This engine is the foundation for other research works, e.g.

About

Poseidon core - storage and transaction management for property graphs

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages