jgromero/graphviz-bench

graphviz-bench

Implementation of a graph processing pipeline in the Apache Spark framework aimed at graph visualization:

  1. Data retrieval: The first step in creating a linked data graph is to retrieve data by issuing a SPARQL SELECT query to a triplestore or a federated linked database. The result of this query is a set of triples describing the entities and relationships of interest.

  2. Graph building: A graph is then built from the retrieved data. It can be held in memory or stored in files as plain text (vertices and edge pairs), triples, etc. The objective of this task is to encode the graph in a machine-processable format supported by a special-purpose graph processing library.

  3. Graph calculations: Different graph measures can be calculated on the graph to improve the visualization. For example, the size and the colour of the vertices can be adjusted according to their degree, or according to a more sophisticated ranking procedure such as the PageRank algorithm.

  4. Graph layout: Graph layout is the calculation of the spatial position of the vertices, with the aim of producing a pleasant depiction and making the underlying information easier for human users to interpret. A classic family of graph layout algorithms is force-based vertex placement, which computes attractive and repulsive forces for pairs of vertices to determine how close they should be drawn. Here we implement a Spark-based version of the Fruchterman-Reingold algorithm.

  5. Rendering (not included): Finally, the graph is displayed on the screen. Usually, a visualization library starts a layered process to translate the higher-level drawing primitives into instructions that are sent to the graphics driver and executed by the graphics card connected to the screen.
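Steps 2 and 3 above can be illustrated with a minimal pure-Python sketch. It is not the repository's Spark code. It assumes the retrieved data has already been reduced to an undirected list of vertex pairs; the function names `build_graph`, `degrees` and `pagerank`, the damping factor of 0.85 and the fixed iteration count are all illustrative assumptions.

```python
from collections import defaultdict


def build_graph(edges):
    """Build an undirected adjacency map from (source, target) pairs."""
    adj = defaultdict(set)
    for u, v in edges:
        if u == v:
            continue  # self-loops are ignored in this sketch
        adj[u].add(v)
        adj[v].add(u)
    return dict(adj)


def degrees(adj):
    """Return the degree of every vertex."""
    return {v: len(neighbours) for v, neighbours in adj.items()}


def pagerank(adj, damping=0.85, iterations=50):
    """Power-iteration PageRank on an undirected graph.

    Every vertex in `adj` has at least one neighbour, so no special
    handling of dangling vertices is needed here.
    """
    n = len(adj)
    rank = {v: 1.0 / n for v in adj}
    for _ in range(iterations):
        new_rank = {}
        for v in adj:
            # Each neighbour u shares its rank equally among its own neighbours.
            incoming = sum(rank[u] / len(adj[u]) for u in adj[v])
            new_rank[v] = (1.0 - damping) / n + damping * incoming
        rank = new_rank
    return rank


if __name__ == "__main__":
    edges = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c")]
    graph = build_graph(edges)
    print(degrees(graph))
    print(pagerank(graph))
```

Either measure can then drive vertex size or colour: in this toy graph the vertex `a` has both the highest degree and the highest rank, so it would be drawn largest.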
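The force-based layout described in step 4 can be sketched sequentially as follows. This is a simplified, single-machine illustration of the Fruchterman-Reingold idea, not the Spark implementation in this repository. The function name, the unit frame, the initial temperature and the geometric cooling factor of 0.95 are assumptions chosen for brevity.

```python
import math
import random


def fruchterman_reingold(vertices, edges, width=1.0, height=1.0,
                         iterations=50, seed=0):
    """Return {vertex: (x, y)} positions inside a width x height frame."""
    rng = random.Random(seed)
    pos = {v: [rng.uniform(0, width), rng.uniform(0, height)] for v in vertices}
    k = math.sqrt(width * height / len(vertices))  # ideal distance between vertices
    temperature = width / 10.0  # maximum displacement per iteration

    for _ in range(iterations):
        disp = {v: [0.0, 0.0] for v in vertices}

        # Repulsive forces between every pair of vertices: f_r(d) = k^2 / d.
        for i, v in enumerate(vertices):
            for u in vertices[i + 1:]:
                dx = pos[v][0] - pos[u][0]
                dy = pos[v][1] - pos[u][1]
                dist = max(math.hypot(dx, dy), 1e-9)
                force = k * k / dist
                disp[v][0] += dx / dist * force
                disp[v][1] += dy / dist * force
                disp[u][0] -= dx / dist * force
                disp[u][1] -= dy / dist * force

        # Attractive forces along edges: f_a(d) = d^2 / k.
        for v, u in edges:
            dx = pos[v][0] - pos[u][0]
            dy = pos[v][1] - pos[u][1]
            dist = max(math.hypot(dx, dy), 1e-9)
            force = dist * dist / k
            disp[v][0] -= dx / dist * force
            disp[v][1] -= dy / dist * force
            disp[u][0] += dx / dist * force
            disp[u][1] += dy / dist * force

        # Move each vertex, capping the step at the current temperature
        # and keeping the vertex inside the frame.
        for v in vertices:
            dx, dy = disp[v]
            length = max(math.hypot(dx, dy), 1e-9)
            step = min(length, temperature)
            pos[v][0] = min(width, max(0.0, pos[v][0] + dx / length * step))
            pos[v][1] = min(height, max(0.0, pos[v][1] + dy / length * step))

        temperature *= 0.95  # cool down so the layout settles

    return {v: tuple(p) for v, p in pos.items()}


if __name__ == "__main__":
    layout = fruchterman_reingold(["a", "b", "c", "d"],
                                  [("a", "b"), ("b", "c"), ("c", "d")])
    for vertex, (x, y) in layout.items():
        print(f"{vertex}: ({x:.3f}, {y:.3f})")
```

The all-pairs repulsion loop is quadratic in the number of vertices, which is the part of the algorithm that benefits most from being distributed.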

The code is adapted for benchmarking the performance of the complete process.
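As a rough illustration of what benchmarking the complete process can look like, the sketch below times each stage of a pipeline separately. The helper `time_stages` and the placeholder stages are hypothetical and are not taken from this repository.

```python
import time


def time_stages(stages, data):
    """Run (name, function) stages in order and record the wall time of each.

    The output of one stage is passed as the input of the next.
    """
    timings = {}
    for name, stage in stages:
        start = time.perf_counter()
        data = stage(data)
        timings[name] = time.perf_counter() - start
    return data, timings


if __name__ == "__main__":
    # Placeholder stages standing in for the real pipeline steps.
    stages = [
        ("build", lambda edges: sorted(set(edges))),
        ("count", lambda edges: len(edges)),
    ]
    result, timings = time_stages(stages, [(1, 2), (2, 3), (1, 2)])
    print(result)
    for name, seconds in timings.items():
        print(f"{name}: {seconds:.6f} s")
```

With Spark, transformations are evaluated lazily, so each timed stage would need to end in an action (for example `count()`) for its measurement to reflect the actual work.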

The implementation includes tests with four datasets:

  1. DrugBank: Drug to drug interactions from the Bio2RDF drugbank dump.

  2. DBPedia: Knowledge graphs from DBPedia.

  3. Synthetic graphs: Graphs generated by using the Erdős-Rényi model.

  4. SNAP: Graphs from the Stanford Network Analysis Platform dataset.
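For the synthetic dataset, an Erdős-Rényi graph can be generated in a few lines. The sketch below uses the G(n, p) variant, in which each possible edge is included independently with probability p. Which variant and which parameters the repository uses are not stated here, so the function name `erdos_renyi` and its arguments are assumptions.

```python
import itertools
import random


def erdos_renyi(n, p, seed=None):
    """Return the edge list of a G(n, p) random graph on vertices 0..n-1."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be in [0, 1]")
    rng = random.Random(seed)
    # Consider every unordered pair once and keep it with probability p.
    return [(u, v) for u, v in itertools.combinations(range(n), 2)
            if rng.random() < p]


if __name__ == "__main__":
    edges = erdos_renyi(10, 0.3, seed=42)
    print(len(edges), "edges")
```

The resulting vertex pairs have the same plain-text shape as the edge lists mentioned in the graph building step, so they can be fed to the rest of the pipeline directly.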

We also include a sequential reference implementation in Java 8 based on Gephi, JUNG and TinkerPop.

This work was developed within the BIGFUSE project and funded by the University of Granada and the Spanish Ministry of Education, Culture and Sport.