-
Notifications
You must be signed in to change notification settings - Fork 88
Graph Analyzing Algorithms
Gradoop provides a collection of classes wrapping Flink™ Gellys library methods for analyzing graphs. Thereby it is possible, to apply those methods to Gradoops extended property graph model. This section gives an overview of the implemented algorithm wrapper and their application.
Graph Analyzing Algorithms |
---|
Clustering Coefficient |
Hyperlink-Induced Topic Search |
Label Propagation |
PageRank |
Single Source Shortest Path |
Vertex Degrees |
Weakly Connected Components |
This algorithm computes the clustering coefficient as a measure of the degree to which vertices in a graph tend to build highly connected cluster. It is provided for both, directed and undirected graphs and computes 3 values, which are:
clustering coefficient | description |
---|---|
local | Measures the connectedness of a single vertex regarding the connections of its neighborhood. The value will be between 0.0 and 1.0, where 0.0 is assigned when there are no edges between the neighbors and 1.0 if all neighbors are fully connected with each other. Therefor the local clustering coefficient is the number of edges between neighbors divided by the number of all potential edges between neighbors. |
average | This is the mean over all local values. |
global | Measures the connectedness of the graph itself by computing the ratio from closed triplets (triangles) to all triplets, open and closed ones. Its value is between 0.0 and 1.0, where 0.0 is assigned if there aren't any closed triplets and 1.0 if all triplets are closed. |
The values for the local coefficient are written to the corresponding vertices as property. The values for the average and global coefficient are written to the graph head as property.
The application for a directed graph works as followed:
// graph with 3 fully connected vertices
String graphString = "graph[" +
"/* fully connected clique */" +
"(v0 {id:0, value:\"A\"})" +
"(v1 {id:1, value:\"B\"})" +
"(v2 {id:2, value:\"C\"})" +
"(v0)-[e0]->(v1)" +
"(v1)-[e1]->(v0)" +
"(v0)-[e2]->(v2)" +
"(v2)-[e3]->(v0)" +
"(v1)-[e4]->(v2)" +
"(v2)-[e5]->(v1)" +
"]";
// apply the algorithm
LogicalGraph graph = getLoaderFromString(graphString).getLogicalGraphByVariable("graph");
LogicalGraph resultGraph = graph.callForGraph(new GellyClusteringCoefficientDirected());
// read the coefficient values
List<Vertex> vertices = resultGraph.getVertices().collect();
GraphHead head = resultGraph.getGraphHead().collect().get(0);
for (Vertex v : vertices) {
System.out.println(v.getPropertyValue("id").toString() + ": " +
v.getPropertyValue(ClusteringCoefficientBase.PROPERTY_KEY_LOCAL).toString());
}
System.out.println("average: " + head.getPropertyValue(ClusteringCoefficientBase.PROPERTY_KEY_AVERAGE).toString());
System.out.println("global: " + head.getPropertyValue(ClusteringCoefficientBase.PROPERTY_KEY_GLOBAL).toString());
/* this will print
0: 1.0
1: 1.0
2: 1.0
average: 1.0
global: 1.0
*/
This algorithm detects the components of a graph. Two vertices belong to the same component, if they are connected by an edge, without taking the edge direction into account. The Gelly algorithm is implemented as a Scatter-Gather Iteration. Following Gradoop wrapper are provided:
Wrapper | description |
---|---|
AnnotateWeaklyConnectedComponents | Calls the Gelly algorithm to annotate the vertices (and edges if intended) with the corresponding component id. Arguments are: the maximum number of iterations, the property key to store the annotation and a boolean to determine if the edges are annotated, too. |
WeaklyConnectedComponentsAsCollection | Calls AnnotateWeaklyConnectedComponents and splits the Logical Graph into a Graph Collection by the given annotation property key, where each graph in this collection represents a weakly connected component. |
ConnectedComponentsDistribution | Calls AnnotateWeaklyConnectedComponents and aggregates the number of vertices and edges for each component. Returns a DataSet<Tuple3<String,Long,Long>> where each entry contains the components id, the corresponding number of vertices and edges. |
The application for WeaklyConnectedComponentsAsCollection works as followed:
// graph with 2 components
String graphString = "graph[" +
// First component
"(v0 {id:0, component:1})" +
// Second component
"(v1 {id:1, component:2})-[e0]->(v2 {id:2, component:2})" +
"(v1)-[e1]->(v3 {id:3, component:2})" +
"(v2)-[e2]->(v3)" +
"(v3)-[e3]->(v4 {id:4, component:2})" +
"(v4)-[e4]->(v5 {id:5, component:2})" +
"]";
// apply the algorithm
LogicalGraph graph = getLoaderFromString(graphString).getLogicalGraphByVariable("graph");
GraphCollection result = graph.callForCollection(new WeaklyConnectedComponentsAsCollection(propertyKey, 20));
/*
result contains two logical graphs:
first graph with vertex v0
second graph with vertices v1 to v5 and corresponding edges
*/