Skip to content

Latest commit

 

History

History
46 lines (36 loc) · 2.52 KB

README.md

File metadata and controls

46 lines (36 loc) · 2.52 KB

Butterfly Counting in Bipartite Networks

Paper

You may find our paper on ArXiv: https://arxiv.org/pdf/1801.00338v4.pdf

Our paper is published in KDD2018: https://dl.acm.org/citation.cfm?id=3220097

Please use the following BiBTeX for citing our paper:

@inproceedings{sanei2018butterfly,
  title={Butterfly Counting in Bipartite Networks},
  author={Sanei-Mehri, Seyed-Vahid and Sariyuce, Ahmet Erdem and Tirthapura, Srikanta},
  booktitle={Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery \& Data Mining},
  pages={2150--2159},
  year={2018},
  organization={ACM}
}

Abstract

We consider the problem of counting motifs in bipartite affiliation networks, such as author-paper, user-product, and actor-movie relations. We focus on counting the number of occurrences of a "butterfly", a complete 2 x 2 biclique, the simplest cohesive higher-order structure in a bipartite graph. Our main contribution is a suite of randomized algorithms that can quickly approximate the number of butterflies in a graph with a provable guarantee on accuracy. An experimental evaluation on large real-world networks shows that our algorithms return accurate estimates within a few seconds, even for networks with trillions of butterflies and hundreds of millions of edges.

There are 4 butterflies in the entire graph G, and the number of per-vertex and per-edge butterflies are shown for some vertices/edges.

Watch our Promotional Video for This Work!

Promotional Video

Implementation

BFC.cpp is the source code for an exact algorithm and a suite of randomized algorithms to count the number of butterflies in a bipartite network. This is the implementation for our paper available on ArXiv: https://arxiv.org/pdf/1801.00338v4.pdf

Disclaimer: For the sake of simplicity for reviewers and researchers to understand our implementation, this source code is a little bit different than what we ran on the large bipartite networks. For example, in this implementation we DO NOT remove vertices with degree zero, or we use vectors for adjacency lists and binary_search function for edge sampling. However, as mentioned in the paper, we could use hashsets (unordered_sets) instead of vectors.