[SIGMOD2025] An Efficient and Exact Algorithm for Locally h-Clique Densest Subgraph Discovery
Detecting locally non-overlapping, near-clique densest subgraphs is essential for community search in social networks. Our work introduces an efficient and exact algorithm to tackle this challenge by identifying the top-k non-overlapping, locally h-clique densest subgraphs (LhCDS).
Key highlights of our approach:
- Iterative Propose-Prune-and-Verify (IPPV) Pipeline (see the control-flow sketch after this list):
- Propose: Generate candidates using convex programming without missing any dense substructures.
- Prune: Handle overlapping cliques with an advanced graph decomposition technique.
- Verify: Use basic and fast verification methods to confirm LhCDS candidates, leveraging a flow network to enhance speed.
- Generalization: Our approach extends to broader densest subgraph detection problems while ensuring exactness and low computational complexity.
- Extensive experiments on real datasets demonstrate the effectiveness and efficiency of our IPPV algorithm.
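For orientation, the sketch below shows only the overall control flow of the IPPV pipeline. Every name in it (Subgraph, propose_candidates, prune_overlaps, verify_lhcds) is a hypothetical placeholder, not the repository's API; the real Propose, Prune, and Verify stages rely on convex programming, graph decomposition, and a flow network as described above.

```cpp
// Illustrative skeleton of the Iterative Propose-Prune-and-Verify (IPPV) loop.
// All names below are hypothetical placeholders, not the repository's API.
#include <vector>

struct Subgraph {
    std::vector<int> vertices;  // vertex ids of a candidate subgraph
    double density = 0.0;       // its h-clique density
};

// Placeholder stage stubs: the real stages use convex programming (Propose),
// graph decomposition (Prune), and flow-based verification (Verify).
std::vector<Subgraph> propose_candidates() { return {}; }
std::vector<Subgraph> prune_overlaps(const std::vector<Subgraph>& cands,
                                     const std::vector<Subgraph>& /*accepted*/) { return cands; }
bool verify_lhcds(const Subgraph&) { return false; }

// Iterate Propose -> Prune -> Verify until k locally h-clique densest
// subgraphs have been confirmed or no candidates remain.
std::vector<Subgraph> ippv(int k) {
    std::vector<Subgraph> results;
    while ((int)results.size() < k) {
        auto candidates = propose_candidates();             // Propose
        if (candidates.empty()) break;                       // nothing left to examine
        candidates = prune_overlaps(candidates, results);    // Prune
        for (const auto& c : candidates) {                   // Verify
            if ((int)results.size() >= k) break;
            if (verify_lhcds(c)) results.push_back(c);
        }
    }
    return results;
}

int main() { return (int)ippv(20).size(); }
```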
The figure on the left shows the relationships between a subset of characters in “Harry Potter”. The top-1 and top-2 L3CDSes are the blue and green subgraphs, respectively. The top-1 L3CDS is the Weasley family, and the top-2 L3CDS is the Death Eaters organization, which indicates the potential of LhCDS discovery for mining diverse dense communities.
The input data format is: the first row is nodes_num edges_num, and each following row src_node dst_node represents an edge. Node ids start from 0, e.g.
23133 93439
0 1
0 2
0 3
0 4
0 5
0 6
0 7
0 8
0 9
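As a small illustration of this format (not the repository's loader), the sketch below reads such a file into a 0-indexed adjacency list; the file path reuses the dataset from the usage example further down and is otherwise arbitrary.

```cpp
// Minimal example reader for the edge-list format described above.
// Illustrative sketch only, not the repository's actual input code.
#include <fstream>
#include <iostream>
#include <vector>

int main() {
    std::ifstream in("./dataset/CA-CondMat.txt");  // any file in the format above
    if (!in) { std::cerr << "cannot open input file\n"; return 1; }

    long long n = 0, m = 0;
    in >> n >> m;                                  // first row: nodes_num edges_num

    std::vector<std::vector<int>> adj(n);          // nodes are 0-indexed
    for (long long i = 0; i < m; ++i) {
        int u, v;
        in >> u >> v;                              // each following row: src_node dst_node
        adj[u].push_back(v);
        adj[v].push_back(u);                       // assume an undirected graph
    }
    std::cout << "loaded " << n << " nodes and " << m << " edges\n";
    return 0;
}
```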
Name | Abbr. | \|V\| | \|E\| | # 3-cliques | # 5-cliques |
---|---|---|---|---|---|
soc-hamsterster | HA | 2,426 | 16,630 | 53,251 | 298,013 |
CA-GrQc | GQ | 5,242 | 14,484 | 48,260 | 2,215,500 |
fb-pages-politician | PP | 5,908 | 41,706 | 174,632 | 2,002,250 |
fb-pages-company | PC | 14,113 | 52,126 | 56,005 | 207,829 |
web-webbase-2001 | WB | 16,062 | 25,593 | 21,115 | 382,674 |
CA-CondMat | CM | 23,133 | 93,439 | 173,361 | 511,088 |
soc-epinions | EP | 26,588 | 100,120 | 159,700 | 521,106 |
Email-Enron | EN | 36,692 | 183,831 | 727,044 | 5,809,356 |
loc-gowalla | GW | 196,591 | 950,327 | 2,273,138 | 14,570,875 |
DBLP | DB | 317,080 | 1,049,866 | 2,224,385 | 262,663,639 |
Amazon | AM | 334,863 | 925,872 | 667,129 | 61,551 |
soc-youtube | YT | 495,957 | 1,936,748 | 2,443,886 | 5,306,643 |
soc-lastfm | LF | 1,191,805 | 4,519,330 | 3,946,207 | 10,404,656 |
soc-flixster | FX | 2,523,386 | 7,918,801 | 7,897,122 | 96,315,278 |
soc-wiki-talk | WT | 2,394,385 | 4,659,565 | 9,203,519 | 382,777,822 |
All the datasets can be downloaded from the Network Repository.
mkdir build
cd build/
cmake ..
make
Examples:
./LhCDScvx --args -g ./dataset/CA-CondMat.txt -h 3 -t 10 -k 20 -p 0
Here h means h-clique, t is the number of iterations, k means top-k LhCDS, and p is the pattern.
The following are the regular subgraph models we support.
// Select which pattern instances to enumerate according to the -p argument.
if (p == 0)
{
    h_cliques = get_kcliques(adj, v2cliques, k, selected);         // h-clique pattern
}
else if (p == 1)
{
    h_cliques = get_kloops(adj, v2cliques, k, selected);           // k-loop pattern
}
else if (p == 2)
{
    h_cliques = get_doubletriangle(adj, v2cliques, k, selected);   // double-triangle pattern
}
else if (p == 3)
{
    h_cliques = get_tritriangle(adj, v2cliques, k, selected);      // tri-triangle pattern
}
else if (p == 4)
{
    h_cliques = get_kstar(adj, v2cliques, k - 1, selected);        // k-star pattern (uses k - 1)
}
else if (p == 5)
{
    h_cliques = get_ctriangle(adj, v2cliques, k - 1, selected);    // c-triangle pattern (uses k - 1)
}
// Assign each enumerated pattern instance a sequential index.
for (int i = 0; i < h_cliques.size(); i++) {
    h_cliques_no.push_back(i);
}
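For a concrete feel of what the pattern-extraction step enumerates, below is a minimal, self-contained 3-clique (triangle) listing example on a tiny hard-coded graph. It only illustrates the h = 3 case and is not the repository's get_kcliques, which handles general h and the additional bookkeeping (e.g., v2cliques) used by the algorithm.

```cpp
// Illustrative triangle (3-clique) enumeration on a small undirected graph.
// Not the repository's get_kcliques; this only shows the basic idea for h = 3.
#include <array>
#include <iostream>
#include <utility>
#include <vector>

int main() {
    int n = 5;
    // Edge list with u < v for every edge.
    std::vector<std::pair<int, int>> edges = {{0, 1}, {0, 2}, {1, 2}, {1, 3}, {2, 3}, {3, 4}};

    std::vector<std::vector<bool>> has(n, std::vector<bool>(n, false));
    std::vector<std::vector<int>> adj(n);
    for (auto [u, v] : edges) {
        has[u][v] = has[v][u] = true;
        adj[u].push_back(v);
        adj[v].push_back(u);
    }

    std::vector<std::array<int, 3>> triangles;
    for (auto [u, v] : edges) {
        // For each edge (u, v) with u < v, look for a common neighbor w with
        // w > v, so every triangle is reported exactly once.
        for (int w : adj[u])
            if (w > v && has[v][w])
                triangles.push_back({u, v, w});
    }

    std::cout << "found " << triangles.size() << " triangles\n";
    for (const auto& t : triangles)
        std::cout << t[0] << " " << t[1] << " " << t[2] << "\n";
    return 0;
}
```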
Execute the following shell script to obtain the results of this experiment (note: exact running times may vary depending on the hardware).
k_list=(5 10 15 20)
h_list=(3 4 5)
v_list=(1 0)
t_list=(20)
p=0
data_list=("fb-pages-company" "soc-hamsterster" "soc-epinions" "Email-Enron" "loc-gowalla" "CA-CondMat" "CA-GrQc" "Amazon")
for dataset in ${data_list[@]}
do
if [ -z "$dataset" ]; then
echo "Error: The variable 'dataset' is not set."
exit 1
fi
# Create the directory if it does not exist
mkdir -p "./output/$dataset"
for k in ${k_list[@]}
do
for h in ${h_list[@]}
do
for t in ${t_list[@]}
do
for v in ${v_list[@]}
do
/usr/bin/time -v ./LhCDScvx -g ./dataset/${dataset}.txt -h $h -k $k -t $t -p $p -v $v >& ./output/$dataset/${dataset}_LhCDScvx_k=${k}_h=${h}_t=${t}_p=${p}_v=${v}.txt
done
done
done
done
done
The code of LDSflow comes from the author of "Locally Densest Subgraph Discovery".
@article{xu2024efficient,
author = {Xiaojia Xu and Haoyu Liu and Xiaowei Lv and Yongcai Wang and Deying Li},
title = {An Efficient and Exact Algorithm for Locally h-Clique Densest Subgraph Discovery},
journal = {Proc. ACM Manag. Data},
volume = {2},
number = {N6},
articleno = {225},
year = {2024},
month = {December},
numpages = {26},
doi = {10.1145/3698800}
}