MILPBench is the largest MILP benchmark dataset to date, comprising 100,000 instances across 60 systematically categorized classes, and provides a diverse and comprehensive testbed for evaluating MILP-solving algorithms. By leveraging structural similarity metrics and GNN-based embeddings, we ensure that MILPBench achieves intra-class homogeneity and inter-class diversity, enabling robust evaluations. In addition, MILPBench includes a Baseline Library featuring 15 mainstream solving methods, spanning traditional solvers, heuristic algorithms, and ML-based approaches, making it the first dataset to offer such a complete evaluation framework. Experimental results demonstrate that MILPBench enables detailed performance comparisons and reveals inconsistencies in some algorithms' claims, further underscoring its value as a benchmark.
We have meticulously assembled a substantial collection of mixed integer linear programming (MILP) instances from a variety of sources, including open-source comprehensive datasets, domain-specific academic papers, and MILP-related competitions. Additionally, we generated a large number of standard problem instances based on seven canonical MILP problems: the Maximum Independent Set (MIS) problem, the Minimum Vertex Cover (MVC) problem, the Set Covering (SC) problem, the Balanced Item Placement (BIP) problem, the Combinatorial Auctions (CAT) problem, the Capacitated Facility Location (CFL) problem, and the Mixed Integer Knapsack Set (MIKS) problem. For each type of problem, we generated instances at three levels of difficulty: easy, medium, and hard.
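To illustrate how such instances can be produced, below is a minimal generation sketch for the MIS problem, assuming networkx and gurobipy are available; the generator scripts shipped with MILPBench may use different random-graph models and parameters.

```python
# A minimal sketch of generating a Maximum Independent Set instance as a
# MILP, assuming networkx and gurobipy are installed. The graph model and
# parameters here are illustrative, not MILPBench's exact generator.
import networkx as nx
import gurobipy as gp
from gurobipy import GRB

def generate_mis_instance(num_nodes: int, edge_prob: float, seed: int, path: str):
    """Write a random MIS instance to an .lp file."""
    graph = nx.erdos_renyi_graph(num_nodes, edge_prob, seed=seed)
    model = gp.Model("MIS")
    # One binary variable per node: x[i] = 1 iff node i is in the independent set
    x = model.addVars(graph.number_of_nodes(), vtype=GRB.BINARY, name="x")
    # Adjacent nodes cannot both be selected
    for u, v in graph.edges():
        model.addConstr(x[u] + x[v] <= 1)
    # Maximize the size of the independent set
    model.setObjective(x.sum(), GRB.MAXIMIZE)
    model.write(path)

generate_mis_instance(num_nodes=1000, edge_prob=0.01, seed=0, path="mis_easy_0.lp")
```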
The size of each categorized dataset is as follows; the download links are detailed in ./Benchmark Datasets/README.md.
Name (Path) | Number of Instances | Avg. Vars | Avg. Constraints |
---|---|---|---|
MIS_easy | 50 | 20000 | 60000 |
MIS_medium | 50 | 100000 | 300000 |
MIS_hard | 50 | 1000000 | 3000000 |
MVC_easy | 50 | 20000 | 60000 |
MVC_medium | 50 | 100000 | 300000 |
MVC_hard | 50 | 1000000 | 3000000 |
SC_easy | 50 | 40000 | 40000 |
SC_medium | 50 | 200000 | 200000 |
SC_hard | 50 | 2000000 | 2000000 |
BIP_easy | 50 | 4081 | 290 |
BIP_medium | 50 | 14182 | 690 |
BIP_hard | 50 | 54584 | 2090 |
CAT_easy | 50 | 2000 | 2000 |
CAT_medium | 50 | 22000 | 22000 |
CAT_hard | 50 | 2000000 | 2000000 |
CFL_easy | 50 | 16040 | 80 |
CFL_medium | 50 | 144200 | 320 |
CFL_hard | 50 | 656520 | 800 |
MIKS_easy | 50 | 5000 | 5000 |
MIKS_medium | 50 | 55000 | 55000 |
MIKS_hard | 50 | 1000000 | 1000000 |
nn_verification | 3622 | 7144.02 | 6533.58 |
item_placement | 10000 | 1083 | 195 |
load_balancing | 10000 | 61000 | 64307.19 |
anonymous | 138 | 34674.03 | 44498.19 |
HEM_knapsack | 10000 | 720 | 72 |
HEM_mis | 10002 | 500 | 1953.48 |
HEM_setcover | 10000 | 1000 | 500 |
HEM_corlat | 1984 | 466 | 486.17 |
HEM_mik | 90 | 386.67 | 311.67 |
vary_bounds_s1 | 50 | 3117 | 1293 |
vary_bounds_s2 | 50 | 1758 | 351 |
vary_bounds_s3 | 50 | 1758 | 351 |
vary_matrix_s1 | 50 | 802 | 531 |
vary_matrix_rhs_bounds_s1 | 50 | 27710 | 16288 |
vary_matrix_rhs_bounds_obj | 50 | 7973 | 3558 |
vary_obj_s1 | 50 | 360 | 55 |
vary_obj_s2 | 50 | 745 | 26159 |
vary_obj_s3 | 50 | 9599 | 27940 |
vary_rhs_s1 | 50 | 12760 | 1501 |
vary_rhs_s2 | 50 | 1000 | 1250 |
vary_rhs_s3 | 50 | 63009 | 507 |
vary_rhs_s4 | 50 | 1000 | 1250 |
vary_rhs_obj_s1 | 50 | 90983 | 33438 |
vary_rhs_obj_s2 | 50 | 4626 | 8274 |
Aclib | 99 | 181 | 180 |
Coral | 279 | 18420.92 | 11831.01 |
Cut | 14 | 4113 | 1608.57 |
ECOGCNN | 44 | 36808.25 | 58768.84 |
fc.data | 20 | 571 | 330.5 |
MIPlib | 50 | 7719.98 | 6866.04 |
Nexp | 77 | 9207.09 | 7977.14 |
Transportation | 32 | 4871.5 | 2521.467 |
MIPLIB_collection_easy | 649 | 119747.4 | 123628.3 |
MIPLIB_collection_hard | 107 | 96181.4 | 101135.8 |
MIPLIB_collection_open | 204 | 438355.9 | 258599.5 |
MIRPLIB_Original | 72 | 36312.2 | 11485.8 |
MIRPLIB_Maritime_Group1 | 40 | 13919.5 | 19329.25 |
MIRPLIB_Maritime_Group2 | 40 | 24639.8 | 34053.25 |
MIRPLIB_Maritime_Group3 | 40 | 24639.8 | 34057.75 |
MIRPLIB_Maritime_Group4 | 20 | 4343.0 | 6336.0 |
MIRPLIB_Maritime_Group5 | 20 | 48330.0 | 66812.0 |
MIRPLIB_Maritime_Group6 | 20 | 48330.0 | 66815.0 |
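The Avg. Vars and Avg. Constraints columns can be reproduced for any dataset folder by reading its .lp files with a solver API. Below is a minimal sketch with gurobipy; the folder path used in the example is illustrative.

```python
# A minimal sketch for reproducing the Avg. Vars / Avg. Constraints columns
# for one folder of .lp files, assuming gurobipy is installed.
import glob
import gurobipy as gp

def average_sizes(folder: str):
    """Return (avg. variables, avg. constraints) over all .lp files in folder."""
    paths = glob.glob(f"{folder}/*.lp")
    num_vars, num_constrs = 0, 0
    for path in paths:
        model = gp.read(path)
        num_vars += model.NumVars
        num_constrs += model.NumConstrs
    return num_vars / len(paths), num_constrs / len(paths)

# Hypothetical folder layout; adjust to where the dataset is downloaded
print(average_sizes("./Benchmark Datasets/MIS_easy/LP"))
```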
To validate the effectiveness of the proposed dataset, we organized existing mainstream methods into a Baseline Library and compared them on the Benchmark Datasets. The algorithms in the Baseline Library are as follows.
Baseline | Code |
---|---|
Gurobi | ./Baseline Library/Gurobi/ |
SCIP | ./Baseline Library/SCIP/ |
CPLEX | ./Baseline Library/CPLEX/ |
Large Neighborhood Search | ./Baseline Library/LNS/ |
General Large Neighborhood Search | ./Baseline Library/GLNS/ |
Least Integral Heuristic | ./Baseline Library/LIH/ |
Most Integral Heuristic | ./Baseline Library/MIH/ |
Relaxation Induced Neighborhood Search | ./Baseline Library/RINS/ |
Adaptive Constraint Partition Based Optimization Framework | ./Baseline Library/ACP/ |
Learn to Branch | ./Baseline Library/Learn2Branch/ |
GNN&GBDT-Guided Fast Optimizing Framework | ./Baseline Library/GNN&GBDT/ |
GNN-Guided Predict-and-Search Framework | ./Baseline Library/Predict&Search/ |
Neural Diving | ./Baseline Library/Neural Diving/ |
Hybrid Learn to Branch | ./Baseline Library/Hybrid_Learn2Branch/ |
Graph Neural Networks with Random Features | ./Baseline Library/GNN_MILP/ |
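As a reference point, running a traditional-solver baseline on a single instance amounts to reading the .lp file and optimizing under a time limit. The sketch below uses gurobipy; the actual script in ./Baseline Library/Gurobi/ may set different parameters (time limit, threads, logging), and the instance path is hypothetical.

```python
# A minimal sketch of running the Gurobi baseline on one instance,
# assuming gurobipy is installed. Parameters are illustrative.
import gurobipy as gp
from gurobipy import GRB

model = gp.read("instance.lp")     # hypothetical instance path
model.setParam("TimeLimit", 3600)  # cap the solve time at one hour
model.optimize()

if model.SolCount > 0:
    # Report the incumbent objective and the remaining optimality gap
    print(f"objective = {model.ObjVal}, gap = {model.MIPGap:.4f}")
else:
    print(f"no feasible solution found, status = {model.Status}")
```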
The code is located in ./Similarity Evaluation/Similarity/Structure Similarity. The following bash command can be run to calculate the structural similarity of a given dataset containing several MILP instances in .lp format.
```bash
base_dir="<The dataset folder>"

# Traverse all LP subfolders under the dataset folder
find "$base_dir" -type d -name LP | while read -r lp_dir; do
    # Get the directory containing LP, i.e. the instance directory
    instance_dir=$(dirname "$lp_dir")
    # Get the problem name
    problem_name=$(basename "$instance_dir")
    echo "Processing problem: $problem_name, directory: $instance_dir"

    # Create the Test0 folder in the instance directory
    mkdir -p "$instance_dir/Test0"

    # Convert the .lp models into graph data with MILP_utils.py
    python MILP_utils.py --mode=model2data \
        --input_dir="$lp_dir" \
        --output_dir="$instance_dir/Test0" \
        --type=direct

    # Compute graph statistics for each instance
    python graph_statistics.py --input_dir="$instance_dir/Test0" \
        --output_file="$instance_dir/statistics"

    # Run calc_sim.py and write the results to result.txt
    python calc_sim.py --input_file1="$instance_dir/statistics" > "$instance_dir/result.txt"
done
```
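We do not restate calc_sim.py's internals here, but one plausible reading of the pipeline is that graph_statistics.py summarizes each instance as a fixed-length vector of graph statistics and calc_sim.py reports the mean pairwise similarity of those vectors. A hedged sketch of such a computation, which may differ from the actual script:

```python
# A hedged sketch of one plausible structural-similarity computation:
# each instance is a fixed-length vector of graph statistics, and the
# dataset's similarity score is the mean pairwise cosine similarity.
# The actual calc_sim.py may compute something different.
import itertools
import numpy as np

def mean_pairwise_cosine(features: np.ndarray) -> float:
    """features: (num_instances, num_statistics) array, one row per instance."""
    normed = features / np.linalg.norm(features, axis=1, keepdims=True)
    sims = [float(normed[i] @ normed[j])
            for i, j in itertools.combinations(range(len(normed)), 2)]
    return float(np.mean(sims))
```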
The code is located in ./Similarity Evaluation/Similarity/Neural Embedding Similarity. The following bash command can be run to calculate the neural embedding similarity of a given dataset containing several MILP instances in .lp format.
```bash
special_dir="<Your Dataset Folder>"

# Define the per-instance pipeline before it is used in the loop below
process_instance() {
    lp_dir=$1
    # Get the directory containing LP, i.e. the instance directory
    instance_dir=$(dirname "$lp_dir")
    # Get the problem name
    problem_name=$(basename "$instance_dir")
    echo "Processing problem: $problem_name, directory: $instance_dir"
    # Create the Test0 folder in the instance directory
    mkdir -p "$instance_dir/Test0"
    # Convert the .lp models into graph data with MILP_utils.py
    python MILP_utils.py --mode=model2data \
        --input_dir="$lp_dir" \
        --output_dir="$instance_dir/Test0" \
        --type=direct
    # Run inference.py to compute the neural embeddings
    python src/inference.py --dataset=MILP \
        --cfg_path=./experiments/configs/test.yml \
        --seed=1 \
        --device=2 \
        --model_path=./experiments/weights/encoder.pth \
        --input_dir="$instance_dir/Test0" \
        --output_file="$instance_dir/embedding" \
        --filename="$instance_dir/namelist" || echo "Inference failed"
    # Run calc_sim.py and write the results to result_embedding.txt
    python calc_sim.py --input_file1="$instance_dir/embedding" > "$instance_dir/result_embedding.txt"
}

find "$special_dir" -type d -name LP | while read -r lp_dir; do
    process_instance "$lp_dir"
done
```
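The model2data step converts each MILP into graph data for the GNN encoder. The standard representation for this purpose is a variable-constraint bipartite graph, in which each nonzero coefficient of the constraint matrix becomes a weighted edge. The sketch below shows that conversion with gurobipy; MILP_utils.py's exact node features and output format may differ.

```python
# A hedged sketch of the MILP-to-bipartite-graph conversion that a
# model2data step typically performs: variables and constraints form the
# two node sets, and each nonzero coefficient becomes a weighted edge.
import gurobipy as gp

def milp_to_bipartite(path: str):
    model = gp.read(path)
    variables = model.getVars()
    constraints = model.getConstrs()
    var_index = {v.VarName: i for i, v in enumerate(variables)}
    edges = []  # (constraint index, variable index, coefficient)
    for ci, constr in enumerate(constraints):
        row = model.getRow(constr)  # linear expression of this constraint row
        for k in range(row.size()):
            edges.append((ci, var_index[row.getVar(k).VarName], row.getCoeff(k)))
    # Simple node features: objective coefficient per variable, RHS per constraint
    var_feats = [v.Obj for v in variables]
    con_feats = [c.RHS for c in constraints]
    return var_feats, con_feats, edges
```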
The code is located in ./Similarity Evaluation/Classification. The following bash command can be run to build the dataset, train the model, run inference, and cluster the results. For classification, you can use our trained model and simply run build, inference, and cluster in turn.
```bash
# Validate the sub-command
case $1 in
    build|train|inference|cluster)
        echo "Valid argument: $1"
        ;;
    *)
        echo "Invalid argument: $1. Please enter a correct choice."
        exit 1
        ;;
esac

# build: convert the .lp models into graph data
if [ "$1" = "build" ]; then
    python src/MILP_utils.py --mode=model2data --input_dir=dataset/model --output_dir=dataset/data --type=direct
fi

# train: train the GNN encoder
if [ "$1" = "train" ]; then
    python src/train.py --dataset=MILP --cfg_path=experiments/configs/test.yml --seed=1 --device=0 --model_path=experiments/weights/encoder.pth --dataset_path=dataset/data || echo "Training failed"
fi

# inference: embed the instances with the trained encoder
if [ "$1" = "inference" ]; then
    python src/inference.py --cfg_path=experiments/configs/test.yml --seed=1 --device=0 --model_path=experiments/weights/encoder.pth --input_dir="<Your Dataset Folder>" --output_file=tmp.pkl --filename=namelist.pkl || echo "Inference failed"
fi

# cluster: cluster the embeddings into classes
if [ "$1" = "cluster" ]; then
    python src/clustering.py --filename=namelist.pkl --input_file=tmp.pkl
fi
```
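The cluster step groups the embeddings produced by inference. Assuming tmp.pkl stores a (num_instances, dim) embedding array and namelist.pkl the matching instance names, a k-means sketch with scikit-learn might look as follows; src/clustering.py may use a different algorithm and number of clusters.

```python
# A hedged sketch of the clustering step. The pickle layout, the choice of
# k-means, and n_clusters=60 (matching the 60 classes mentioned above) are
# all assumptions; src/clustering.py may differ.
import pickle
from sklearn.cluster import KMeans

with open("tmp.pkl", "rb") as f:
    embeddings = pickle.load(f)
with open("namelist.pkl", "rb") as f:
    names = pickle.load(f)

kmeans = KMeans(n_clusters=60, n_init=10, random_state=1).fit(embeddings)
for name, label in zip(names, kmeans.labels_):
    print(f"{name} -> class {label}")
```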