SQLSolver is an automated prover that verifies the equivalence of SQL queries, which is presented in the paper "Proving Query Equivalence Using Linear Integer Arithmetic" (SIGMOD 2024). We also provide an online demo and you can try it :-)
- Ubuntu (22.04.1 LTS is tested)
- z3 4.8.9 (SMT solver)
- antlr 4.8 (Generate tokenizer and parser for SQL AST)
- Python 3
- Java 17
- Gradle 7.3.3
z3 and antlr library have been put in lib/
off-the-shelf.
Python is typically installed by default.
Type python3 --version
to check whether Python 3 is installed.
In cases where it is not installed, use this following instruction:
sudo apt install python3
If you do not have Java or Gradle installed, you may refer to these instructions:
# Installing Java 17
sudo add-apt-repository ppa:linuxuprising/java
sudo apt update
sudo apt-get install -y oracle-java17-installer oracle-java17-set-default
# Installing Gradle 7.3.3
wget https://services.gradle.org/distributions/gradle-7.3.3-bin.zip -P /tmp
sudo unzip -d /opt/gradle /tmp/gradle-7.3.3-bin.zip
sudo touch /etc/profile.d/gradle.sh
sudo chmod a+wx /etc/profile.d/gradle.sh
sudo echo -e "export GRADLE_HOME=/opt/gradle/gradle-7.3.3 \nexport PATH=\${GRADLE_HOME}/bin:\${PATH}" >> /etc/profile.d/gradle.sh
source /etc/profile
We use Gradle as the project build tool. After all the above dependencies are installed, you can compile the SQLSolver project with the following command.
gradle compileJava
After Compilation, you can build a JAR file using the following command.
gradle fatjar
The JAR file will be generated in the build/libs/
directory relative
to the current directory.
To use the JAR file of SQLSolver, you need to put libz3.so and libz3java.so in a custom directory: <path/to/lib>
and then use the command:
export LD_LIBRARY_PATH=<path/to/lib>
The code below shows how to use SQLSolver's JAR File to verify SQLs.
-sql1=<path/to/query1>
: the first file whose path is<path/to/query1>
to store the SQLs.-sql2=<path/to/query2>
: the second file whose path is<path/to/query2>
to store the SQLs.-schema=<path/to/schema>
: the schema file whose path is<path/to/schema>
to store the schema.-print
: output the result to standard output stream.-output=<path/to/output>
: output the result to a file whose path is<path/to/output>
.
Each SQL file has multiple SQL statements and should store a SQL statement in one line, and the corresponding lines in both files will be considered as pairs of SQL statements to be verified for equivalence.
java -jar sqlsolver.jar -sql1=<path/to/query1> -sql2=<path/to/query2> -schema=<path/to/schema> [-print] [-output=<path/to/output>]
Since SQLSolver uses the parser of Calcite to parse SQL queries, please ensure that queries in SQL files satisfy the syntax requirements of Calcite parser.
The example here shows how to use SQLSolver's JAR File to verify three simple SQLs.
First SQL file with path ./example_sql1.sql
:
SELECT i, j FROM a
SELECT x, y FROM b
Second SQL file with path ./example_sql2.sql
:
SELECT T.COL1, T.COL2 FROM (SELECT i AS COL1, j AS COL2 FROM a) AS T
SELECT T.COL1, T.COL2 FROM (SELECT x AS COL1, y AS COL2 FROM b) AS T
Schema file with path ./example_schema.sql
:
CREATE TABLE a ( i INT PRIMARY KEY, j INT, k INT );
CREATE TABLE b ( x INT PRIMARY KEY, y INT, z INT );
Then, use the following command to verify the three SQL pairs:
java -jar sqlsolver.jar -sql1=./example_sql1.sql -sql2=./example_sql2.sql -schema=./example_schema.sql -output=./result
You will get a file named result
in the current directory with the following content:
EQ
EQ
The Java users can directly download SQLSolver's source code and access the package sqlsolver.api
.
The entry of SQLSolver is in sqlsolver.api.entry.Verification
.
There are two main interfaces:
/**
* Verify two sql equivalence.
*/
VerificationResult verify(String sql0, String sql1, String schema)
/**
* Verify pairwise sql equivalence in the sqlList.
* The two sql to verify are in the same index of both lists.
* The two sqlList (sqlList0 and sqlList1) should have same size.
*/
List<VerificationResult> verify(List<String> sqlList0, List<String> sqlList1, String schema)
/**
* Verify pairwise sql equivalence in the sqlList.
* It resembles Verification#verify(List, List, String) except that it sets a time limitation (seconds) for proving each pair of SQL queries.
*/
List<VerificationResult> verify(List<String> sqlList0, List<String> sqlList1, String schema, long timeout)
The VerificationResult is an enum class for verification result which has four cases:
- EQ: two SQL queries are equivalent
- NEQ: two SQL queries are not equivalent
- UNKNOWN: SQLSolver cannot determine the equivalence of two SQL queries due to some reasons, such as unsupported SQL features and syntax errors
- TIMEOUT: SQLSolver can not determine the equivalence within a given time limitation
Note that verifying the equivalence of two SQL queries is an undecidable problem. Thus, the verification algorithm of SQLSolver does not guarantee to identify all equivalent queries. SQLSolver may output NEQ or UNKNOWN for some equivalent query pairs.
If you invoke the API with the parameter timeout
, SQLSolver may still run beyond the time limitation.
Beause SQLSolver invokes external libraries/binaries like Z3.
Sometimes SQLSolver has to wait for those libraries to finish.
You can also configure the time limitation for Z3 via the configuration file sqlsolver.properties
.
An example sqlsolver.properties
is under the project root directory.
You can import SQLSolver as a Jar file or directly download and compile the source code in your project.
By calling the interface of SQLSolver, you can get the SQLs verification result.
Before you use SQLSolver, you should put libz3.so
and libz3java.so
in a custom directory: <path/to/lib>
and then use the command:
export LD_LIBRARY_PATH=<path/to/lib>
Here is a simple example that shows how to use SQLSolver's interface:
import sqlsolver.api.entry.Verification;
import sqlsolver.superopt.logic.VerificationResult;
public class Main {
public static void main(String[] args) {
String sql1 = "SELECT i, j FROM a";
String sql2 = "SELECT T.COL1, T.COL2 FROM (SELECT i AS COL1, j AS COL2 FROM a) AS T";
String schema = "CREATE TABLE a ( i INT PRIMARY KEY, j INT, k INT );\n" +
"CREATE TABLE b ( x INT PRIMARY KEY, y INT, z INT );";
VerificationResult result = Verification.verify(sql1, sql2, schema);
System.out.println(result);
}
}
This program will output a single EQ
that indicates the verification result of two SQLs sql1
and sql2
under the schema schema
.
In our paper, SQLSolver is evaluated on four benchmarks, including test cases derived from Calcite, Spark SQL, TPC-C, and TPC-H. The files of SQL queries and schemas are listed in the following table.
Each file of SQL queries consists of multiple pairs of SQL queries. Each query in the odd-numbered line is equivalent to the query in the next line. For example, the first query is equivalent to the second query in each file.
Benchmark | File of Schema | File of SQL Queries |
---|---|---|
Calcite | Calcite Schema | Calcite Test Set |
Spark SQL | Spark SQL Schema | Spark SQL Test Set |
TPC-C | TPC-C Schema | TPC-C Test Set |
TPC-H | TPC-H Schema | TPC-H Test Set |
This repository includes the source code and benchmarks.
|-- api # SQLSolver's entry.
|-- lib # Required external library.
|-- common # Common utilites.
|-- sql # Data structures of SQL AST and query plan.
|-- stmt # Manipulation program of queries.
|-- superopt # Core algorithm of SQLSolver
|-- sqlsolver_data # Data input/output directory, such as benchmarks
If you use SQLSolver in your projects or research, please kindly cite our paper:
@article{sqlsolver,
author = {Ding, Haoran and Wang, Zhaoguo and Yang, Yicun and Zhang, Dexin and Xu, Zhenglin and Chen, Haibo and Piskac, Ruzica and Li, Jinyang},
title = {Proving Query Equivalence Using Linear Integer Arithmetic},
year = {2023},
issue_date = {December 2023},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
volume = {1},
number = {4},
journal = {Proc. ACM Manag. Data},
month = {Dec},
articleno = {227},
numpages = {26},
}
If you have any questions, please submit an issue or contact our email.
Students
- Haoran Ding, IPADS, Shanghai Jiao Tong University
- Zhuoran Wei, IPADS, Shanghai Jiao Tong University
- Zhenglin Xu, IPADS, Shanghai Jiao Tong University
- Yicun Yang, IPADS, Shanghai Jiao Tong University
- Dexin Zhang, Princeton University
Professors
- Zhaoguo Wang, IPADS, Shanghai Jiao Tong University
- Haibo Chen, IPADS, Shanghai Jiao Tong University
- Ruzica Piskac, ROSE, Yale University
- Jinyang Li, New York University