-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathformalVerification.tex
66 lines (54 loc) · 5.77 KB
/
formalVerification.tex
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
The main purpose of using formal verification in NN certification is to provide evidence that the NN not only performs its intended function, but also exhibits no unintended behavior.
We evaluated three state-of-the-art (SOTA) NN verifiers: $\alpha$-$\beta$-CROWN, Marabou, and Venus2. We applied the tools to formally verify the properties of four Collins NN applications. A \textit{property} is an assertion about the mathematical characteristics of a neural network, often in the form of input and output relationships.
\noindent{\bf Verification Tools}:
$\alpha$-$\beta$-CROWN \cite{abcrown} reduces the property verification to prove or disprove whether the outputs of a NN are always non-negative, given the input bounds. The NN is created by adding a linear output layer to the original NN, which models the property. It uses abstract interpretation techniques to efficiently propagate the linear bound of neuron values through a NN. Its branch-and-bound approach enables parallelization and it utilizes GPUs to accelerate its performance. It has been ranked as the top NN verifier in recent years of VNN-COMP. It supports many forms of activation functions (e.g., \textit{ReLU}, \textit{tanh}, \textit{sigmoid}).
Marabou \cite{marabou} \cite{marabou2} is based on satisfiability modulo theories (SMT). It transforms a NN property verification problem to a satisfiability (SAT) instance, which consists of a set of linear and non-linear constraints (due to the activation functions) on the neuron values. The property is falsified if and only if there exists an assignment of neuron values that satisfies all the constraints. Marabou accepts properties that are encoded as general linear constraints on both inputs and outputs. We used Marabou2 (dated March 2024) for our evaluation.
Venus \cite{venus} formulates a NN verification problem as a mixed-integer-linear-program (MILP). The property is falsified if and only if the corresponding program is feasible. The MILP can be transformed to a linear program that can be solved in polynomial-time. Venus2 leverages the efficient, commercial-off-the-shelf (COTS) linear program solver, Gurobi. It verifies NNs that only use piecewise linear activation functions. We used Venus2 (dated March 2024) for our evaluation.
\noindent{\bf Verification Process}: Our verification process often consisted of three main steps. First, we formulated a number of formal properties, which were derived from performance or safety requirements. Second, we manipulated the NN structure so that we can express the properties in a format that can be verified by the NN verifiers. For example, $\alpha$-$\beta$-CROWN does not support general linear constraints on NN inputs. We added an extra output layer to connect the NN inputs and outputs in parallel, using ONNX \emph{Concat} operator. Third, we tuned the tool executable parameters. Each tool has its own parameters, which could have a major impact on the verification performance for a specific instance. Tuning the parameters is not a straightforward process; it often requires knowledge of the underlying verification algorithm.
\begin{table} [htb]
\centering
\caption{Verification Results} \label{tab:results}
\begin{tabular}{|l | c| c| c|}
\hline
Property & ABCROWN & Marabou2 & Venus2 \\
\hline
ftoMaxfl & UNSAT (7.3) & UNSAT (493) & UNSAT (69.2) \\
\hline
ftoMaxflsat & SAT (3.0) & SAT (20) & \emph{Abort} \\
\hline
ftoMinfl & UNSAT (10.5) & UNSAT (1808) & UNSAT (30.3) \\
\hline
ftoMinflsat & SAT (3.1) & SAT (1448) & \emph{UNSAT (28.5)} \\
\hline
ftoaboveStr & UNSAT (12.8) &\emph{SAT (795)} & UNSAT (39.5)\\
\hline
ftoDown & UNSAT (6.5) & UNSAT (67) & UNSAT (25.6) \\
\hline
ftoStr & UNSAT (6.4) & UNSAT (0) & UNSAT (17.9) \\
\hline
ftoUp & UNSAT (6.8) & UNSAT (357) & UNSAT (52.4) \\
\hline
ftobelowStr & UNSAT (24.2) & \emph{SAT (223)} &\emph{Abort} \\
\hline
rclOut & UNSAT (6.5) & UNSAT (0) & UNSAT (24.6) \\
\hline
rclMaxFuel & SAT (1.8) & SAT (0) & SAT (1.0) \\
\hline
ropOrder & UNSAT (7.0) & \emph{Timeout} & - \\
\hline
ropRange & SAT (2.4) & \emph{Timeout} & - \\
\hline
ropMono & SAT (0.5) & - & - \\
\hline
fqmAcc1 & UNSAT (4) & - & - \\
\hline
fqmAcc2 & UNSAT (12) & - & - \\
\hline
\end{tabular}
\end{table}
\noindent{\bf Verification Results}: The verification results are summarized in Table~\ref{tab:results}. The first three letters of the property name indicate the application (FTO, RCL, ROP, or FQM).
A result of UNSAT means the property is proved. And a result of SAT means a counterexample was found. The runtime is in seconds. The timeout is set to 2000 seconds.
The verification was executed on a Linux server with 128 AMD EPYC 7601 32-Core Processors and 503GB memory. We ran Marabou2 with Split and Conquer (SNC) mode and four parallel threads.
Since Venus2 does not support the \emph{tanh} acitvation function and Marabou2 does not support the ONNX operator \emph{concat}, they could not verify some instances, which are indicated by the dashes in the table.
Overall, $\alpha$-$\beta$-CROWN was the most efficient and produced the correct results on all instances. Marabou2 generated incorrect results for two instances. And Venus2 generated incorrect results for one instance and aborted for two instances. Our preliminary investigation in the incorrect or aborting cases indicates tool implementation issues.
Note that the VNNLIB parsers used by the tools do not support general linear constraints (i.e., $AX\le B$). They support a limited subset of constraints defined by the VNNCOMP benchmarks (e.g., variable compared to constant, comparing two variables). We modified the VNNLIB parser of $\alpha$-$\beta$-CROWN to verify properties that are encoded as general linear constraints.