NeuralSAT is a high-performance verification tool for deep neural networks (DNNs). It integrates the DPLL(T) approach commonly used in SMT solving with a theory solver specialized for DNN reasoning. NeuralSAT exploits multicores and GPU for efficiency and can scale to networks with millions of parameters. It supports a wide range of neural networks and activation functions.
- NeuralSAT is ranked 2nd overall at VNN-COMP'24 (our 2nd participation)
- Originally, VNN-COMP’s script had a bug that incorrectly parsed NeuralSAT’s (and another tool’s) results, and ranked it last. After fixing the issue, NeuralSAT’s results were correctly parsed and NeuralSAT was officially placed 2nd, behind ABCrown and above PyRAT. The updated VNN-COMP’24 report, available here, acknowledges the issue, presents the corrected rankings (see Table B.1), and includes detailed results and graphs in Appendix B.
- First paper on NeuralSAT will be at FSE'24!
- NeuralSAT is given the "New Participation Award" at VNN-COMP'23
- NeuralSAT is ranked 4th in VNN-COMP'23. This was our first participation and we look forward to next time.
- Note: The current version of NeuralSAT adds significant improvements and fixed the implementation bugs we had during VNN-COMP'23 that produce unsound results (hence 4th place ranking).
- see INSTALL.md
-
fully automatic, ease of use and requires no tuning (i.e., no expert knowledge required)
- NeuralSAT requires no parameter tuning (a huge engineering effort that researchers often don't pay attention to)! In fact, you can just apply NeuralSAT as is to check your networks and desired properties. The user does not have to do any configuration or tweaking. It just works!
- But if you're an expert (or want to break the tool), you are welcome to tweak its internal settings.
- This is what makes NeuralSAT different from other DNN verifiers (e.g., AB-Crown), which require lots of tuning to work well.
- NeuralSAT requires no parameter tuning (a huge engineering effort that researchers often don't pay attention to)! In fact, you can just apply NeuralSAT as is to check your networks and desired properties. The user does not have to do any configuration or tweaking. It just works!
-
standard input and output formats
- input:
onnx
for neural networks andvnnlib
for specifications - output:
unsat
for proved property,sat
for disproved property (accompanied with a counterexample), andunknown
ortimeout
for property that cannot be proved.
- input:
-
versatile: support multiple types of neural types of networks and activation functions
- layers (can be mixture of different types):
fully connected
(fc),convolutional
(cnn),residual networks
(resnet),batch normalization
(bn) - activation functions:
ReLU
,sigmoid
,tanh
,power
- layers (can be mixture of different types):
-
well-tested
- NeuralSAT has been tested on a wide-range of benchmarks (e.g., ACAS XU, MNIST, CIFAR).
-
fast and among the most scalable verification tools currently
- NeuralSAT exploits and uses multhreads (i.e., multicore processing/CPUS) and GPUs available on your system to improve its performance.
-
active development and frequent updates
- If NeuralSAT does not support your problem, feel free to contact us (e.g., by openning a new Github issue). We will do our best to help.
- We will release new, stable versions about 3-4 times a year
details
- sound and complete algorithm: will give both correct
unsat
andsat
results - combine ideas from conflict-clause learning (CDCL), abstractions (e.g., polytopes), LP solving
- employ multiple adversarial attack techniques for fast counterexamples (i.e.,
sat
) discovery
NeuralSAT takes as input the formula N
(with non-linear ReLU activation) and the formulae UNSAT
if the formula is unsatisfiable, indicating N
satisfies SAT
if the formula is satisfiable, indicating the N
does not satisfy
NeuralSAT uses a DPLL(T)-based algorithm to check unsatisfiability.
It applies DPLL/CDCL to assign values to boolean variables and checks for conflicts the assignment has with the real-valued constraints of the DNN and the property of interest.
If conflicts arise, NeuralSAT determines the assignment decisions causing the conflicts and learns clauses to avoid those decisions in the future.
NeuralSAT repeats these decisions and checking steps until it finds a full assignment for all boolean variables, in which it returns SAT
, or until it no longer can decide, in which it returns UNSAT
.
NeuralSAT constructs a propositional formula representing neuron activation status (Boolean Abstraction
) and searches for satisfying truth assignments while employing a DNN-specific theory solver to check feasibility with respect to DNN constraints and properties.
The process integrates standard DPLL components, which include deciding (Decide
) variable assignments, and performing Boolean constraint propagation (BCP
), with DNN-specific theory solving (Deduce
), which uses LP solving and the polytope abstraction to check the satisfiability of assignments with the property of interest.
If satisfiability is confirmed, it continues with new assignments; otherwise, it analyzes and learns conflict clauses (Analyze Conflict
) to backtrack.
NeuralSAT continues it search until it either proves the property (UNSAT
) or finds a total assignment (SAT
).
Boolean Abstraction
encodes the DNN verification problem into a Boolean constraint to be solved.
This step creates Boolean variables to represent the activation status of hidden neurons in the DNN.
NeuralSAT also forms a set of initial clauses ensuring that each status variable is either T
(active) or F
(inactive).
NeuralSAT iteratively searches for an assignment satisfying the clauses.
Throughout it maintains several state variables including: clauses, a set of clauses consisting of
the initial activation clauses and learned conflict clauses;
Decide
chooses an unassigned variable and assigns it a random truth value.
Assignments from Decide
are essentially guesses that can be wrong which degrades performance.
The purpose of BCP
, Deduce
, and Stabilize
– which are discussed below – is to eliminate unassigned
variables so that Decide has fewer choices.
BCP
detects unit clauses from constraints representing the current assignment and clauses and infers values for variables in these clauses.
For example, after the decision BCP
determines that the clause
AnalyzeConflict
processes an implication graph with a conflict to learn a new clause that explains the conflict.
The algorithm traverses the implication graph backward, starting from the conflicting node, while constructing a new clause through a series of resolution steps.
AnalyzeConflict
aims to obtain an asserting clause, which is a clause that will result a BCP
implication.
These are added to clauses so that they can block further searches from encountering an instance of the conflict.
T-solver
consists of two parts: Stabilize
and Deduce
:
-
Deduce
checks the feasibility of the DNN constraints represented by the current propositional variable assignment. This component is shared with NeuralSAT and it leverages specific information from the DNN problem, including input and output properties, for efficient feasibility checking. Specifically, it obtains neuron bounds using the polytope abstraction and performs infeasibility checking to detect conflicts. -
Stabilize
has a similar effect asBCP
– reducing mistaken assignments byDecide
– but it operates at the theory level not the propositional Boolean level. The key idea in using neuron stability is that if we can determine that a neuron is stable, we can assign the exact truth value for the corresponding Boolean variable instead of having to guess. Stabilization involves the solution of a mixed integer linear program (MILP) system. First, a MILP problem is created from the current assignment, the DNN, and the property of interest. Next, it collects a list of all unassigned variables which are candidates being stabilized. In general, there are too many unassigned neurons, soStabilize
restricts consideration tok
candidates. Because each neuron has a different impact on abstraction precision we prioritize the candidates. InStabilize
, neurons are prioritized based on their interval boundaries with a preference for neurons with either lower or upper bounds that are closer to zero. The intuition is that neurons with bounds close to zero are more likely to become stable after tightening.
As with any stochastic algorithm, NeuralSAT would perform poorly if it gets into a subspace of the search that does not quickly lead to a solution, e.g., due to choosing a bad sequence of neurons to split. This problem, which has been recognized in early SAT solving, motivates the introduction of restarting the search to avoid being stuck in such a local optima. NeuralSAT uses a simple restart heuristic that triggers a restart when either the number of processed assignments (nodes) exceeds a pre-defined number or the number of remaining assignments that need be checked exceeds a pre-defined threshold.
To gain insights into the performance improvements of NeuralSAT we require benchmarks that force the algorithm to search a non-trivial portion of the space of activation patterns. It is well-known that SAT problems can be very easy to solve regardless of their size or whether they are satisfiable or unsatisfiable. The same is true for DNN verification problems. The organizers of the first three DNN verifier competitions remark on the need for benchmarks that are "not so easy that every tool can solve all of them" in order to assess verifier performance.
To achieve this we leverage a systematic DNN verification problem generator GDVB.
GDVB takes a seed neural network as input and systematically varies a number of architectural parameters, e.g., number of layers, and neurons per layer, to produce a set of DNNs.
In this experiment, we begin with a single MNIST network with 3
layers, each with 1024
neurons and generate 38
different DNNs that cover combinations of parameter variations.
We leverage the fact that local robustness properties are a pseudo-canonical form for pre-post condition specifications and use GDVB to generate 16 properties with varying radii and center points.
Next we run two state-of-the-art verifiers: 38 * 16 = 608
combinations of DNN and property with a small timeout of 200
seconds.
Any problem that could be solved within that timeout was removed from the benchmark as "too easy".
This resulted in 90
verification problems that not only are more computationally challenging than benchmarks used in other studies, but also exhibit significant architectural diversity.
We use this MNIST_GDVB benchmark to study the variation in performance on challenging problems.
Here we focus primarily on the benefits and interactions among the optimizations in NeuralSAT compared to the baseline N
which is NeuralSAT without any optimization.
The plot shows the problems solved within the 900-second timeout for each technique sorted by runtime from fastest to slowest;
problems that timeout are not shown on the plot.
We omit the use of restart R
on its own, since it is intended to function in concert with parallelization.
Both stabilization S
and parallelization P
improve the number of problems solved and reduce cost relative to the baseline, but parallelism P
yields greater improvements.
When parallelism P
is combined with restart R
we see that the number of problems solved increases, but the average time increases slightly.
The plot shows the trend in verification solve times for each optimization combination across the benchmarks.
One can observe that adding more optimizations improves performance both by the fact that the plots are lower and extend further to the right.
For example, extending P
to P+S
shows lower solve times for the first 17
problems – the one's P
could solve – and that 38
of the 51
benchmark problems are solved.
Extending P+S
to the full set of optimizations exhibits what appears to be a degradation in performance for the first 23
problems solved and this is likely due to the fact that, as explained above, restart forces some re-exploration of the search.
However, the benefit of restart shows in the ability to significantly reduce verification time for 25
of the 48
problems solved by P+S+R
.
- Hai Duong (GMU, main developer)
- Linhan Li (GMU)
- Dong Xu (UVA)
- ThanhVu Nguyen (GMU)
- Matthew Dwyer (UVA)
- Hai Duong, ThanhVu Nguyen, Matthew Dwyer. A DPLL(T) Framework for Verifying Deep Neural Networks, Arxiv, 2024.
@misc{duong2024dpllt,
title={A DPLL(T) Framework for Verifying Deep Neural Networks},
author={Hai Duong and ThanhVu Nguyen and Matthew Dwyer},
year={2024},
eprint={2307.10266},
archivePrefix={arXiv},
primaryClass={cs.LG}
}
- Hai Duong, Dong Xu, ThanhVu Nguyen, Matthew Dwyer. Harnessing Neuron Stability to Improve DNN Verification, Foundations of Software Engineering (FSE), 18 pages (to appear), 2024.
@misc{duong2024harnessing,
title={Harnessing Neuron Stability to Improve DNN Verification},
author={Hai Duong and Dong Xu and ThanhVu Nguyen and Matthew B. Dwyer},
year={2024},
eprint={2401.14412},
archivePrefix={arXiv},
primaryClass={cs.LG}
}
The NeuralSAT research is partially supported by grants from NSF (1900676, 2019239, 2129824, 2200621, 2217071, 2238133, 2319131) and an Amazon Research Award.