Normal Forms and the DPLL Algorithm — Math Logic & Algorithms

Motivation: Automated Satisfiability Checking

The SAT (SATisfiability) problem—whether a propositional formula is satisfiable—is NP-complete, but in practice is solved in seconds for millions of variables by modern SAT solvers. They operate on formulas in conjunctive normal form (CNF) and use the DPLL algorithm and its extensions. SAT is the "workhorse" of formal verification, planning, and synthesis.

Normal Forms

Literal: A variable $p$ or its negation $\neg p$.

Disjunct: A disjunction of literals: $(l_1 \lor l_2 \lor \ldots \lor l_k)$. A unit disjunct: a disjunct consisting of a single literal.

CNF (Conjunctive Normal Form): A conjunction of disjuncts: $D_1 \wedge D_2 \wedge \ldots \wedge D_m$. Any propositional formula can be converted to CNF (equisatisfiable, if using Tseitin transformation).

Algorithm to convert to CNF:

Eliminate $\to$ and $\leftrightarrow$: $p \to q \equiv \neg p \vee q$; $p \leftrightarrow q \equiv (\neg p \vee q) \wedge (p \vee \neg q)$
Push $\neg$ inward (de Morgan's laws, double negation)
Apply distributivity of $\vee$ over $\wedge$

DNF (Disjunctive Normal Form): Disjunction of conjuncts. Satisfiable $\iff$ at least one conjunct does not contain both $p$ and $\neg p$.

The SAT Problem and the DPLL Algorithm

DPLL (Davis–Putnam–Logemann–Loveland, 1960/62): A recursive algorithm for searching for a satisfying assignment:

Unit Propagation: If a disjunct contains exactly one unassigned literal $l$ $\to$ set $l = 1$; remove all disjuncts containing $l$; remove $\neg l$ from the remaining disjuncts. Repeat until a fixpoint.
Pure Literal Elimination: If literal $l$ occurs only positively (i.e., $\neg l$ does not occur) $\to$ set $l = 1$; remove all disjuncts containing $l$.
Variable Selection: Choose an unassigned $p$; recursively try $p = 1$, then $p = 0$.
Backtrack: Upon encountering an empty disjunct—contradiction, backtrack.

Modern CDCL solvers: In the case of a conflict, add a "learned disjunct" (clause learning) that excludes this particular conflict. They solve problems with millions of variables (cryptographic circuits, processor verification).

Numerical Example

Problem: Apply DPLL to the formula: $(p \vee q) \wedge (\neg p \vee r) \wedge (\neg q \vee \neg r) \wedge (p \vee \neg r)$.

Step 1. Unit Propagation. No unit disjuncts. No pure literals ($p$ and $\neg p$ both occur; $q$ and $\neg q$; $r$ and $\neg r$).

Step 2. Choice: $p = 1$.

$(p \vee q)$: remove (contains $p = 1$) ✓
$(\neg p \vee r)$: remove $\neg p$ from this disjunct $\to$ $(r)$
$(\neg q \vee \neg r)$: remains
$(p \vee \neg r)$: remove ✓ Remainder: $(r) \wedge (\neg q \vee \neg r)$.

Step 3. Unit Propagation: $(r) \to r = 1$.

$(\neg q \vee \neg r)$: remove $\neg r$ $\to$ $(\neg q)$ Remainder: $(\neg q)$.

Step 4. Unit Propagation: $(\neg q) \to q = 0$. All disjuncts are satisfied.

Solution: $p = 1$, $q = 0$, $r = 1$. Formula is satisfiable ✓.

Verification: $(1 \vee 0) \wedge (0 \vee 1) \wedge (1 \vee 0) \wedge (1 \vee 0) = 1 \wedge 1 \wedge 1 \wedge 1 = 1$ ✓.

Second option (if DPLL hit a dead end): if $r = 0 \to (\neg q \vee 0) = (\neg q) \to q = 0 \to (0 \vee 0) = 0$—contradiction! Backtrack.

Real-World Application

Chip development (Intel, AMD): before releasing a processor, hundreds of thousands of properties are checked by SAT solvers (hardware verification). Mission planning (NASA): a planning task is encoded in SAT in polynomial time, and the solver finds a sequence of actions.

Additional Aspects

The DPLL algorithm (Davis–Putnam–Logemann–Loveland) is the foundation of all modern SAT solvers. Its CDCL improvements add clause learning: at every conflict, a new clause is constructed, excluding repetition of the same dead end. This gives exponential improvement on structured problems. Tseitin CNF normalization introduces auxiliary variables for subformulas, thereby preserving linear conversion size (naive distributive normalization can lead to exponential blow-up). DNF, on the contrary, is convenient for automata and truth tables. Notably, converting a CNF formula to DNF is an exponentially hard task.

Connection with Other Branches of Mathematics

Logical normal forms are unexpectedly close to classical branches of algebra. Already in the works of Hilbert and Ackermann, logic is treated as part of universal algebra: boolean formulas are elements of free boolean algebra, and CNF corresponds to representing these elements as intersections (conjunctions) of ideals specified by disjuncts. Stone's theorem on the representation of boolean algebras via topological spaces of ultrafilters actually interprets the set of models of a CNF formula as an open set in the Stone space, thereby linking SAT with topology.

In probability theory, CNF appears as logical event schemes: satisfiability probability is replaced by the estimate of the probability that at least one disjunct in each conjunction is true. Approaches like the probabilistic methods of Lovász and Erdős enable estimating the sizes of random CNFs that are almost surely unsatisfiable, a topic that intersects with phase transitions in random SAT models and with methods from statistical physics (spin glass models of Mézard and Paranjape).

Numerical methods and differential equations are connected to SAT via hybrid optimization methods. Since the early 2000s, differential approximations to discrete problems have been developed: trajectories of dynamical systems approximate search processes; analysis of the stability of stationary points is grounded in Lyapunov’s ideas. In SMT solver variations (Satisfiability Modulo Theories), logic constraints in CNF are combined with linear algebra and systems of differential equations, so satisfiability checking becomes a mixed discrete-continuous optimization problem.

From the perspective of computational complexity, the DPLL algorithm implements an exponential search tree, closely related to the Cook–Levin theorem on the NP-completeness of SAT. Resolution methods (Rabin, Robinson) provide a logical analogue of Gaussian elimination, and the lower bounds on resolution proof lengths from Håstad–Razborov, Benas and others are interpreted as ultimate performance limits for all DPLL-like solvers.

Historical Note and Development of the Idea

Representations of logical formulas in standard normal forms already appeared in the works of Boole and Jevons in the second half of the 19th century; however, systematic theory was developed at the start of the 20th century by the Hilbert school. In the book "Grundzüge der theoretischen Logik" (1928), Hilbert and Ackermann formalized the transformation of formulas to equivalent CNF and DNF. The satisfiability question was first raised in the context of the general theory of solutions to logical equations (Church, Tarski). In the 1930s, Moses Schönfinkel and subsequent logicians viewed normal forms as a tool for automating deduction. A turning point was the 1960 article by Martin Davis and Hilbert Putnam in the Journal of the ACM, where they proposed an algorithm based on resolution and systematic variable elimination. Two years later, Davis, Logemann, and Loveland modified the approach by introducing a recursive branching scheme by literals with formula simplification along the search; this particular version became known as DPLL. Motivating problems were theorem proving automation and program correctness checking in early formal verification systems. In 1971 Cook, and independently Levin, showed the NP-completeness of SAT, making it a central problem in complexity theory.