Introduction
This article is the second post in my three-part series exploring mathematical approaches to analysing AWS networks.
- Solving AWS Network Puzzles with Mathematics - Part 1
- Solving AWS Network Puzzles with Mathematics - Part 2 (we are here)
As discussed in the previous instalment, network troubleshooting presents inherent challenges that cannot be overstated. Network and security configurations are diverse and multifaceted in cloud computing environments like AWS. Moreover, these configurations interact in complex ways to determine whether communication paths connect or disconnect. Many readers have likely found themselves caught between pressing tasks and uncooperative networks, leading to moments of genuine frustration.
This series aims to address the inherent complexity of networks, with particular emphasis on systematic and objective approaches that transcend individual experience or skill levels. In my previous article, I introduced VPC Reachability Analyzer as an automated solution for network troubleshooting challenges. In this second part, we shall shift our focus slightly to examine the core technology that powers this functionality.
At the heart of VPC Reachability Analyzer lies a powerful mathematical tool: SMT solvers.
SAT Solvers
To understand SMT solvers, we must first examine their foundational component: SAT solvers. "SAT" comes from "satisfiability," a concept central to computational logic.
Let us consider a propositional logic formula. Such a formula might take forms like P ∨ Q or ¬(P ∧ Q) ∨ R, where P, Q, and R represent propositional variables. For those familiar with programming languages, these propositional variables can be understood as analogous to boolean values in code.
Since P, Q, and R are boolean values, they can each be assumed true or false. Satisfiability (SAT) problems ask whether assigning appropriate values to these variables is possible so that the entire formula is evaluated as true.
When learning new concepts, examining concrete examples often proves illuminating. Consider the following formula:
(P ∨ Q) ∧ (P ∨ ¬Q) ∧ (¬P ∨ ¬Q)
What boolean values should we assign to P and Q to satisfy this formula? Let us reason through this step by step. We can deduce that at least one of P or Q must be true from P ∨ Q being true. Furthermore, ¬P ∨ ¬Q being true tells us that at least one of P or Q must be false. Finally, P ∨ ¬Q being true leads us to conclude that P must be true and Q must be false.
Let us explore another example. Consider this modified formula:
(P ∨ Q) ∧ (P ∨ ¬Q) ∧ (¬P ∨ ¬Q) ∧ (¬P ∨ Q)
This formula adds one more clause, (¬P ∨ Q), to our previous example. Notably, this modification leads to an interesting property: no matter how we assign truth values, the entire formula cannot be evaluated as true. In our previous formula, the only viable possibility was setting P to true and Q to false. However, the newly added clause (¬P ∨ Q) prohibits this combination.
Thus, we have observed the fundamental difference between these two formulae. The first formula is satisfiable, while the second is not. The satisfiability problem, at its core, is precisely the challenge of determining which category a given formula falls into.
SAT solvers are capable partners that automatically resolve these satisfiability problems. While we manually determined the values for P, Q, and R in our examples, such an approach becomes untenable when dealing with a hundred variables rather than just three. Finding the correct truth assignment (which might not even exist!) would be an unreasonable task for a manual calculation. Practical SAT solvers incorporate sophisticated optimisations, enabling them to solve even large-scale problems quickly.
Does this mean that SAT solvers can handle any computational problem? Not exactly. Despite their impressive computational capabilities, they are bound by a critical limitation: SAT solvers can only process propositional logic formulae. As we saw earlier with P, Q, and R, variables that assume boolean values are called propositional variables, and formulae containing only such variables are termed propositional logic formulae. To leverage a SAT solver for real-world problems, one must first express the problems in propositional logic.
However, this approach proves highly inefficient. As readers of this article should understand, our world is not constructed only of boolean variables. We regularly work with numerics, strings, and various other data structures. Representing conditions involving numbers or strings using only propositional logic results in practically impossible or, where possible, serious complications.
Let us consider another example. Suppose we have the following formula: variables x, y, and z are numeric. This example, therefore, extends beyond the realm of propositional logic:
(x < y + 2) ∧ (x = 2z + 10) ∧ (y + z ≥ 10)
This formula is indeed satisfiable. We can verify this by assigning values such as x = 12, y = 11, and z = 1. However, rewriting this using propositional logic leads to two alarming consequences.
First, we encounter explosive growth in the number of logical clauses. Even if we restrict our variables to values between 0 and 255, expressing just the portion x < y + 2 would result in an enormously long formula:
(x = 0 ∧ y = 0) ∨ (x = 0 ∧ y = 1) ∨ … ∨ (x = 0 ∧ y = 255) ∨ … (x = 1 ∧ y = 0) ∨ (x = 1 ∧ y = 2) ∨ … ∨ (x = 1 ∧ y = 255) ∨ …
Second, and perhaps more fundamentally, SAT solvers lack any genuine understanding of numerical properties, resulting in highly inefficient reasoning processes. For instance, when encountering an expression like x < x during processing, a SAT solver cannot arrive at this conclusion while we can immediately determine its unsatisfiability. In other words, SAT solvers lack the "knowledge" of numerical relationships we naturally grasp.
How, then, can we effectively harness the powerful automation capabilities of the solvers for real-world problems such as AWS network analysis? One answer lies in the SMT solvers, which we shall explore next.
SMT Solvers
SMT solvers can be considered SAT solvers enhanced with "knowledge." In addition to the core SAT-solving component, SMT solvers incorporate theory solvers specialised for specific domains. These theory solvers handle reasoning about integers, strings, bit vectors, and other specialised domains, while the SAT solver manages purely logical operations. The selection of appropriate theory solvers represents a crucial architectural decision determining the solver's capabilities and performance characteristics.
Consider our previous formula:
(x < y + 2) ∧ (x = 2z + 10) ∧ (y + z ≥ 10)
An SMT solver equipped with integer arithmetic "knows" about inequalities, enabling it to solve such formulae directly. Similarly, an SMT solver with string theory capabilities can determine the formula is unsatisfiable:
str ++ "b" = "a" ++ str
When analysing the number of “a”s at the beginning of each side, one can readily conclude that this equation cannot hold regardless of the value assigned to str.
We shall now explore how this intelligence is integrated into SAT solving, in other words, how SAT solvers and theory solvers collaborate. While actual SMT solvers employ various sophisticated optimisations beyond the scope of the simple overview, we shall examine a simplified sketch for educational purposes. This approach will allow us to follow the flow step by step and provide a clearer understanding of the fundamental principles at work.
Let us examine the following formula:
((x > 0) ∧ (y - 2z ≥ 0)) ∧ (¬(x > 0) ∨ (y = z)) ∧ (¬(x > 0) ∨ (z ≥ x))
As discussed, SAT solvers cannot directly handle integer arithmetic or inequalities. Conversely, arithmetic theory solvers cannot handle logical operators. To resolve this dilemma, we first transform this formula, which contains arithmetic operations, into pure propositional logic.
Assume the following substitutions:
- P := x > 0
- Q := y - 2z ≥ 0
- R := y = z
- S := z >= x
With these substitutions, we can view the original formula as the following propositional formula:
(P ∨ Q) ∧ (¬P ∨ R) ∧ (¬P ∨ S)
This transformation process is known as boolean abstraction. It is crucial to note that the transformed formula contains less information than the original arithmetic-containing formula. Consequently, while the propositional formula may be satisfiable, this does not guarantee the satisfiability of the original formula. However, if the propositional formula proves unsatisfiable, we can definitively conclude that the original formula must also be unsatisfiable.
Having transformed the formula into propositional logic, we can now apply our SAT solver. The solver determines that this propositional formula is indeed satisfiable, with one possible assignment being:
P: true, Q: true, R: true, S: true
Our next step involves translating these propositional variables back to their original arithmetic expressions, resulting in the following arithmetic constraints:
- x > 0
- y - 2z ≥ 0
- y = z
- z ≥ x
Revealing boolean-abstracted conditions to their original form is called refinement. While these conditions might appear identical to our starting point, closer examination reveals a crucial difference. For instance, where the original formula allowed the condition corresponding to P to be either true or false, we have now narrowed it down to require that x > 0 be true specifically.
When we apply refinement to our constraints, they are evaluated by the theory solver. Unfortunately, this set of conditions proves unsatisfiable: no values exist for x, y, and z that could simultaneously satisfy all constraints. What does this tell us? It reveals that our assignment of "P: true, Q: true, R: true, S: true" is impossible. This insight emerges from the theory solver providing information that the SAT solver alone could not discover. This demonstrates what we mean when we say boolean abstraction reduces the info.
Having discovered that "P: true, Q: true, R: true, S: true" is impossible, we can add the negation of this condition to our propositional formula and resubmit it to the SAT solver. This process of feeding back discovered conditions is known as learning. Our new formula, with the additional term representing our learned constraint, becomes:
(P ∨ Q) ∧ (¬P ∨ R) ∧ (¬P ∨ S) ∧ ¬(P ∧ Q ∧ R ∧ S)
This time, the SAT solver discovers a new assignment:
P: false, Q: true, R: true, S: true
When we apply refinement again to convert these assignments back to arithmetic expressions, we obtain the following constraints for our theory solver. Note that since P (corresponding to x > 0) is assigned false, we now have the constraint x ≤ 0:
- x ≤ 0
- y - 2z ≥ 0
- y = z
- z ≥ x
And here comes the breakthrough: these constraints are indeed satisfiable! The SMT solver would likely report a solution such as x = 0, y = 0, and z = 0.
It's noteworthy that actual SMT solvers employ far more sophisticated mechanisms than this simplified explanation suggests. They incorporate various optimisations for enhanced performance.
For instance, we added the negation of the entire constraint set in our learning example. Still, a more efficient approach would be to identify and learn from minimal subsets of constraints that cause contradictions.
Another well-known optimisation technique is online integration, where the solver consults the theory solver for partial constraint evaluation before fully determining all propositional variable assignments. This approach differs from our example, where we waited until all assignments were complete before performing refinement.
Conclusion
In this article, we have explored the theoretical foundations underlying VPC Reachability Analyzer, which we introduced in our previous post. Specifically, we have examined the mechanisms and capabilities of SAT and SMT solvers. Through a simplified yet concrete example, we have traced the step-by-step process of how a SAT solver and a theory solver collaborate within an SMT solver, providing a detailed trace of its action flow.
In our final article of this series, we shall dive into the core engine of VPC Reachability Analyzer itself. We will explore how this powerful tool elegantly applies SMT solver technology to resolve the typical problems of AWS network troubleshooting. This last piece will combine all the concepts discussed throughout this series, demonstrating how theoretical foundations translate into practical solutions for everyday networking challenges.
Top comments (0)