-
Notifications
You must be signed in to change notification settings - Fork 6
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
70024: Added cmpiler sanitizer notes
- Loading branch information
1 parent
f418ed9
commit 61d22c3
Showing
11 changed files
with
209 additions
and
35 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,6 +1,64 @@ | ||
## Definition | ||
[[TODO]] | ||
Instrument code at compile time (static instrumentation) to analyse at runtime. | ||
- Some cannot be combined (e.g. MSan and ASan) | ||
- Much faster than dynamic instrumentation of binaries (e.g. [[Valgrind]]) | ||
## Clang/GCC Sanitizers | ||
### Address Sanitizer | ||
```bash | ||
-fsanitize=address | ||
``` | ||
- Detects out of bounds accesses, use after free, and memory leaks. | ||
- Typically $\approx2\times$ slowdown. | ||
- Adds red-zones around all allocations (marked as poisoned), and after frees. If poisoned addresses are accessed, then throws an error, backtrace and printout of accessed poisoned buffer. | ||
|
||
#### Poisoning | ||
Poisoning is done with their own version of malloc on the heap, and by adding to stack frames (extra pushes/pops added at compile time). BSS has redzones added at compile time also. | ||
|
||
When memory is freed it is put in *quarantine* (a limited size FIFO), when accessed. | ||
- Limited size $\to$ large distance means access not detected. | ||
#### Shadow Memory | ||
(Also used in [[Valgrind]]), each byte is associated with some memory containing metadata. | ||
- On each memory access, check its shadow exists (create if not) | ||
Each 8 bytes (assumes 8 byte alignment) encoded with 9 states held in a single byte: | ||
|
||
| State | Meaning | | ||
| ---- | ---- | | ||
| $=0$ | All bytes addressable | | ||
| $<0$ | No bytes Addressable | | ||
| $1 \leq k \leq 7$ | $k$ bytes addressable | | ||
```rust | ||
unsafe fn to_shadow(addr: *u8) -> *u8 { | ||
SHADOWN_REGION_OFFSET + (addr / 8) | ||
} | ||
``` | ||
#### Results | ||
No false positives, some false negatives (overflow bypasses redzones, large time distance between free and use after free). | ||
### Memory Sanitizer | ||
```bash | ||
-fsanitize=memory | ||
``` | ||
Detects uninitialized memory accesses. Typically $\approx3\times$ slowdown. | ||
#### Shadow Memory | ||
Each byte is mapped to a single *validity bit*, declaring if it is initialised. | ||
#### Results | ||
Not all reads uninitialized are bad. | ||
```c | ||
int a, b; | ||
a = b; // value ignored, so no need to report | ||
a = 1; | ||
``` | ||
|
||
Hence it is restricted to: | ||
- reads in branch conditions | ||
- reads for determining memory accesses | ||
- reads passed to system calls | ||
It's reports have: | ||
- false positives when interacting with uninstrumented code | ||
- allows for annotations to declare safe objects in source, to avoid reporting | ||
|
||
## Address Sanitizer | ||
## Undefined Behaviour Sanitizer | ||
Detects a set of undefined behaviours. | ||
[UndefinedBehaviorSanitizer — Clang 19.0.0git documentation (llvm.org)](https://clang.llvm.org/docs/UndefinedBehaviorSanitizer.html) | ||
|
||
## Thread Sanitizer | ||
Detects data races and deadlocks. Typically $\approx5-15\times$ slowdown. |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1 +1,72 @@ | ||
## Aims | ||
## Summary | ||
|
||
| Aim | Why | | ||
| ---- | ---- | | ||
| Bit Level Modelling | System level languages allow this granularity of access, and define behaviours at this level. | | ||
| Performance | Real programs generate many complex constraints, need to analyse these programs quickly enough to be sueful. | | ||
[[SAT]] solvers could be used to implement this, however most applications generate constraints at a higher level than just boolean formulae (e.g. integers, integer comparison). | ||
|
||
Hence we use [[Satisfiability Modulo Theories]] and [[BitVectors]]. | ||
- Each block of memory is represented by an array of 8-bit [[BitVectors]] to allow constraints with bit-level accuracy. | ||
- Operations on [[BitVectors]] include concatenation, extraction, bitwise operations and arithmetic. | ||
- Eagre translation is used to simplify formulas at the theory level (e.g. $x - x \equiv 0$) | ||
- Arithmetic can be encoded as a formula representing a circuit (e.g. a ripple-carry adder) | ||
## Arrays | ||
|
||
```rust | ||
/// Read a value | ||
fn read(a: &Array, i: Index) -> Value; | ||
|
||
/// return a new array that is a copy of a, with v at index i | ||
fn write(a: &Array, i: Index, v: Value) -> Array; | ||
``` | ||
$$\begin{matrix*}[l r l l] | ||
\forall a, i, j. & i = j & \Rightarrow read(a,i) = read(a,j) & \text{Array Congruence} \\ | ||
\forall a, i, j, v. & i = j & \Rightarrow read(write(a, i, v), j) = v & \text{Read-Over-Write 1}\\ | ||
\forall a, i, j, v. & i \neq j & \Rightarrow read(write(a, i, v), j) = read(a, j) & \text{Read-Over-Write 2} \\ | ||
\end{matrix*}$$ | ||
### Eliminating Arrays | ||
#### Eliminate Writes | ||
$$\begin{matrix*} | ||
read(write(a, v, i), j) \Leftrightarrow ite(i = j, v, read(a, j)) | ||
\end{matrix*}$$ | ||
The write can be eliminated, instead just check if the $read$ reads from the index of the write, in which case just return the new value. | ||
#### Eliminate Reads | ||
Can replace each syntactically-unique $read$ with a fresh variable. For each array access with the same indices, we can imply the same variable is used (pairwise compare all indexes). | ||
- Expensive as it expands formulas by $\cfrac{n(n-1)}{2}$ terms (where $n$ are the syntactically different indexes) to compare all indexes used. | ||
|
||
For example: | ||
$$\begin{split} | ||
& (a[i_1] = e_1) \land (a[i_2] = e_2) \land (a[i_3] = e_3) \land (i_1 + i_2 + i_3 = 6) \\ | ||
\to \ & (v_1 = e_1) \land (v_2 = e_2) \land (v_3 = e_3) \land (i_1 + i_2 + i_3 = 6) \\ | ||
\to \ & (i_1 = i_2 \Rightarrow v_1 = v_2) \land (i_1 = i_3 \Rightarrow v_1 = v_3) \land (i_2 = i_3 \Rightarrow v_2 = v_3)\\ | ||
\end{split}$$ | ||
### Array-Based Refinement | ||
When we cannot eliminate the array, we can | ||
[[TODO]] | ||
|
||
## Performance | ||
[[Satisfiability Modulo Theories]] solving is NP-Complete and hence expensive, it is also invoked at every branch. | ||
- [[70024 - Software Reliability/Constraint Solving#Eliminating Arrays|Elimination]] of constraints. | ||
- Caching constraints and the values satisfying them. | ||
- Constraint simplification | ||
- Summarizing loops | ||
### Use Different SMT/SAT Solvers | ||
Different solvers perform differently on different formulas, so we can invoke multiple and see which is the fastest. | ||
### Eliminating Irrelevant Constraints | ||
Many conditions only rely on a small number of variables. We only need to include relevant constraints to a branch condition. | ||
### Caching Solutions | ||
Lots of similar constraint sets for branches (e.g. in loops, the same constraints used many times). | ||
### Array Transformation | ||
#### Index-Based | ||
```rust | ||
b[k] > 0 | ||
``` | ||
We can add in constraints on the indices of `b` that are in some value range. | ||
#### Value-Based | ||
```rust | ||
b[k] > x | ||
``` | ||
we can turn this into: | ||
$$ite(0 \leq k \leq \dots, ite( k = \dots, ite(\dots, ))) > x$$ | ||
Effectively converting the array access into a set of conditionals in an expression. |
30 changes: 30 additions & 0 deletions
30
70024 - Software Reliability/Memory Safety Violations Based Attacks.md
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,30 @@ | ||
## Attacks | ||
### Code Corruption | ||
An attempt to modify code in memory. | ||
- For elf binaries pages containing code are marked as readonly, and the heap and stack marked as non-executable. But this does not prevent modification of interpreted bytecodes stored on the heap. | ||
- Systems using code generation (e.g. JIT) need pages that are readable, writable and executable, so cannot use the same protections. | ||
### Control Flow Hijack | ||
Corrupt pointers used for control flow (e.g. return address, function pointers, vtables) to direct control flow. | ||
### Non-Control-Data Attacks | ||
Corrupt security related data (not code). | ||
```c | ||
void authenticate(int input) { | ||
char auth = 0; | ||
char buf[128] = {0}; | ||
// depending on layout of local vars on stack, an overflow to | ||
// buf can overwrite auth | ||
buf[input] = 1; | ||
if (auth) { | ||
set_privileged_mode(); | ||
} | ||
} | ||
``` | ||
This also includes dangling pointers left after freeing, that can be used to access data allocated in the same location later, and potentially to corrupt. | ||
### Leak Confidential Memory | ||
Out of bounds reads on buffers. | ||
[Heartbleed - Wikipedia](https://en.wikipedia.org/wiki/Heartbleed) | ||
## Defences | ||
### Stack Canaries | ||
Compiler adds canaries (a magic value) into stack frames, and verifies integrity at start and end of call to ensure no corruption has occurred. If the canary is valid, it is likely no stack corruption occurred. | ||
Common values include `\0` / null (e.g. to prevent null terminated string copy/access from overflowing). |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
5 changes: 5 additions & 0 deletions
5
70024 - Software Reliability/Satisfiability Modulo Theories.md
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,5 @@ | ||
## Definition | ||
A generalisation of [[SAT]] theory to more complex formulas including: | ||
- real numbers | ||
- integers | ||
- data structures (arrays, lists) |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -7,4 +7,4 @@ | |
## Automated | ||
### [[Fuzzing]] | ||
### [[Static Analysis]] | ||
### [[Symbolic Execution 1]] | ||
### [[Symbolic Execution]] |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file was deleted.
Oops, something went wrong.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1 +1,36 @@ | ||
[[TODO]] | ||
## Definition | ||
By running a program on symbolic input (representing any value, can be logically constrained), every path through a program is analysed by constraining symbolic values to symbols representing the sets of possible values on the paths. | ||
- Program instructions act on symbolic values. | ||
- At conditionals, fork the symbolic values and constrain by the branch condition. | ||
- On termination generate a test case by solving the constraints on the path. | ||
### Advantages | ||
| Advantage | Description | | ||
| ---- | ---- | | ||
| Automatic | Requires no input test cases, initial seeds, configuration (like [[Fuzzing]]/[[Swarm Testing]]). | | ||
| Systematic | Can reach all possible paths in the program, and reason about all possible values on these paths. | | ||
| Deep Bugs | Cam reason about all values, hence including extremely rare edge case values & memory layouts. | | ||
| [[Functional Bugs]] | Can reason about logical statements given appropriate oracles. e.g. checking for crashes, given assert statements cause crashes. | | ||
| Test Cases | Generates test cases for developers to allow them to reproduce and debug. | | ||
### Disadvantages | ||
| Disadvantage | Description | | ||
| ---- | ---- | | ||
| Complex Toolchain | Requires the source to be available, and compiled to some form the symbolic execution tool can interpret. | | ||
| Constrain Solving | Constraint solving is expensive. | | ||
## Mixed Concrete & Symbolic Execution | ||
Many values are concrete (e.g. constants in the code, or concrete inputs set by the user). | ||
- Only operations that include symbolic values need to be symbolically executed, the rest can be executed as normal. | ||
- Allows interaction with outside environment (e.g. operating system, un-instrumented libraries, etc.) | ||
- | ||
## Challenges | ||
### [[Path Explosion]] | ||
### [[70024 - Software Reliability/Constraint Solving|Constraint Solving]] | ||
### [[Environment Modelling]] | ||
|
||
## Applications | ||
- Automatically generating high coverage test suites. | ||
- High coverage regression testing (test cases for all paths, generate test suite, check new versions) | ||
- [[Differential Testing]] between clear and highly correct implementations, and optimised implementations. (can check behaviour is identical) | ||
## Examples | ||
KLEE, CREST, SPF, FuzzBall | ||
### [[KLEE]] | ||
### [[EXE]] |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,2 @@ | ||
## Definition | ||
Dynamic instrumentation of binaries to detect memory related [[Generic Bugs]]. |