70024: Added fuzzing lectures

OliverKillane · Jan 26, 2024 · 3dfd1cb · 3dfd1cb
1 parent 78ea9ca
commit 3dfd1cb
Showing 14 changed files with 162 additions and 0 deletions.
diff --git a/70024 - Software Reliability/American Fuzzy Lop (AFL).md b/70024 - Software Reliability/American Fuzzy Lop (AFL).md
@@ -0,0 +1,38 @@
+## Definition
+A [[Mutation-Based Fuzzer|mutation-based]], [[Dumb Fuzzing|dumb]], [[Grey-Box Fuzzing|grey-box]] fuzzer.
+- General purpose: it does not know the input format
+```python
+files = get_user_provided_files()  # seed inputs, used as a queue
+prev_behaviour = set()
+while keep_fuzzing():
+	current = files.pop(0)
+
+	# AFL measures branch coverage
+	behaviour = fuzz_with(current)
+	prev_behaviour.add(behaviour)
+
+	# trim the test case to the smallest size with the
+	# same behaviour as the original
+	fuzz_and_trim(behaviour, current)
+
+	# Mutate the file using a variety of traditional methods
+	new_files = mutate(current)
+
+	# If any have different behaviour, add to queue
+	for mutated in new_files:
+		if fuzz_with(mutated) not in prev_behaviour:
+			files.append(mutated)
+```
+
+The mutation strategies used are:
+
+| Strategy | Description |
+| ---- | ---- |
+| Walking bit-flips | Walk input with 1-bit stride, flipping 1, 2 or 4 consecutive bits |
+| Walking byte-flips | Walk input with 1-byte stride, flipping 8, 16 or 32 consecutive bits |
+| Increment/Decrement | Add or subtract small values from integers in the input file (by default in the range -35 to +35) |
+| Insert known integers | Overwrite integers with known interesting values, e.g. -1, 256, 1024, MAX_INT-1 |
+| Miscellaneous random tweaks | Deleting or mem-setting parts of the input file |
+| Splicing | Concatenate parts of existing inputs |
+## [Github Repository](https://github.com/google/AFL)
+## [Creator's Website](https://lcamtuf.coredump.cx/afl/)
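The walking bit-flip strategy from the AFL mutation table can be sketched roughly as follows. This is an illustrative sketch only, not AFL's actual implementation: the function name `walking_bit_flips` and the choice to number bits from the most significant bit of the first byte are assumptions made for the example.

```python
def walking_bit_flips(data: bytes, width: int = 1):
    """Yield copies of `data` with `width` consecutive bits flipped.

    The flipped window moves along the input one bit at a time
    (1-bit stride). Hypothetical helper, not part of AFL.
    """
    total_bits = len(data) * 8
    for start in range(total_bits - width + 1):
        mutated = bytearray(data)
        for bit in range(start, start + width):
            # Assumption: bit 0 is the most significant bit of byte 0.
            mutated[bit // 8] ^= 0x80 >> (bit % 8)
        yield bytes(mutated)


# One single-bit variant per bit of the input.
single = list(walking_bit_flips(b"\x00", width=1))
# Two-bit windows may straddle a byte boundary.
double = list(walking_bit_flips(b"\x00\x00", width=2))
```

Each variant would then be run against the SUT, and kept only if it produces behaviour not seen before.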
diff --git a/70024 - Software Reliability/Black-Box Fuzzing.md b/70024 - Software Reliability/Black-Box Fuzzing.md
@@ -0,0 +1,7 @@
+## Definition
+The [[SUT]] is executed in an unmodified form (no sanitizers or coverage added).
+### Advantages
+- Can be applied to closed source [[SUT]]
+- [[SUT]] can run at full speed (optimized binary), enabling a higher rate of fuzzing
+### Disadvantages
+- [[Feedback-Directed Fuzzing]] can only make use of externally-visible [[SUT]] behaviour (cannot augment the [[SUT]] with coverage)
diff --git a/70024 - Software Reliability/Dumb Fuzzing.md b/70024 - Software Reliability/Dumb Fuzzing.md
@@ -0,0 +1,5 @@
+## Definition
+[[Fuzzing]] that generates or mutates input without knowledge of the [[SUT]]'s input domain.
+## Examples
+- Generating random strings for a compiler, without knowledge of the grammar or semantics.
+- Sending random bytes to a server to fuzz a protocol
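A dumb, generation-based input generator of the kind described in the examples above might look like the sketch below. The name `random_input`, the 64-character length cap and the use of printable ASCII are all assumptions chosen for illustration; nothing here knows about the grammar of the SUT's input.

```python
import random
import string


def random_input(rng: random.Random, max_len: int = 64) -> str:
    """Build a random printable string with no knowledge of the input format."""
    length = rng.randint(0, max_len)
    return "".join(rng.choice(string.printable) for _ in range(length))


# A fixed seed makes a fuzzing run reproducible.
rng = random.Random(0)
samples = [random_input(rng) for _ in range(5)]
```

For a compiler, almost all such strings would be rejected by the lexer or parser, which is why dumb fuzzing mostly exercises early input handling.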
diff --git a/70024 - Software Reliability/Feedback-Directed Fuzzing.md b/70024 - Software Reliability/Feedback-Directed Fuzzing.md
@@ -0,0 +1,14 @@
+## Definition
+A [[Fuzzing|fuzzer]]'s input generator can observe the behaviour of the [[SUT]] to inform new inputs.
+- Currently the state of the art for fuzzing.
+
+*Interesting inputs can:*
+- Log a previously unseen message
+- Cover some previously uncovered code
+- Consume more memory than usual
+- Do some previously unseen IO (e.g. system call)
+
+| Fuzzing | To Take Advantage |
+| ---- | ---- |
+| [[Generation-Based Fuzzer]] | Attempt to generate inputs with similar properties. |
+| [[Mutation-Based Fuzzer]] | Prioritise the inputs that produced interesting behaviour as templates for the next mutations. |
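As a rough sketch of how a mutation-based fuzzer can take advantage of feedback, the loop below keeps any mutated input that reaches previously unseen code and reuses it as a template. Everything here is hypothetical: `toy_sut` stands in for an instrumented SUT that reports which branches it reached, and `mutate` is a deliberately simple single-byte mutator.

```python
import random


def toy_sut(data: bytes) -> frozenset:
    """Hypothetical instrumented SUT: returns the ids of the branches reached."""
    covered = {"entry"}
    if len(data) > 0 and data[0] == ord("F"):
        covered.add("F")
        if len(data) > 1 and data[1] == ord("U"):
            covered.add("FU")
            if len(data) > 2 and data[2] == ord("Z"):
                covered.add("FUZ")
    return frozenset(covered)


def mutate(data: bytes, rng: random.Random) -> bytes:
    """Overwrite one randomly chosen byte with a random value."""
    out = bytearray(data)
    out[rng.randrange(len(out))] = rng.randrange(256)
    return bytes(out)


def fuzz(seed: bytes, iterations: int, rng: random.Random):
    corpus = [seed]             # templates for future mutations
    seen = set(toy_sut(seed))   # all branches covered so far
    for _ in range(iterations):
        candidate = mutate(rng.choice(corpus), rng)
        coverage = toy_sut(candidate)
        if not coverage <= seen:    # reached at least one new branch
            seen |= coverage
            corpus.append(candidate)
    return corpus, seen


corpus, seen = fuzz(b"AAAA", iterations=2000, rng=random.Random(1))
```

Without the feedback step, hitting the innermost branch would need three specific bytes to be guessed at once; with it, each byte can be discovered separately.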
diff --git a/70024 - Software Reliability/Fuzzing.md b/70024 - Software Reliability/Fuzzing.md
@@ -0,0 +1,43 @@
+## Definition
+Testing a system ([[SUT]]) using inputs that are wholly or partially randomly generated.
+
+The goal is to trigger interesting behaviour that aids in finding bugs.
+- *Interesting* is application dependent; most often it means crashing the system.
+### Fuzzer Types
+- [[Generation-Based Fuzzer]]s and [[Mutation-Based Fuzzer]]s as well as combinations of both.
+- [[Dumb Fuzzing]] versus [[Smart Fuzzing]]
+- [[Feedback-Directed Fuzzing]]
+- [[Black-Box Fuzzing]] vs [[Grey-Box Fuzzing]] vs [[White-Box Fuzzing]]
+## Input Types
+| Input | For Compiler Fuzzing |
+| ---- | ---- |
+| Totally Invalid | Random strings, to check that invalid inputs are rejected. |
+| Malformed Inputs | Sequences of tokens that are structurally correct, but invalid. |
+| Inputs with high validity | Token sequences that are in the language's grammar, but are not guaranteed to be semantically valid. |
+| High Integrity | Well-formed programs free from undefined behaviour. |
+Whenever using random inputs, it is important to consider their distribution (e.g. when generating numbers, what should the distribution be to get maximal coverage?). One way to improve this is with [[Swarm Testing]].
+### Minimal Requirements
+| Component | Description |
+| ---- | ---- |
+| [[SUT]] | The system to provide input to |
+| [[Oracle]] | To determine which behaviours are *interesting* |
+### Advantages
+- Effective in finding edge-case inputs missed by human-written test suites
+- Can automatically increase coverage of a codebase
+*Note: Can be used to find exploitable defects in programs. We consider it an advantage for the developer to find them first!*
+## Early Days
+> *_We didn't call it fuzzing back in the 1950s, but it was our standard practice to test programs by inputting decks of punch cards taken from the trash. We also used decks of random number punch cards. We weren't networked in those days, so we weren't much worried about security, but our random/trash decks often turned up undesirable behavior. Every programmer I knew (and there weren't many of us back then, so I knew a great proportion of them) used the trash-deck technique._*
+> **- [Gerald Weinberg's Secrets of Writing and Consulting: Fuzz Testing and Fuzz History (secretsofconsulting.blogspot.com)](http://secretsofconsulting.blogspot.com/2017/02/fuzz-testing-and-fuzz-history.html)**
+
+### Introduction of *Fuzzing*
+In 1990 the term *fuzzing* was introduced by a paper testing UNIX utilities with random data.
+- The paper: [An empirical study of the reliability of UNIX utilities | Communications of the ACM](https://dl.acm.org/doi/10.1145/96267.96279)
+- Tested if random inputs could cause the utilities to crash.
+- This was a [[Generation-Based Fuzzer|generation-based]], [[Dumb Fuzzing|dumb]] fuzzer
+- 
+
+
+## Example
+### GLFuzz
+Generates OpenGL shader programs by mutation (transforming existing programs), but can also insert some randomly generated code fragments (generation).
+### [[American Fuzzy Lop (AFL)]]
diff --git a/70024 - Software Reliability/Generation-Based Fuzzer.md b/70024 - Software Reliability/Generation-Based Fuzzer.md
@@ -0,0 +1,2 @@
+## Definition
+A [[Fuzzing|fuzzer]] that produces inputs from nothing (purely random).
diff --git a/70024 - Software Reliability/Grammar-Based Fuzzing.md b/70024 - Software Reliability/Grammar-Based Fuzzing.md
@@ -0,0 +1,12 @@
+## Definition
+Using a grammar for a text input to generate valid random inputs.
+
+Taking some grammar:
+```text
+Expr ::= Number | Expr Expr Op
+Op   ::= '+' | '-' | '*' | '/'
+Number ::= <32-bit signed integer>
+```
+Then randomly traversing the grammar from a start symbol:
+- At each non-terminal symbol, randomly pick one of its production rules and expand it.
+- We need to consider the distribution of outputs (e.g. should integers be biased towards edge values? Should generation favour complex trees over simple/shallow ones?) (See [[Swarm Testing]])
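The random traversal of the postfix expression grammar above can be sketched as below. The 50% chance of choosing `Number` and the `max_depth` cut-off are assumptions added so that generation terminates; they are exactly the kind of distribution choices the note says need consideration. `is_valid_postfix` is a hypothetical checker included only to show that the generated inputs stay within the grammar.

```python
import random

OPS = ["+", "-", "*", "/"]
INT_MIN, INT_MAX = -2**31, 2**31 - 1  # 32-bit signed integer range


def gen_expr(rng: random.Random, depth: int = 0, max_depth: int = 4) -> list:
    """Expand Expr ::= Number | Expr Expr Op and return a list of tokens."""
    # Assumption: pick Number half the time, and always at max depth.
    if depth >= max_depth or rng.random() < 0.5:
        return [str(rng.randint(INT_MIN, INT_MAX))]
    left = gen_expr(rng, depth + 1, max_depth)
    right = gen_expr(rng, depth + 1, max_depth)
    return left + right + [rng.choice(OPS)]


def is_valid_postfix(tokens: list) -> bool:
    """Check the tokens form exactly one well-formed postfix expression."""
    stack_size = 0
    for tok in tokens:
        if tok in OPS:
            if stack_size < 2:      # an operator needs two operands
                return False
            stack_size -= 1         # pops two operands, pushes one result
        else:
            stack_size += 1
    return stack_size == 1


rng = random.Random(0)
programs = [gen_expr(rng) for _ in range(100)]
```

Joining the tokens with spaces gives text that could be fed to a postfix calculator; biasing `rng.randint` towards `INT_MIN`, `INT_MAX`, 0 and -1 would be one way to favour edge values.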
diff --git a/70024 - Software Reliability/Grey-Box Fuzzing.md b/70024 - Software Reliability/Grey-Box Fuzzing.md
@@ -0,0 +1,8 @@
+## Definition
+Instrumentation is applied to the [[SUT]] to provide information about internals (e.g. coverage, [[Compiler Sanitizers]] for behaviour).
+- Can be done at compile time, or binaries with debug & relocation information can be instrumented
+### Advantages
+- More feedback for [[Feedback-Directed Fuzzing]]
+### Disadvantages
+- Instrumentation decreases binary performance
+- Compile-time instrumentation requires the [[SUT]] source
diff --git a/70024 - Software Reliability/Mutation-Based Fuzzer.md b/70024 - Software Reliability/Mutation-Based Fuzzer.md
@@ -0,0 +1,3 @@
+## Definition
+
+A [[Fuzzing|fuzzer]] that produces inputs by modifying or combining existing inputs.
diff --git a/70024 - Software Reliability/Oracle.md b/70024 - Software Reliability/Oracle.md
@@ -0,0 +1,9 @@
+## Definition
+Used to determine which behaviours are *interesting*/flaggable when [[Fuzzing]] a system.
+
+For example:
+- Does the [[SUT]] crash?
+- Does the input cover code that was not previously covered?
+- Does a dynamic analysis (e.g. [[Compiler Sanitizers]]) report an issue?
+- Does an assertion fail (often causing a system crash)?
+- Does the [[SUT]] behave correctly and produce valid output?
diff --git a/70024 - Software Reliability/SUT.md b/70024 - Software Reliability/SUT.md
@@ -0,0 +1,2 @@
+## Definition
+Term for *System Under Test*
diff --git a/70024 - Software Reliability/Smart Fuzzing.md b/70024 - Software Reliability/Smart Fuzzing.md
@@ -0,0 +1,7 @@
+## Definition
+[[Fuzzing]] that uses information about the expected input format.
+- One type is [[Grammar-Based Fuzzing]]
+## Examples
+- Grammar-based fuzzing (syntactically correct inputs to fuzz a semantic analyser, or invalid inputs to check that a parser rejects them)
+- Protocol fuzzing
+- Model-based fuzzing
diff --git a/70024 - Software Reliability/Swarm Testing.md b/70024 - Software Reliability/Swarm Testing.md
@@ -0,0 +1,4 @@
+## Definition
+Instead of running a single [[Fuzzing]] configuration for time budget $T$, run $n$ configurations, each for $\cfrac{T}{n}$.
+Each configuration omits some features, and knowing which configurations trigger a failure can provide more details about the bug.
+## [Paper: A Targeted Fuzzing Technique Based on Neural Networks and Particle Swarm Optimization](https://link.springer.com/chapter/10.1007/978-3-030-50399-4_36)
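The budget split in the swarm testing definition can be sketched as follows. The function name `swarm_configurations`, the example feature names, and the choice to drop each feature with an independent 50% coin flip are assumptions made for illustration; the note itself only says that each configuration omits some features.

```python
import random


def swarm_configurations(features, n, total_budget, rng):
    """Return n (enabled_features, time_budget) pairs, each with budget T / n."""
    per_config_budget = total_budget / n
    configs = []
    for _ in range(n):
        # Assumption: each feature is kept with probability 0.5.
        enabled = frozenset(f for f in features if rng.random() < 0.5)
        configs.append((enabled, per_config_budget))
    return configs


# Hypothetical generator features for a compiler fuzzer.
features = ["loops", "pointers", "structs", "recursion"]
configs = swarm_configurations(features, n=4, total_budget=60, rng=random.Random(0))
```

If a failure only shows up in configurations where, say, `pointers` is enabled and `loops` is disabled, that pattern is a hint about which features the bug depends on.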
diff --git a/70024 - Software Reliability/White-Box Fuzzing.md b/70024 - Software Reliability/White-Box Fuzzing.md
@@ -0,0 +1,8 @@
+## Definition
+[[Program Analysis]] using constraint solving to generate inputs that provably cover different parts of the [[SUT]].
+- Called [[Symbolic Execution]], and is only loosely definable as [[Fuzzing]]
+### Advantages
+Constraint solving ensures a minimal set of inputs (fast to run, slow to generate) that can perfectly cover the entire [[SUT]], including inputs that are extremely unlikely to occur under random testing.
+### Disadvantages
+- Limited scalability due to solving overhead
+- Requires specialized toolchain