Commit

70024: Added fuzzing lectures
OliverKillane committed Jan 26, 2024
1 parent 78ea9ca commit 3dfd1cb
Showing 14 changed files with 162 additions and 0 deletions.
38 changes: 38 additions & 0 deletions 70024 - Software Reliability/American Fuzzy Lop (AFL).md
@@ -0,0 +1,38 @@
## Definition
A [[Mutation-Based Fuzzer|mutation-based]], [[Dumb Fuzzing|dumb]], [[Grey-Box Fuzzing|grey-box]] fuzzer.
- General purpose: it has no knowledge of the input format. Its main loop, in pseudocode:
```python
# Pseudocode: the helper functions are placeholders, not AFL's real API.
files = get_user_provided_files()  # queue of seed inputs
prev_behaviour = set()             # behaviours (hashable coverage summaries) seen so far
while keep_fuzzing():
    current = files.pop()

    # AFL measures branch coverage
    behaviour = fuzz_with(current)
    prev_behaviour.add(behaviour)

    # Trim the test case to the smallest size with the
    # same behaviour as the original
    current = fuzz_and_trim(behaviour, current)

    # Mutate the file using a variety of traditional methods
    new_files = mutate(current)

    # If any have different behaviour, add to queue
    for mutated in new_files:
        if fuzz_with(mutated) not in prev_behaviour:
            files.append(mutated)
```

The mutation strategies used are:

| Strategy | Description |
| ---- | ---- |
| Walking bit-flips | Walk the input with a 1-bit stride, flipping 1, 2 or 4 consecutive bits |
| Walking byte-flips | Walk the input with a 1-byte stride, flipping 8, 16 or 32 consecutive bits |
| Increment/Decrement | Increment or decrement integer values in the input file (by default in the range -35 to +35) |
| Insert known integers | Overwrite values with known interesting integers, e.g. -1, 256, 1024, MAX_INT-1 |
| Miscellaneous random tweaks | Deleting or mem-setting parts of the input file |
| Splicing | Combine parts of two existing inputs |
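
As a rough illustration of the first and third strategies, the sketch below walks a byte string and yields mutants. It is not AFL's implementation: the function names, the `width` and `max_delta` parameters, the bit ordering (most significant bit first) and the restriction to single-byte arithmetic are all assumptions made for this example.

```python
def walking_bit_flips(data: bytes, width: int = 1):
    """Yield copies of `data` with `width` consecutive bits flipped, moving one bit at a time."""
    total_bits = len(data) * 8
    for start in range(total_bits - width + 1):
        mutated = bytearray(data)
        for bit in range(start, start + width):
            # Assumption: bit 0 is the most significant bit of byte 0.
            mutated[bit // 8] ^= 0x80 >> (bit % 8)
        yield bytes(mutated)


def byte_arithmetic(data: bytes, max_delta: int = 35):
    """Yield copies of `data` with one byte incremented or decremented (wrapping mod 256)."""
    for index in range(len(data)):
        for delta in range(-max_delta, max_delta + 1):
            if delta == 0:
                continue  # a zero delta would reproduce the original input
            mutated = bytearray(data)
            mutated[index] = (mutated[index] + delta) % 256
            yield bytes(mutated)
```

Both generators are deterministic, so a fuzzer could run them exhaustively on each queue entry before moving on to random tweaks.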
## [Github Repository](https://github.com/google/AFL)
## [Creator's Website](https://lcamtuf.coredump.cx/afl/)
7 changes: 7 additions & 0 deletions 70024 - Software Reliability/Black-Box Fuzzing.md
@@ -0,0 +1,7 @@
## Definition
The [[SUT]] is executed in an unmodified form (no sanitizers or coverage added).
### Advantages
- Can be applied to closed source [[SUT]]
- [[SUT]] can run at full speed (optimized binary), enabling a higher rate of fuzzing
### Disadvantages
- [[Feedback-Directed Fuzzing]] can only make use of externally visible [[SUT]] behaviour (the [[SUT]] cannot be augmented with coverage instrumentation)
5 changes: 5 additions & 0 deletions 70024 - Software Reliability/Dumb Fuzzing.md
@@ -0,0 +1,5 @@
## Definition
[[Fuzzing]] that generates or mutates input without knowledge of the [[SUT]]'s input domain.
## Examples
- Generating random strings for a compiler, without knowledge of the grammar or semantics.
- Sending random bytes to a server to fuzz a protocol
14 changes: 14 additions & 0 deletions 70024 - Software Reliability/Feedback-Directed Fuzzing.md
@@ -0,0 +1,14 @@
## Definition
A [[Fuzzing|fuzzer]]'s input generator can observe the behaviour of the [[SUT]] to inform new inputs.
- Currently the state of the art for fuzzing.

*Interesting inputs can*
- Log a previously unseen message
- Cover some previously uncovered code
- Consume more memory than usual
- Do some previously unseen IO (e.g. system call)

| Fuzzing | To Take Advantage |
| ---- | ---- |
| [[Generation-Based Fuzzer]] | Attempt to generate inputs with similar properties. |
| [[Mutation-Based Fuzzer]] | Prioritise the inputs that triggered interesting behaviour as templates for future mutations. |
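
The mutation-based row can be sketched as a small loop. This is a toy under stated assumptions: `toy_sut` is a hypothetical instrumented SUT that reports which branches it executed, and `mutate` and `feedback_directed_fuzz` are illustrative names rather than part of any real fuzzer.

```python
import random


def toy_sut(data: bytes) -> frozenset:
    """Hypothetical instrumented SUT: returns the labels of the branches it executed."""
    covered = {"entry"}
    if len(data) > 0 and data[0] == ord("F"):
        covered.add("first_is_F")
        if len(data) > 1 and data[1] == ord("U"):
            covered.add("second_is_U")
    return frozenset(covered)


def mutate(data: bytes, rng: random.Random) -> bytes:
    """Replace one randomly chosen byte with a random value."""
    if not data:
        return bytes([rng.randrange(256)])
    mutated = bytearray(data)
    mutated[rng.randrange(len(mutated))] = rng.randrange(256)
    return bytes(mutated)


def feedback_directed_fuzz(seed_input: bytes, iterations: int, rng: random.Random):
    """Keep only the mutants that cover a previously unseen branch."""
    corpus = [seed_input]
    seen = set(toy_sut(seed_input))
    for _ in range(iterations):
        candidate = mutate(rng.choice(corpus), rng)
        coverage = toy_sut(candidate)
        if not coverage <= seen:  # at least one new branch was covered
            seen |= coverage
            corpus.append(candidate)  # becomes a template for later mutations
    return corpus, seen
```

Because an input starting with `F` is kept once found, reaching the nested `second_is_U` branch needs only one more lucky byte, rather than two lucky bytes at once as it would without feedback.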
43 changes: 43 additions & 0 deletions 70024 - Software Reliability/Fuzzing.md
@@ -0,0 +1,43 @@
## Definition
Testing a system ([[SUT]]) using inputs that are wholly or partially randomly generated.

The goal is to trigger interesting behaviour to aid in finding bugs.
- *interesting* is application dependent, most often it means crashing the system.
### Fuzzer Types
- [[Generation-Based Fuzzer]]s and [[Mutation-Based Fuzzer]]s as well as combinations of both.
- [[Dumb Fuzzing]] versus [[Smart Fuzzing]]
- [[Feedback-Directed Fuzzing]]
- [[Black-Box Fuzzing]] vs [[Grey-Box Fuzzing]] vs [[White-Box Fuzzing]]
## Input Types
| Input | For Compiler Fuzzing |
| ---- | ---- |
| Totally Invalid | Random invalid strings, to check that invalid inputs are rejected. |
| Malformed Inputs | Token sequences that are well formed individually, but do not form a valid program. |
| Inputs with high validity | Token sequences that conform to the language's grammar, but are not guaranteed to be semantically valid. |
| High Integrity | Well-formed programs free from undefined behaviour. |

Whenever random inputs are used, it is important to consider their distribution (e.g. when generating numbers, what should the distribution be to get maximal coverage?). One way to improve this is with [[Swarm Testing]].
### Minimal Requirements
| Component | Description |
| ---- | ---- |
| [[SUT]] | The system to provide input to |
| [[Oracle]] | To determine which behaviours are *interesting* |
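
A minimal sketch of how these two components fit together, assuming a hypothetical `sut` with a deliberate bug on empty input and an oracle that treats any uncaught exception as a crash. All names here are illustrative.

```python
import random


def sut(data: bytes) -> int:
    """Hypothetical SUT: reads a length-prefixed message and returns a payload checksum."""
    declared_length = data[0]  # bug: raises IndexError on empty input
    payload = data[1:1 + declared_length]
    return sum(payload) % 256


def oracle(run) -> bool:
    """Flag a run as interesting if the SUT raised an exception (a 'crash')."""
    try:
        run()
    except Exception:
        return True
    return False


def random_input(rng: random.Random, max_length: int = 4) -> bytes:
    """Dumb generation: a random-length string of random bytes."""
    return bytes(rng.randrange(256) for _ in range(rng.randrange(max_length + 1)))


def fuzz(iterations: int, rng: random.Random) -> list:
    """Feed random inputs to the SUT and collect those the oracle flags."""
    interesting = []
    for _ in range(iterations):
        data = random_input(rng)
        if oracle(lambda: sut(data)):
            interesting.append(data)
    return interesting
```

Calling `fuzz(200, random.Random(1))` returns the inputs that crashed the toy SUT; in this example only the empty input can do so.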
### Advantages
- Effective in finding edge-case inputs missed by human-written test suites
- Can automatically increase coverage of a codebase
*Note: Can be used to find exploitable defects in programs. We consider it an advantage for the developer to find them first!*
## Early Days
> *_We didn't call it fuzzing back in the 1950s, but it was our standard practice to test programs by inputting decks of punch cards taken from the trash. We also used decks of random number punch cards. We weren't networked in those days, so we weren't much worried about security, but our random/trash decks often turned up undesirable behavior. Every programmer I knew (and there weren't many of us back then, so I knew a great proportion of them) used the trash-deck technique._*
> **- [Gerald Weinberg's Secrets of Writing and Consulting: Fuzz Testing and Fuzz History (secretsofconsulting.blogspot.com)](http://secretsofconsulting.blogspot.com/2017/02/fuzz-testing-and-fuzz-history.html)**
### Introduction of *Fuzzing*
In 1990 the term *fuzzing* was introduced by a paper that tested UNIX utilities with random data.
- The paper: [An empirical study of the reliability of UNIX utilities | Communications of the ACM](https://dl.acm.org/doi/10.1145/96267.96279)
- Tested whether random inputs could cause the utilities to crash.
- This was a [[Generation-Based Fuzzer|generation-based]], [[Dumb Fuzzing|dumb]] fuzzer.


## Examples
### GLFuzz
Generates OpenGL shader programs by mutation (transforms existing programs), but also can insert some randomly generated code fragments (generation).
### [[American Fuzzy Lop (AFL)]]
2 changes: 2 additions & 0 deletions 70024 - Software Reliability/Generation-Based Fuzzer.md
@@ -0,0 +1,2 @@
## Definition
A [[Fuzzing|fuzzer]] that produces inputs from scratch (e.g. purely at random, or from a model such as a grammar) rather than from existing inputs.
12 changes: 12 additions & 0 deletions 70024 - Software Reliability/Grammar-Based Fuzzing.md
@@ -0,0 +1,12 @@
## Definition
Using a grammar of the input language to generate random, syntactically valid inputs.

Taking some grammar:
```text
Expr ::= Number | Expr Expr Op
Op ::= '+' | '-' | '*' | '/'
Number ::= <32-bit signed integer>
```
Then randomly traverse the grammar from a start symbol:
- At each non-terminal symbol, randomly pick one of its production rules and expand it.
- We need to consider the distribution of outputs (e.g. for integers, should we bias towards edge values? Should we bias towards complex trees rather than simple/shallow ones?) (See [[Swarm Testing]])
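
The traversal above can be sketched for the example grammar as follows. The depth limit, the 0.4 probability of stopping at a `Number`, the `edge_bias` parameter and the choice of edge values are all assumptions made for illustration; they are exactly the distribution knobs the last bullet asks about.

```python
import random

INT32_MIN, INT32_MAX = -2**31, 2**31 - 1
EDGE_VALUES = [INT32_MIN, -1, 0, 1, INT32_MAX]  # assumed set of interesting integers


def gen_number(rng: random.Random, edge_bias: float = 0.3) -> str:
    """Number ::= <32-bit signed integer>, biased towards edge values."""
    if rng.random() < edge_bias:
        return str(rng.choice(EDGE_VALUES))
    return str(rng.randint(INT32_MIN, INT32_MAX))


def gen_op(rng: random.Random) -> str:
    """Op ::= '+' | '-' | '*' | '/'"""
    return rng.choice(["+", "-", "*", "/"])


def gen_expr(rng: random.Random, max_depth: int = 4) -> str:
    """Expr ::= Number | Expr Expr Op, with a depth limit to guarantee termination."""
    if max_depth == 0 or rng.random() < 0.4:
        return gen_number(rng)
    left = gen_expr(rng, max_depth - 1)
    right = gen_expr(rng, max_depth - 1)
    return f"{left} {right} {gen_op(rng)}"
```

Each call such as `gen_expr(random.Random(0))` yields a postfix expression in the grammar; raising `max_depth` or lowering the stopping probability shifts the distribution towards deeper trees.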
8 changes: 8 additions & 0 deletions 70024 - Software Reliability/Grey-Box Fuzzing.md
@@ -0,0 +1,8 @@
## Definition
Instrumentation is applied to the [[SUT]] to provide information about internals (e.g. coverage, [[Compiler Sanitizers]] for behaviour).
- Can be done at compile time, or binaries with debug & relocation information can be instrumented
### Advantages
- More feedback for [[Feedback-Directed Fuzzing]]
### Disadvantages
- Instrumentation decreases binary performance
- Compile-time instrumentation requires the [[SUT]] source
3 changes: 3 additions & 0 deletions 70024 - Software Reliability/Mutation-Based Fuzzer.md
@@ -0,0 +1,3 @@
## Definition

A [[Fuzzing|fuzzer]] that produces inputs by modifying or combining existing inputs.
9 changes: 9 additions & 0 deletions 70024 - Software Reliability/Oracle.md
@@ -0,0 +1,9 @@
## Definition
Used to determine which behaviours are *interesting*/flaggable when [[Fuzzing]] a system.

For example:
- Does the [[SUT]] crash?
- Does the input cover code that was not previously covered?
- Does a dynamic analysis (e.g. a [[Compiler Sanitizers|compiler sanitizer]]) report an issue?
- Does an assertion fail (often causing a system crash)?
- Does the [[SUT]] behave correctly and produce valid output?
2 changes: 2 additions & 0 deletions 70024 - Software Reliability/SUT.md
@@ -0,0 +1,2 @@
## Definition
The *System Under Test*: the system being tested.
7 changes: 7 additions & 0 deletions 70024 - Software Reliability/Smart Fuzzing.md
@@ -0,0 +1,7 @@
## Definition
[[Fuzzing]] that uses information about the expected input format.
- One type is [[Grammar-Based Fuzzing]]
## Examples
- [[Grammar-Based Fuzzing]] (syntactically correct inputs to fuzz a semantic analyser, or to check that a parser does not wrongly reject valid inputs)
- Protocol fuzzing
- Model-based fuzzing
4 changes: 4 additions & 0 deletions 70024 - Software Reliability/Swarm Testing.md
@@ -0,0 +1,4 @@
## Definition
Instead of running a single [[Fuzzing]] configuration for a time budget $T$, run $n$ configurations for $\cfrac{T}{n}$ each.
Each configuration omits some input features, and knowing which configurations trigger a failure gives more information about the bug.
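
A small sketch of the idea, under stated assumptions: the feature names are hypothetical, each feature is kept by an independent fair coin toss, and `common_features` is one illustrative way to use failing configurations to narrow down a bug.

```python
import random

# Hypothetical input features a program generator could enable or disable.
FEATURES = ["pointers", "arrays", "loops", "structs", "function_calls"]


def swarm_configurations(features, n, rng):
    """Build n configurations, each keeping every feature with probability 1/2 (an assumption)."""
    return [frozenset(f for f in features if rng.random() < 0.5) for _ in range(n)]


def time_slices(total_budget, n):
    """Split the total budget T evenly: each configuration runs for T / n."""
    return [total_budget / n] * n


def common_features(failing_configs):
    """Features enabled in every failing configuration: candidates for what triggers the bug."""
    configs = list(failing_configs)
    if not configs:
        return frozenset()
    return frozenset.intersection(*configs)
```

For instance, `swarm_configurations(FEATURES, 8, random.Random(0))` paired with `time_slices(3600, 8)` would give eight configurations of 450 seconds each.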
## [Paper: A Targeted Fuzzing Technique Based on Neural Networks and Particle Swarm Optimization](https://link.springer.com/chapter/10.1007/978-3-030-50399-4_36)
8 changes: 8 additions & 0 deletions 70024 - Software Reliability/White-Box Fuzzing.md
@@ -0,0 +1,8 @@
## Definition
[[Program Analysis]] using constraint solving to generate inputs that provably cover different parts of the [[SUT]].
- Known as [[Symbolic Execution]]; it is only loosely classifiable as [[Fuzzing]].
### Advantages
- Constraint solving yields a small set of inputs (fast to run, slow to generate) that can, in principle, cover the entire [[SUT]], including paths that are extremely unlikely to be hit by random testing.
### Disadvantages
- Limited scalability due to solving overhead
- Requires specialized toolchain
