From 3dfd1cb9db9004d94241f64dfd17f725ff912b8c Mon Sep 17 00:00:00 2001 From: Oliver Killane Date: Fri, 26 Jan 2024 13:39:52 +0000 Subject: [PATCH] 70024: Added fuzzing lectures --- .../American Fuzzy Lop (AFL).md | 38 ++++++++++++++++ .../Black-Box Fuzzing.md | 7 +++ 70024 - Software Reliability/Dumb Fuzzing.md | 5 +++ .../Feedback-Directed Fuzzing.md | 14 ++++++ 70024 - Software Reliability/Fuzzing.md | 43 +++++++++++++++++++ .../Generation-Based Fuzzer.md | 2 + .../Grammar-Based Fuzzing.md | 12 ++++++ .../Grey-Box Fuzzing.md | 8 ++++ .../Mutation-Based Fuzzer.md | 3 ++ 70024 - Software Reliability/Oracle.md | 9 ++++ 70024 - Software Reliability/SUT.md | 2 + 70024 - Software Reliability/Smart Fuzzing.md | 7 +++ 70024 - Software Reliability/Swarm Testing.md | 4 ++ .../White-Box Fuzzing.md | 8 ++++ 14 files changed, 162 insertions(+) create mode 100644 70024 - Software Reliability/American Fuzzy Lop (AFL).md create mode 100644 70024 - Software Reliability/Black-Box Fuzzing.md create mode 100644 70024 - Software Reliability/Dumb Fuzzing.md create mode 100644 70024 - Software Reliability/Feedback-Directed Fuzzing.md create mode 100644 70024 - Software Reliability/Fuzzing.md create mode 100644 70024 - Software Reliability/Generation-Based Fuzzer.md create mode 100644 70024 - Software Reliability/Grammar-Based Fuzzing.md create mode 100644 70024 - Software Reliability/Grey-Box Fuzzing.md create mode 100644 70024 - Software Reliability/Mutation-Based Fuzzer.md create mode 100644 70024 - Software Reliability/Oracle.md create mode 100644 70024 - Software Reliability/SUT.md create mode 100644 70024 - Software Reliability/Smart Fuzzing.md create mode 100644 70024 - Software Reliability/Swarm Testing.md create mode 100644 70024 - Software Reliability/White-Box Fuzzing.md diff --git a/70024 - Software Reliability/American Fuzzy Lop (AFL).md b/70024 - Software Reliability/American Fuzzy Lop (AFL).md new file mode 100644 index 0000000..bfe4869 --- /dev/null +++ b/70024 - 
Software Reliability/American Fuzzy Lop (AFL).md @@ -0,0 +1,38 @@ +## Definition +A [[Mutation-Based Fuzzer|mutation-based]], [[Dumb Fuzzing|dumb]], [[Grey-Box Fuzzing|grey-box]] fuzzer. +- General purpose: it has no knowledge of the input format +```python +files = get_user_provided_files() +prev_behaviour = set() +while keep_fuzzing(): + current = files.pop() + + # AFL measures branch coverage + behaviour = fuzz_with(current) + prev_behaviour.add(behaviour) + + # Trim the test case to the smallest size with the + # same behaviour as the original + fuzz_and_trim(behaviour, current) + + # Mutate the file using a variety of traditional methods + new_files = mutate(current) + + # If any have different behaviour, add to queue + for mutated in new_files: + if fuzz_with(mutated) not in prev_behaviour: + files.append(mutated) +``` + +The mutation strategies used are: + +| Strategy | Description | +| ---- | ---- | +| Walking Bit-Flips | Walk input with 1-bit stride, flipping 1-4 consecutive bits | +| Walking Byte-Flips | Walk input with 1-byte stride, flipping 8, 16 or 32 consecutive bits | +| Increment/Decrement | Increment or decrement integer values in the input file (by default in the range -35 to +35) | +| Insert Known Integers | -1, 256, 1024, MAX_INT-1 | +| Miscellaneous Random Tweaks | Deleting or memsetting parts of the input file | +| Splicing | Concatenate parts of existing inputs | +## [GitHub Repository](https://github.com/google/AFL) +## [Creator's Website](https://lcamtuf.coredump.cx/afl/) diff --git a/70024 - Software Reliability/Black-Box Fuzzing.md b/70024 - Software Reliability/Black-Box Fuzzing.md new file mode 100644 index 0000000..4418441 --- /dev/null +++ b/70024 - Software Reliability/Black-Box Fuzzing.md @@ -0,0 +1,7 @@ +## Definition +The [[SUT]] is executed in an unmodified form (no sanitizers or coverage added). 
+### Advantages +- Can be applied to a closed-source [[SUT]] +- [[SUT]] can run at full speed (optimized binary), enabling a higher rate of fuzzing +### Disadvantages +- [[Feedback-Directed Fuzzing]] can only make use of externally visible [[SUT]] behaviour (cannot augment [[SUT]] with coverage) \ No newline at end of file diff --git a/70024 - Software Reliability/Dumb Fuzzing.md b/70024 - Software Reliability/Dumb Fuzzing.md new file mode 100644 index 0000000..bfa4299 --- /dev/null +++ b/70024 - Software Reliability/Dumb Fuzzing.md @@ -0,0 +1,5 @@ +## Definition +[[Fuzzing]] that generates or mutates input without knowledge of the [[SUT]]'s input domain. +## Examples +- Generating random strings for a compiler, without knowledge of the grammar or semantics. +- Sending random bytes to a server to fuzz a protocol \ No newline at end of file diff --git a/70024 - Software Reliability/Feedback-Directed Fuzzing.md b/70024 - Software Reliability/Feedback-Directed Fuzzing.md new file mode 100644 index 0000000..4c80bdb --- /dev/null +++ b/70024 - Software Reliability/Feedback-Directed Fuzzing.md @@ -0,0 +1,14 @@ +## Definition +A [[Fuzzing|fuzzer]]'s input generator can observe the behaviour of the [[SUT]] to inform the generation of new inputs. +- Currently the state of the art for fuzzing. + +*Interesting inputs can:* +- Log a previously unseen message +- Cover some previously uncovered code +- Consume more memory than usual +- Perform some previously unseen I/O (e.g. a system call) + +| Fuzzing | To Take Advantage | +| ---- | ---- | +| [[Generation-Based Fuzzer]] | Attempt to generate inputs with similar properties. | +| [[Mutation-Based Fuzzer]] | Prioritise the interesting inputs as templates for the next mutations. 
| diff --git a/70024 - Software Reliability/Fuzzing.md b/70024 - Software Reliability/Fuzzing.md new file mode 100644 index 0000000..3959c3c --- /dev/null +++ b/70024 - Software Reliability/Fuzzing.md @@ -0,0 +1,43 @@ +## Definition +Testing a system ([[SUT]]) using inputs that are wholly or partially randomly generated. + +The goal is to trigger interesting behaviour to aid in finding bugs. +- *Interesting* is application dependent; most often it means crashing the system. +### Fuzzer Types +- [[Generation-Based Fuzzer]]s and [[Mutation-Based Fuzzer]]s as well as combinations of both. +- [[Dumb Fuzzing]] versus [[Smart Fuzzing]] +- [[Feedback-Directed Fuzzing]] +- [[Black-Box Fuzzing]] vs [[Grey-Box Fuzzing]] vs [[White-Box Fuzzing]] +## Input Types +| Input | For Compiler Fuzzing | +| ---- | ---- | +| Totally Invalid | Random invalid strings, to check that invalid inputs are rejected. | +| Malformed Inputs | Sequences of tokens that are structurally correct, but invalid. | +| Inputs with High Validity | Token sequences that are in the language's grammar, but are ensured not to be semantically valid. | +| High Integrity | Well-formed programs free from undefined behaviour. | +Whenever using random inputs, it is important to consider their distribution (e.g. when generating numbers, what should the distribution be to get maximal coverage?). One way to improve this is with [[Swarm Testing]]. +### Minimal Requirements +| Component | Description | +| ---- | ---- | +| [[SUT]] | The system to provide input to | +| [[Oracle]] | To determine which behaviours are *interesting* | +### Advantages +- Effective in finding edge-case inputs missed by human-written test suites +- Can automatically increase coverage of a codebase +*Note: Can be used to find exploitable defects in programs. 
We consider it an advantage for the developer to find them first!* +## Early Days +> *_We didn't call it fuzzing back in the 1950s, but it was our standard practice to test programs by inputting decks of punch cards taken from the trash. We also used decks of random number punch cards. We weren't networked in those days, so we weren't much worried about security, but our random/trash decks often turned up undesirable behavior. Every programmer I knew (and there weren't many of us back then, so I knew a great proportion of them) used the trash-deck technique._* +> **- [Gerald Weinberg's Secrets of Writing and Consulting: Fuzz Testing and Fuzz History (secretsofconsulting.blogspot.com)](http://secretsofconsulting.blogspot.com/2017/02/fuzz-testing-and-fuzz-history.html)** + +### Introduction of *Fuzzing* +In 1990 the term *fuzzing* was introduced by a paper testing UNIX utilities with random data. +- The paper: [An empirical study of the reliability of UNIX utilities | Communications of the ACM](https://dl.acm.org/doi/10.1145/96267.96279) +- Tested if random inputs could cause the utilities to crash. +- This was a [[Generation-Based Fuzzer|generating]] [[Dumb Fuzzing|dumb]] fuzzer +- + + +## Example +### GLFuzz +Generates OpenGL shader programs by mutation (transforms existing programs), but can also insert some randomly generated code fragments (generation). +### [[American Fuzzy Lop (AFL)]] \ No newline at end of file diff --git a/70024 - Software Reliability/Generation-Based Fuzzer.md b/70024 - Software Reliability/Generation-Based Fuzzer.md new file mode 100644 index 0000000..5a3c7da --- /dev/null +++ b/70024 - Software Reliability/Generation-Based Fuzzer.md @@ -0,0 +1,2 @@ +## Definition +A [[Fuzzing|fuzzer]] that produces inputs from scratch (purely randomly), rather than from existing inputs. 
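The definition above, combined with the 1990-style "feed random data and see if it crashes" approach, can be sketched as a tiny generation-based, dumb fuzzer. This is only an illustration under stated assumptions: `generate_input`, `toy_sut` and `fuzz` are hypothetical names, the SUT is a toy function rather than a real program, and the oracle is simply "did the call raise an exception".

```python
import random


def generate_input(rng, max_len=16):
    """Build an input from nothing: a random-length run of random bytes."""
    length = rng.randint(0, max_len)
    return bytes(rng.randrange(256) for _ in range(length))


def toy_sut(data):
    """Hypothetical SUT: 'crashes' (raises) on any input containing a NUL byte."""
    if 0 in data:
        raise ValueError("unexpected NUL byte")
    return len(data)


def fuzz(sut, iterations=1000, seed=0):
    """Run the SUT on random inputs and collect the ones that crash it."""
    rng = random.Random(seed)  # seeded so that a run can be reproduced
    crashes = []
    for _ in range(iterations):
        data = generate_input(rng)
        try:
            sut(data)
        except Exception:
            # Oracle (assumed): any exception counts as interesting behaviour.
            crashes.append(data)
    return crashes
```

Calling `fuzz(toy_sut)` returns the list of crashing inputs. Seeding the generator is a design choice that makes every crash reproducible from the seed alone.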
diff --git a/70024 - Software Reliability/Grammar-Based Fuzzing.md b/70024 - Software Reliability/Grammar-Based Fuzzing.md new file mode 100644 index 0000000..e9ccd4c --- /dev/null +++ b/70024 - Software Reliability/Grammar-Based Fuzzing.md @@ -0,0 +1,12 @@ +## Definition +Using a grammar for a textual input format to generate valid random inputs. + +Taking some grammar: +```text +Expr ::= Number | Expr Expr Op +Op ::= '+' | '-' | '*' | '/' +Number ::= <32-bit signed integer> +``` +Then randomly traversing the grammar from a start symbol: +- At each non-terminal symbol, randomly pick one of its production rules and expand it, repeating until only terminals remain. +- We need to consider the distribution of outputs (e.g. with integers, should we bias towards edge values? Should we bias towards complex trees rather than simple/shallow ones?) (See [[Swarm Testing]]) diff --git a/70024 - Software Reliability/Grey-Box Fuzzing.md b/70024 - Software Reliability/Grey-Box Fuzzing.md new file mode 100644 index 0000000..520d7e9 --- /dev/null +++ b/70024 - Software Reliability/Grey-Box Fuzzing.md @@ -0,0 +1,8 @@ +## Definition +Instrumentation is applied to the [[SUT]] to provide information about internals (e.g. coverage, [[Compiler Sanitizers]] for behaviour). +- Can be done at compile time, or binaries with debug & relocation information can be instrumented +### Advantages +- More feedback for [[Feedback-Directed Fuzzing]] +### Disadvantages +- Instrumentation decreases binary performance +- Compile-time instrumentation requires the [[SUT]] source \ No newline at end of file diff --git a/70024 - Software Reliability/Mutation-Based Fuzzer.md b/70024 - Software Reliability/Mutation-Based Fuzzer.md new file mode 100644 index 0000000..48c63ff --- /dev/null +++ b/70024 - Software Reliability/Mutation-Based Fuzzer.md @@ -0,0 +1,3 @@ +## Definition + +A [[Fuzzing|fuzzer]] that produces inputs by modifying or combining existing inputs. 
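As a concrete instance of "modifying existing inputs", here is a minimal sketch of the walking bit-flip strategy listed in the AFL notes. The function name `walking_bit_flips` and the most-significant-bit-first ordering are assumptions made for illustration; this is not AFL's actual implementation.

```python
def walking_bit_flips(data, width=1):
    """Yield copies of `data` with `width` consecutive bits flipped,
    moving the start position forward one bit at a time."""
    total_bits = len(data) * 8
    for start in range(total_bits - width + 1):
        mutated = bytearray(data)
        for bit in range(start, start + width):
            # Assumed bit order: bit 0 is the most significant bit of byte 0.
            mutated[bit // 8] ^= 0x80 >> (bit % 8)
        yield bytes(mutated)
```

For example, `list(walking_bit_flips(b"ab", 4))` gives every variant of the two-byte seed with a run of four flipped bits. Each variant would then be run against the [[SUT]] and kept only if it shows new behaviour.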
diff --git a/70024 - Software Reliability/Oracle.md b/70024 - Software Reliability/Oracle.md new file mode 100644 index 0000000..8faf48d --- /dev/null +++ b/70024 - Software Reliability/Oracle.md @@ -0,0 +1,9 @@ +## Definition +Used to determine which behaviours are *interesting*/flaggable when [[Fuzzing]] a system. + +For example: +- Does the [[SUT]] crash +- Does the input cover/use code that was not covered previously +- Does a dynamic analysis (e.g. [[Compiler Sanitizers]]) report an issue +- Does an assertion fail (often causing a system crash) +- Does the [[SUT]] behave correctly / is its output valid \ No newline at end of file diff --git a/70024 - Software Reliability/SUT.md b/70024 - Software Reliability/SUT.md new file mode 100644 index 0000000..5f59384 --- /dev/null +++ b/70024 - Software Reliability/SUT.md @@ -0,0 +1,2 @@ +## Definition +Term for *System Under Test* \ No newline at end of file diff --git a/70024 - Software Reliability/Smart Fuzzing.md b/70024 - Software Reliability/Smart Fuzzing.md new file mode 100644 index 0000000..a92e9e0 --- /dev/null +++ b/70024 - Software Reliability/Smart Fuzzing.md @@ -0,0 +1,7 @@ +## Definition +[[Fuzzing]] that uses information about the expected input format. +- One type is [[Grammar-Based Fuzzing]] +## Examples +- Grammar-based fuzzing (syntactically correct inputs to fuzz a semantic analyser, or to fuzz a parser to check that it rejects invalid inputs) +- Protocol fuzzing +- Model-based fuzzing \ No newline at end of file diff --git a/70024 - Software Reliability/Swarm Testing.md b/70024 - Software Reliability/Swarm Testing.md new file mode 100644 index 0000000..2cf8aca --- /dev/null +++ b/70024 - Software Reliability/Swarm Testing.md @@ -0,0 +1,4 @@ +## Definition +Instead of running a single [[Fuzzing]] configuration for time budget $T$, run $n$ configurations for $\cfrac{T}{n}$ each. +Each configuration omits some features, and knowing which configurations fail can provide more detail about the bug. 
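The budget split described above can be sketched as follows. The names `swarm_configurations` and `swarm_schedule` are hypothetical, and keeping each feature with probability 1/2 is an assumption chosen for simplicity, not something the notes specify.

```python
import random


def swarm_configurations(features, n, seed=0):
    """Pick n configurations; each keeps every feature with probability 1/2
    (an assumed policy), so each configuration omits some features."""
    rng = random.Random(seed)
    return [
        frozenset(f for f in features if rng.random() < 0.5)
        for _ in range(n)
    ]


def swarm_schedule(features, total_budget, n, seed=0):
    """Pair each configuration with an equal share T/n of the time budget."""
    share = total_budget / n
    return [(config, share) for config in swarm_configurations(features, n, seed)]
```

With the operators of the expression grammar as features, `swarm_schedule(["+", "-", "*", "/"], total_budget=60.0, n=4)` yields four operator subsets, each to be fuzzed for a quarter of the budget. If only the configurations containing `'/'` fail, that already hints at where the bug lies.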
+## [Paper: A Targeted Fuzzing Technique Based on Neural Networks and Particle Swarm Optimization](https://link.springer.com/chapter/10.1007/978-3-030-50399-4_36) \ No newline at end of file diff --git a/70024 - Software Reliability/White-Box Fuzzing.md b/70024 - Software Reliability/White-Box Fuzzing.md new file mode 100644 index 0000000..941a76f --- /dev/null +++ b/70024 - Software Reliability/White-Box Fuzzing.md @@ -0,0 +1,8 @@ +## Definition +[[Program Analysis]] using constraint solving to generate inputs that provably cover different parts of the [[SUT]]. +- This is called [[Symbolic Execution]], and is only loosely classifiable as [[Fuzzing]] +### Advantages +Constraint solving aims for a minimal set of inputs (fast to run, slow to generate) that can, in principle, cover the entire [[SUT]], including inputs that are extremely unlikely to occur under random testing. +### Disadvantages +- Limited scalability due to solving overhead +- Requires a specialized toolchain \ No newline at end of file