From c0e2f0421b093b196e74d212d6b40dc85024ba4d Mon Sep 17 00:00:00 2001
From: Huanyi Chen
Date: Sun, 14 Jan 2024 22:44:26 -0500
Subject: [PATCH] Prepare L07 & L08 and part of L09 flipped notes

---
 lectures/flipped/L07.md | 68 ++++++++++++++++++++++++++---------------
 lectures/flipped/L08.md | 18 ++++++++---
 lectures/flipped/L09.md | 26 +++++++++++-----
 3 files changed, 76 insertions(+), 36 deletions(-)

diff --git a/lectures/flipped/L07.md b/lectures/flipped/L07.md
index ab3fd0f..63aaa35 100644
--- a/lectures/flipped/L07.md
+++ b/lectures/flipped/L07.md
@@ -1,10 +1,14 @@
 # Lecture 07: CPU Hardware, Branch Prediction
 
-## CPU info
+## Roadmap
 
-```sh
-cat /proc/cpuinfo
-```
+We will talk about branch prediction and try some experiments.
+
+We will also talk about some of the interesting attacks using it.
+
+## Mini-lecture
+
+Mainly the "How does branch prediction work" section in the lecture note
 
 ## Likely and Unlikely
 
@@ -54,7 +58,7 @@ hyperfine --warmup 3 'cargo run --release'
 
 ### Cache attack
 
-Meltdown
+#### Meltdown
 
 ```C
 // toy example
@@ -66,40 +70,56 @@ access(probe_array[data * 4096]);
 // the time to access other probe_array[j], then data = i/4096;
 ```
 
+#### Spectre
+
+```C
+// x < array1_size is mostly true to train the branch predictor
+// then attacker will set x to an out-of-bounds value
+// thus, array1[x] can then be inferred
+if (x < array1_size) {
+    y = array2[array1[x] * 4096];
+}
+```
+
 ### Cache attack, plam version
 
 Spectre
 
-Idea: use cache timing information to figure out what a given bit (supposedly inaccessible) is.
+Idea: use cache timing information to figure out what a given bit (supposedly
+inaccessible) is.
 
-Let's build up the attack.
+Let's build up the attack.
 
-Step 1: observe that loading from cache is much faster than loading from memory. Act that out by asking for a value; the value either comes from on top of the desk (cache) or inside the cabinet (memory).
+Step 1: observe that loading from cache is much faster than loading from memory.
+Act that out by asking for a value; the value either comes from on top of the
+desk (cache) or inside the cabinet (memory).
 
-Step 2: ok, now there is a value V that you want to know what it is. If you ask, the CPU won't tell you. But before it checks that it's not supposed to tell you V, it will start doing the array read. That's supposed to be OK because it's supposed to rollback before it tells you.
+Step 2: ok, now there is a value `V` that you want to know what it is. If you
+ask, the CPU won't tell you. But before it checks that it's not supposed to tell
+you `V`, it will start doing the array read. That's supposed to be OK because
+it's supposed to roll back in the end.
 
-Step 3: aha, let's do more speculative execution: based on V, load different parts of memory into cache (act that out). Then use the observation in step 1.
+Step 3: aha, let's do more speculative execution: based on `V`, load different
+parts of memory into cache (act that out). Then use the observation in step 1.
 
-1. if (untrusted_offset < arr1->length) { // supposed to fail, but predicted true
-2.     value = arr1->data[untrusted_offset]; // not supposed to run, but because of speculation, does actually load value (you're not supposed to see it)
+```C
+if (untrusted_offset < arr1->length) { // supposed to fail, but predicted true
+    value = arr1->data[untrusted_offset]; // not supposed to run, but because of speculation, does actually load value (you're not supposed to see it)
 
-OK, so now we "have" value, which we're not really supposed to have access to. Let's use another array to decipher what it contains.
+    // OK, so now we "have" value, which we're not really supposed to have access to.
+    // Let's use another array to decipher what it contains.
 
-3. index2 = ((value&1)*0x100)+0x200 // can also use other bits besides &1
-4. if (index2 < arr2->length) { // again this is supposed to be false, and yet...
-5.     value2 = arr2->data[index2]; // trigger load of arr2->data[0x200] if bit false, 0x300 if true
-6. }
+    index2 = ((value&1)*0x100)+0x200; // can also use other bits besides &1
+    if (index2 < arr2->length) { // again this is supposed to be false, and yet...
+        value2 = arr2->data[index2]; // trigger load of arr2->data[0x200] if bit false, 0x300 if true
+    }
+}
 
-Then measure how long to load from arr2->data at index 0x200 and 0x300.
+// Then measure how long to load from arr2->data at index 0x200 and 0x300.
+```
 
 ### Hyperthreading attack
 
 In hyperthreading, two threads are sharing the same execution core. That means
 they have hardware in common. Because of this, a thread can figure out what the
 other thread is doing by timing how long it takes to complete instructions.
-
-# After-action report, plam, 23Jan23
-
-Had the students try likely/unlikely example.
-
-Did not do a cache attack, will do that on Friday.
\ No newline at end of file
diff --git a/lectures/flipped/L08.md b/lectures/flipped/L08.md
index 892363a..f553718 100644
--- a/lectures/flipped/L08.md
+++ b/lectures/flipped/L08.md
@@ -1,4 +1,11 @@
-# L08: Cache Coherency
+# Lecture 8 — Cache Coherency
+
+## Roadmap
+
+We will talk about cache coherency from the point of view of a user (not
+implementer) and walk through some examples.
+
+## Mini-lecture
 
 Cache Coherency means
 
@@ -45,12 +52,12 @@ machine.
 |Invalid | PrRd | BusRd | Valid |
 
 Therefore, for the above example, CPU1 will snoop and mark data as invalid in
-step 3. In steps 4 and 5, CPU1 and CPU2 both read x from main memory.
+step 3. In steps 4 and 5, CPU1 and CPU2 both read `x` from main memory.
 
 ### Write-Back Protocols
 
 This is used to merge multiple writes into a single flush. At minimum, we need
-support in hardware for a "dirty" bit, which indicates the our data has been
+support in hardware for a "dirty" bit, which indicates that our data has been
 changed but not yet been written to memory.
 
 #### MSI
@@ -59,7 +66,8 @@ changed but not yet been written to memory.
 
 ##### Activity
 
-Walk through the MSI protocol using the same (`x = 7`) example above
+Walk through the MSI protocol using the same (`x = 7`) example above. (See the
+MSI example in the lecture note)
 
 #### MESI
 
@@ -94,4 +102,4 @@ Benchmark #1: ./without_false_sharing
 
 Worked through the write-through and MSI protocols. Probably worth doing in person.
 
-Ran the false sharing example. Could have had better explanation.
\ No newline at end of file
+Ran the false sharing example. Could have had better explanation.
diff --git a/lectures/flipped/L09.md b/lectures/flipped/L09.md
index 4c2494e..19f4308 100644
--- a/lectures/flipped/L09.md
+++ b/lectures/flipped/L09.md
@@ -1,8 +1,13 @@
-# L09: Concurrency and Parallelism
+# Lecture 9 — Concurrency and Parallelism
+
+## Roadmap
+
+We will talk about some theoretical topics around concurrency and parallelism.
 
 ## Amdahl's Law
 
-Exercise: speed up reading the poem Engineers' Corner by Wendy Cope (1986) by parallelizing. But the first verse is always sequential. How does it take?
+Exercise: speed up reading the poem Engineers' Corner by Wendy Cope (1986) by
+parallelizing. But the first verse is always sequential. How long does it take?
 
 ```
 Engineers' Corner
@@ -37,21 +42,28 @@ That's why this country's going down the drain.
 -- Wendy Cope
 ```
 
+HC: I think I didn't do this last time, I'll try it. So I might just let 1
+student read, then 2 students, 4 students, etc. However, all of them need to go
+through the first verse. Then I can measure the total times.
+
 ## Gustafson's Law
 
 ... you can read out longer and longer poems in the same amount of time.
 
 # Live coding: thread pools
 
-Rust explorer (https://www.rustexplorer.com/) works for `live-coding/L09/threadpool/src/main.rs`.
+Rust explorer (https://www.rustexplorer.com/) works for
+`live-coding/L09/threadpool/src/main.rs`.
 
 # Threads vs processes
 
-You could probably rust explorer these, but `rustc -O` works. I'd do that. `create-threads` and `create-processes` subdirectories under `live-coding/L09`.
+You could probably rust explorer these, but `rustc -O` works. I'd do that.
+`create-threads` and `create-processes` subdirectories under `live-coding/L09`.
 
 # Parallelization design patterns
 
-Groups of students can pick a pattern and think of a way to act it out. Invite students to present what they come up with. Give chocolate to students.
+Groups of students can pick a pattern and think of a way to act it out. Invite
+students to present what they come up with. Give chocolate to students.
 
 # After-action report, plam, 27Jan23
 
@@ -59,5 +71,5 @@ Yes, I did the poem thing, I think it works.
 
 # After-action report, plam, 30Jan23
 
-Did the live coding, threads vs processes, and parallelization design patterns. It's not as awkward as I feared.
-
+Did the live coding, threads vs processes, and parallelization design patterns.
+It's not as awkward as I feared.