Prepare L07 & L08 and part of L09 flipped notes
h365chen committed Jan 15, 2024
1 parent 8490dcd commit c0e2f04
Showing 3 changed files with 76 additions and 36 deletions.
68 changes: 44 additions & 24 deletions lectures/flipped/L07.md
@@ -1,10 +1,14 @@
# Lecture 07: CPU Hardware, Branch Prediction

## Roadmap
We will talk about branch prediction and try some experiments.

We will also talk about some interesting attacks that exploit it.

## Mini-lecture

Mainly the "How does branch prediction work" section in the lecture note

## Likely and Unlikely

@@ -54,7 +58,7 @@ hyperfine --warmup 3 'cargo run --release'

### Cache attack

#### Meltdown

```C
// toy example
@@ -66,40 +70,56 @@
access(probe_array[data * 4096]);
// the time to access other probe_array[j], then data = i/4096;
```
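
To make the timing idea concrete, here is a minimal sketch of the probe step
(not a working exploit). The name `recover_byte` and the layout of
`probe_array` (256 entries spaced one 4096-byte page apart, flushed from the
cache before the transient access) are assumptions for illustration.

```C
#include <stdint.h>
#include <x86intrin.h> // __rdtscp

// Walk all 256 candidate pages and time each access: the one that was
// touched speculatively is still cached, so it comes back much faster.
int recover_byte(volatile uint8_t *probe_array) {
    int best = -1;
    uint64_t best_time = UINT64_MAX;
    for (int i = 0; i < 256; i++) {
        unsigned aux;
        uint64_t start = __rdtscp(&aux);
        (void)probe_array[i * 4096];
        uint64_t elapsed = __rdtscp(&aux) - start;
        if (elapsed < best_time) {
            best_time = elapsed;
            best = i;
        }
    }
    return best; // most likely value of the secret byte `data`
}
```
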
#### Spectre

```C
// x < array1_size is mostly true, to train the branch predictor;
// the attacker then sets x to an out-of-bounds value, and
// array1[x] can then be inferred from which part of array2 got cached
if (x < array1_size) {
    y = array2[array1[x] * 4096];
}
```
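
A rough sketch of how the attacker drives this (illustrative only;
`victim_function` and `malicious_x` are made-up names, `array1_size` is from
the snippet above):

```C
// Training phase: in-bounds calls teach the predictor that the bounds
// check usually passes.
for (int i = 0; i < 1000; i++) {
    victim_function(i % array1_size);
}

// Attack phase: the predictor still guesses "in bounds", so the body runs
// speculatively and leaves array2[array1[malicious_x] * 4096] in the cache.
victim_function(malicious_x);
```
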

### Cache attack, plam version

Spectre

Idea: use cache timing information to figure out what a given bit (supposedly
inaccessible) is.

Let's build up the attack.

Step 1: observe that loading from cache is much faster than loading from memory.
Act that out by asking for a value; the value either comes from on top of the
desk (cache) or inside the cabinet (memory).

Step 2: OK, now there is a value `V` that you want to learn. If you ask for it,
the CPU won't tell you. But before it checks that it's not supposed to tell you
`V`, it will start doing the array read. That's supposed to be OK because it's
supposed to roll back in the end.

Step 3: aha, let's do more speculative execution: based on `V`, load different
parts of memory into cache (act that out). Then use the observation in step 1.

```C
if (untrusted_offset < arr1->length) { // supposed to fail, but predicted true
    value = arr1->data[untrusted_offset]; // not supposed to run, but because of speculation, does actually load the value (you're not supposed to see it)

    // OK, so now we "have" value, which we're not really supposed to have access to.
    // Let's use another array to decipher what it contains.

    index2 = ((value & 1) * 0x100) + 0x200; // can also use other bits besides &1
    if (index2 < arr2->length) { // again this is supposed to be false, and yet...
        value2 = arr2->data[index2]; // triggers a load of arr2->data[0x200] if the bit is 0, of 0x300 if it is 1
    }
}

// Then measure how long it takes to load from arr2->data at index 0x200 and 0x300.
```
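
A minimal sketch of that measurement (assuming an x86 TSC and that
`arr2->data` was flushed from the cache before the speculative run;
`time_access` and `recover_bit` are made-up helpers):

```C
#include <stdint.h>
#include <x86intrin.h> // __rdtscp

static uint64_t time_access(volatile uint8_t *p) {
    unsigned aux;
    uint64_t start = __rdtscp(&aux);
    (void)*p;
    return __rdtscp(&aux) - start;
}

// Call this after the mispredicted branch has run: whichever index loads
// faster is the one that speculation brought into the cache.
int recover_bit(volatile uint8_t *arr2_data) {
    uint64_t t_false = time_access(&arr2_data[0x200]);
    uint64_t t_true  = time_access(&arr2_data[0x300]);
    return t_true < t_false; // 1 if the secret bit was set
}
```
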

### Hyperthreading attack

In hyperthreading, two threads are sharing the same execution core. That means
they have hardware in common. Because of this, a thread can figure out what the
other thread is doing by timing how long it takes to complete instructions.
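
One way to act this out in code (a toy sketch, not a real attack): pin two
threads to sibling hyperthreads and time a fixed burst of work; the burst takes
noticeably longer while the sibling is hammering the same execution units.

```C
#include <stdint.h>
#include <x86intrin.h> // __rdtscp

// Time a fixed burst of dependent multiplies. On a hyperthreaded core the
// result is larger when the sibling thread is also issuing multiplies.
uint64_t time_burst(void) {
    unsigned aux;
    volatile uint64_t x = 3;
    uint64_t start = __rdtscp(&aux);
    for (int i = 0; i < 10000; i++) {
        x = x * x + 1;
    }
    return __rdtscp(&aux) - start;
}
```
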
# After-action report, plam, 23Jan23

Had the students try the likely/unlikely example.
Did not do a cache attack, will do that on Friday.
18 changes: 13 additions & 5 deletions lectures/flipped/L08.md
@@ -1,4 +1,11 @@
# Lecture 8 — Cache Coherency

## Roadmap

We will talk about cache coherency from the point of view of a user (not
implementer) and walk through some examples.

## Mini-lecture

Cache Coherency means

Expand Down Expand Up @@ -45,12 +52,12 @@ machine.
|Invalid | PrRd | BusRd | Valid |

Therefore, for the above example, CPU1 will snoop and mark data as invalid in
step 3. In steps 4 and 5, CPU1 and CPU2 both read `x` from main memory.

### Write-Back Protocols

This is used to merge multiple writes into a single flush. At minimum, we need
support in hardware for a "dirty" bit, which indicates the our data has been
support in hardware for a "dirty" bit, which indicates that our data has been
changed but not yet been written to memory.

#### MSI
@@ -59,7 +66,8 @@ changed but not yet been written to memory.

##### Activity

Walk through the MSI protocol using the same (`x = 7`) example above. (See the
MSI example in the lecture note)
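
One possible walkthrough (this assumes the example sequence is: 1. CPU1 reads
`x`; 2. CPU2 reads `x`; 3. CPU2 writes `x = 7`; 4. CPU1 reads `x`; 5. CPU2
reads `x`):

| Step | Action          | CPU1 | CPU2 | Bus traffic                                 |
|------|-----------------|------|------|---------------------------------------------|
| 1    | CPU1 reads `x`  | S    | I    | BusRd; memory supplies `x`                  |
| 2    | CPU2 reads `x`  | S    | S    | BusRd; memory supplies `x`                  |
| 3    | CPU2 writes `x` | I    | M    | BusRdX/BusUpgr; CPU1 snoops and invalidates |
| 4    | CPU1 reads `x`  | S    | S    | BusRd; CPU2 flushes its dirty copy (M to S) |
| 5    | CPU2 reads `x`  | S    | S    | none (cache hit)                            |

Unlike write-through, the write in step 3 does not go to memory right away;
memory only gets the new value when CPU2 flushes in step 4.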

#### MESI

@@ -94,4 +102,4 @@ Benchmark #1: ./without_false_sharing

Worked through the write-through and MSI protocols. Probably worth doing in person.

Ran the false sharing example. Could have had better explanation.
26 changes: 19 additions & 7 deletions lectures/flipped/L09.md
@@ -1,8 +1,13 @@
# Lecture 9 — Concurrency and Parallelism

## Roadmap

We will talk about some of the theory around concurrency and parallelism.

## Amdahl's Law

Exercise: speed up reading the poem Engineers' Corner by Wendy Cope (1986) by
parallelizing. But the first verse is always sequential. How long does it take?

```
Engineers' Corner
@@ -37,27 +42,34 @@ That's why this country's going down the drain.
-- Wendy Cope
```

HC: I think I didn't do this last time, I'll try it. So I might just let 1
student read, then 2 students, 4 students, etc. However, all of them need to go
through the first verse. Then I can measure the total times.
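
As a sanity check with Amdahl's law (assuming all verses take about the same
time to read): if the poem has V verses and only the first is sequential, then
with n readers the time is roughly 1 + (V - 1)/n verse-lengths instead of V, so
the speedup is V / (1 + (V - 1)/n). No matter how many students read, the
speedup can never exceed V, because the first verse alone takes 1 verse-length.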

## Gustafson's Law

... you can read out longer and longer poems in the same amount of time.
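
In formula form (with s the serial fraction of the fixed time budget): scaled
speedup = s + (1 - s) * n. For the poem, keeping the reading time fixed at V
verse-lengths, n students get through 1 + n(V - 1) verses instead of V.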

# Live coding: thread pools

Rust explorer (https://www.rustexplorer.com/) works for
`live-coding/L09/threadpool/src/main.rs`.

# Threads vs processes

You could probably rust explorer these, but `rustc -O` works. I'd do that.
`create-threads` and `create-processes` subdirectories under `live-coding/L09`.
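
If you want a rough standalone analogue of what those programs measure, here is
a hedged C sketch (not the course's `create-threads`/`create-processes` Rust
code): spawn and join many short-lived threads, then fork and reap many
short-lived processes, and compare the times. Build with `cc -O2 -pthread`.

```C
#include <pthread.h>
#include <stdio.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

static void *noop(void *arg) { return arg; }

static double elapsed(struct timespec a, struct timespec b) {
    return (b.tv_sec - a.tv_sec) + (b.tv_nsec - a.tv_nsec) / 1e9;
}

int main(void) {
    const int N = 10000;
    struct timespec t0, t1;

    // spawn and join N short-lived threads
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < N; i++) {
        pthread_t t;
        pthread_create(&t, NULL, noop, NULL);
        pthread_join(t, NULL);
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);
    printf("threads:   %.3f s\n", elapsed(t0, t1));

    // fork and reap N short-lived processes
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < N; i++) {
        pid_t pid = fork();
        if (pid == 0) _exit(0); // child does nothing
        waitpid(pid, NULL, 0);
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);
    printf("processes: %.3f s\n", elapsed(t0, t1));
    return 0;
}
```
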

# Parallelization design patterns

Groups of students can pick a pattern and think of a way to act it out. Invite
students to present what they come up with. Give chocolate to students.

# After-action report, plam, 27Jan23

Yes, I did the poem thing, I think it works.

# After-action report, plam, 30Jan23

Did the live coding, threads vs processes, and parallelization design patterns.
It's not as awkward as I feared.
