Prepare L06 flipped note
h365chen committed Jan 8, 2024
1 parent 95d6412 commit 9872d76
43 changes: 29 additions & 14 deletions lectures/flipped/L06.md
# Lecture 6 — Modern Processors

## Roadmap

We will talk about some techniques to speed up CPU execution.

## Mini-lecture

Going from CISC to RISC led to impressive scaling of CPU clock frequency for a
while, but we eventually hit a wall: clock speeds stopped getting faster around
2005, plateauing at around 3 GHz. That's why we look for other techniques.

- Pipelining

This is straightforward: overlap the stages of consecutive instructions so a
new instruction can start before the previous one finishes (see the cycle-count
sketch after this list).

- Register renaming

```asm
MOV R2, R7 + 32 ; writes R2
ADD R1, R2      ; reads R2 (a true dependency on the MOV above)
MOV R2, R9 + 64 ; reuses R2 for an unrelated value; we can rename R2 to say RY
ADD R3, R2      ; reads the renamed register, so the two pairs don't interfere
```

- Speculation

```asm
ld rax, rbx+16 ; assume cache miss
add rbx, 16 ; carry on anyway, ADD doesn’t need rax value from LD
; register renaming => LD (read rbx)/ADD (write to renamed rbx) don’t interfere
cmp rax, 0 ; needs rax value, queue till available
jeq null_chk ; oops! need cmp result
; speculate: assume branch not taken
st rbx-16, rcx ; speculatively store to store buf (not L1)
ld rcx, rdx ; unrelated cache miss: 2 misses now active, 1 speculative
ld rax, rax+8 ; now must wait for result of first LD since we need rax
; but we still almost cut the time in half
```
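
As a rough illustration of the pipelining bullet above, here is a cycle-count
sketch in Rust. It assumes an idealized 5-stage pipeline with no stalls or
hazards; both the stage count and the instruction count are made-up numbers for
illustration only.

```rust
fn main() {
    let stages: u32 = 5;         // assumed pipeline depth (fetch, decode, execute, memory, writeback)
    let instructions: u32 = 100; // assumed number of instructions to run

    // Without pipelining, each instruction occupies the processor for all of
    // its stages before the next one can start.
    let sequential_cycles = instructions * stages;

    // With pipelining, the first instruction takes `stages` cycles to fill the
    // pipeline; after that, one instruction completes every cycle.
    let pipelined_cycles = stages + (instructions - 1);

    println!("sequential: {sequential_cycles} cycles");
    println!("pipelined:  {pipelined_cycles} cycles");
}
```

Note that the latency of any single instruction stays the same; pipelining
improves throughput, i.e., how many instructions complete per cycle.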

## Calculation

### q1

Assume we can always find the data in the L3 cache, cache miss rates are 40 per
1000 for L1D (L1 data) and 4 per 1000 for L2, and cache miss penalties are 5
cycles for L1D and 300 cycles for L2. What is the average running time of an
instruction?
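
A minimal sketch of one way to set up the expected-cost calculation, assuming a
base cost of 1 cycle per instruction and that both miss rates are counted per
1000 instructions; neither assumption is stated in the question, so adjust as
needed.

```rust
fn main() {
    // Assumed base cost when everything hits in L1D; the question doesn't give this.
    let base_cycles = 1.0;

    // Miss rates, read here as "per 1000 instructions" and turned into probabilities.
    let l1d_miss_rate = 40.0 / 1000.0;
    let l2_miss_rate = 4.0 / 1000.0;

    // Penalties in cycles; an L2 miss is served from L3, which always hits per the question.
    let l1d_penalty = 5.0;
    let l2_penalty = 300.0;

    // Expected cycles per instruction = base cost plus the expected miss penalties.
    let average = base_cycles + l1d_miss_rate * l1d_penalty + l2_miss_rate * l2_penalty;
    println!("average cycles per instruction: {average}");
}
```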

### q2

… if you have a page fault?

Talked about frequency scaling.

Pipelining: Put 5 instructions on post-it notes. First, had a student acting out
executing the stages of the instructions sequentially. Then, had 4 more students
come up, and acted out pipelining the instructions. Just a bit of chaos here.

Did an illustration of waiting for cache/working in the miss shadow.

