Back to TOC
Prev: Day8 | Next: Day 10


Day 9: Complete Pipelined RISC-V CPU Microarchitecture

The RISC-V core from the previous day is still incomplete with respect to the instructions implemented. It also needs to be pipelined, and the resulting pipeline hazards need to be handled.
We need to do the following to complete the CPU Design:

  1. Pipeline the CPU, taking care of the data dependency & control flow hazards
  2. Complete the implementation of the remaining ALU instructions
  3. Implement DMEM & Load, Store instructions
  4. Implement the Unconditional Jump (JAL, JALR) instructions

Pipelining the RISC-V CPU Core
D9_RISCV_Pipelined

9.1 Pipelining the CPU: Using 3-Cycle $valid signal

First, we implement a simplified 3-stage pipeline driven by a 3-cycle valid signal, with the logic partitioned as follows:

  • PC
  • Instruction Fetch + Decode
  • RF Read, ALU
  • RF Write, Branch Instrn. logic

This implementation has an IPC of only ~1/3 because the valid signal is high once every 3 cycles (HLLHLL...), so only one valid instruction is in the pipe at any point. This step lets us partition the core logic into the respective pipeline stages first, without having to worry about handling the pipeline hazards.
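
The HLLHLL... pattern and the resulting IPC can be sketched with a small behavioral model in Python. This is only an illustration, not the TL-Verilog from the course: the function names `valid_pattern` and `ipc` are hypothetical, and the model assumes the first valid cycle is cycle 0 and ignores reset.

```python
def valid_pattern(num_cycles, period=3):
    """Model of a valid signal that is high once every `period` cycles.

    Assumption: the first valid cycle is cycle 0 and reset is ignored.
    """
    return [cycle % period == 0 for cycle in range(num_cycles)]


def ipc(valid):
    """Instructions per cycle: valid cycles divided by total cycles."""
    return sum(valid) / len(valid)


if __name__ == "__main__":
    wave = valid_pattern(12)
    print("".join("H" if v else "L" for v in wave))  # prints HLLHLLHLLHLL
    print(ipc(wave))
```

With one valid cycle out of every three, the ratio of valid cycles to total cycles approaches 1/3 for any long enough run.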

Waterfall Logic Diagram with 3-Cycle Valid
D9_Pipelining_with_3Cycle_Valid
TL-V Logic Implementation Diagram
D9_3Cycle_Valid_Diagram
Makerchip-generated Block Diagram for 3-Cycle Valid design
D9_3Cycle_Valid_Makerchip

9.2 Pipelining the CPU: Solving the data & control hazards

9.2.1 Data Hazard: Read-After-Write (RAW) in the Register File

  • There is a 2-cycle delay (by design) between RF Read and Write operations.
  • Hence we have a Read-After-Write (RAW) data hazard when the current instruction in the pipe reads a Register File (RF) index that the previous instruction writes, because the write has not yet reached the RF when the read happens.
  • To solve this, we need to add a Register File Bypass Mux at the input of the ALU and select the previous ALU output if the previous instruction was writing to the RF index accessed in the current instruction.
Register File Bypass Waterfall Logic Diagram
D9_RF_Bypass_Logic_Diagram
Register File Bypass TL-V Implementation
D9_RF_Bypass_TLV_Diagram
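
The bypass mux described above can be sketched as a small Python function. The parameter names mirror the signals used later in this document (`>>1$rf_wr_en`, `>>1$rf_wr_index`, `>>1$result`), but the function itself is a hypothetical model for illustration, not the course implementation.

```python
def bypass_src(rd_index, rf_rd_data, prev_wr_en, prev_wr_index, prev_result):
    """Select the ALU source operand for the current instruction.

    If the previous instruction writes the register being read now,
    forward its ALU result instead of the (stale) value read from the RF.
    """
    if prev_wr_en and prev_wr_index == rd_index:
        return prev_result
    return rf_rd_data


if __name__ == "__main__":
    # Previous instruction writes r5 = 42; current instruction reads r5.
    print(bypass_src(5, 0, True, 5, 42))  # prints 42
```

The mux only forwards when both the write enable and the index match, so an instruction that does not write the RF can never supply a bypassed value.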

9.2.2 Control Hazard: Branch Instructions

  • We have control flow hazards when a branch is taken.
  • The PC logic is updated to handle the case when a branch is taken or not.
Branch Instruction Control Hazard
D9_Branch_Hazard
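
A minimal sketch of the idea in Python follows. It assumes that the instructions fetched in the two cycles after a taken branch are invalidated, which matches the two-instruction "shadow" described for loads later in this document but is an assumption for branches. The names `next_pc` and `squash_shadow` are hypothetical.

```python
def next_pc(inc_pc, valid_taken_br, br_tgt_pc):
    """PC mux: redirect to the branch target on a valid taken branch,
    otherwise continue with the incremented PC."""
    return br_tgt_pc if valid_taken_br else inc_pc


def squash_shadow(taken_flags, shadow=2):
    """Return a valid flag for each fetched instruction.

    taken_flags[i] says whether instruction i would be a taken branch.
    An instruction is invalid if one of the `shadow` instructions fetched
    just before it was a valid taken branch (assumed shadow depth: 2).
    """
    valid = []
    for i in range(len(taken_flags)):
        window = range(max(0, i - shadow), i)
        in_shadow = any(valid[j] and taken_flags[j] for j in window)
        valid.append(not in_shadow)
    return valid


if __name__ == "__main__":
    print(squash_shadow([False, True, False, False, False]))
```

Note that a branch sitting inside another branch's shadow is itself invalid, so it does not redirect the PC or extend the squash window.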

9.3 Complete the ALU

The Instruction Decoder is updated to decode all the instructions, and the complete ALU is implemented. Note: All load instructions are treated the same as the LW instruction.

9.4 DMEM & Load, Store Instructions

9.4.1 DMEM

  • The DMEM is a single-port R/W memory with 16 entries, 32-bit wide.
  • The DMEM is placed in the 4th pipeline stage.
    DMEM
    D9_DMEM
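
As a rough behavioral picture of the memory described above, here is a Python sketch of a 16-entry, 32-bit, single-port memory. The class name `DMem`, the `access` method, and the choice to mask the address to 4 bits are assumptions made for illustration.

```python
class DMem:
    """Behavioral model of a 16-entry, 32-bit-wide, single-port memory."""

    DEPTH = 16
    WORD_MASK = 0xFFFFFFFF

    def __init__(self):
        self.words = [0] * self.DEPTH

    def access(self, addr, wr_en=False, wr_data=0):
        """Perform one access: a write if wr_en is set, otherwise a read.

        Single-port means one operation per call, never both.
        Assumption: only the low 4 address bits are used.
        """
        index = addr & (self.DEPTH - 1)
        if wr_en:
            self.words[index] = wr_data & self.WORD_MASK
            return None
        return self.words[index]


if __name__ == "__main__":
    dmem = DMem()
    dmem.access(3, wr_en=True, wr_data=0xCAFE)
    print(hex(dmem.access(3)))  # prints 0xcafe
```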

9.4.2 LOAD (LW, LH, LB, LHU, LBU) Instructions

  • LOAD rd, imm(rs1)
  • Loads the data from the DMEM address given by (rs1 + imm) into the destination register rd, i.e., rd <= DMEM(rs1 + imm)

9.4.3 STORE (SW, SH, SB) Instructions

  • STORE rs2, imm(rs1)
  • Stores the data from rs2 to the DMEM address given by (rs1 + imm), i.e., DMEM(rs1 + imm) <= rs2

The $dmem_addr[3:0] is generated by the ALU, which treats the load and store instructions as equivalent to the ADDI instruction.

i.e., the ALU performs the following:
LOAD/ STORE : ($is_load || $is_s_instr) ? ($src1_value + $imm) :
ADDI        :                 $is_addi  ? ($src1_value + $imm) :

Since the DMEM is 32 bits wide and is not byte- or half-word-addressable, the two least-significant bits of the ALU result are dropped:
$dmem_addr[3:0] = $result[5:2];
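
The address calculation can be checked with a short Python sketch. The function name `dmem_addr` is hypothetical; the arithmetic follows the two expressions above (the ALU adds `$src1_value` and `$imm`, then bits [5:2] of the result select one of the 16 words).

```python
def dmem_addr(src1_value, imm):
    """Word index into the 16-entry DMEM for a load or store.

    The ALU computes the byte address like ADDI would; bits [5:2] of
    that 32-bit result form the 4-bit word address.
    """
    result = (src1_value + imm) & 0xFFFFFFFF  # 32-bit ALU result
    return (result >> 2) & 0xF                # $result[5:2]


if __name__ == "__main__":
    # LW r15, r0, 00100: r0 is 0 and the immediate is 4, so word 1 is read.
    print(dmem_addr(0, 0b00100))  # prints 1
```

Byte addresses 4 through 7 all map to word 1, which is why byte and half-word loads cannot be distinguished by this memory.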

Muxes need to be placed at the RF write index ($rf_wr_index) and RF write data ($rf_wr_data) inputs to select between the normal ALU write-back and the returning load data, depending on whether a valid load is completing.

DMEM Load/ Store
D9_LoadStore_TLV_Logic_Diagram2

Additionally, the Program Counter logic has to be updated for load redirects.

9.5 Unconditional Jump (JAL, JALR) Instructions

  • JAL : Jump to (PC + IMM), equivalent to an unconditional branch w.r.t the calculation of the target address.
  • JALR: Jump to (SRC1 + IMM)

The logic to calculate the branch target for JALR needs to be implemented.
The Program Counter logic also needs to be modified to handle the jumps.
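
The two target calculations can be sketched in Python as follows. The function name `jump_target` is hypothetical, and `imm` is taken to be an already sign-extended integer. The sketch follows the formulas given above; the RISC-V specification additionally clears bit 0 of the JALR target, which is left out here to match this document.

```python
MASK32 = 0xFFFFFFFF


def jump_target(is_jal, is_jalr, pc, src1_value, imm):
    """Return the 32-bit jump target, or None if this is not a jump.

    JAL : PC + IMM (same calculation as a branch target)
    JALR: SRC1 + IMM (the ISA spec also clears bit 0; omitted here)
    """
    if is_jal:
        return (pc + imm) & MASK32
    if is_jalr:
        return (src1_value + imm) & MASK32
    return None


if __name__ == "__main__":
    print(hex(jump_target(True, False, 0x10, 0, -8)))    # prints 0x8
    print(hex(jump_target(False, True, 0x10, 0x100, 4)))  # prints 0x104
```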

9.6 Complete Pipelined RISC-V CPU Core Implementation in Makerchip

Click on the image below to open up the interactive svg file:
D9_Complete_Pipelined_RISCV_Core


9.7 Bug found with the LW instruction and RF Read Bypass

Original Code: riscv_pipelined_with_LW_Bug.tlv

During functional simulation in the Makerchip IDE of the RISC-V CPU core that we designed by following the steps in the lecture videos and slides, I noticed two issues:

9.7.1 Issue #1

During the execution of the LW instruction, the DMEM address gets written to the destination register in the first cycle.

(NOTE: This is a benign issue and not a concern)

  • Since LW is an I-type (immediate-type) instruction, the $rd (Destination Register) is valid during this phase, and thus $rf_wr_en (Register File Write Enable) is asserted.

    // Immediate
    $is_i_instr = ($instr[6:2] == 5'b00000) ||
                  ($instr[6:2] == 5'b00001) ||
                  ($instr[6:2] == 5'b00100) ||
                  ($instr[6:2] == 5'b00110) ||
                  ($instr[6:2] == 5'b11001);
    ...
    $is_load  = ($opcode == 7'b0000011);
    ...
    $rd_valid = $is_r_instr | $is_i_instr | $is_u_instr | $is_j_instr;
    ...
    $rf_wr_en =  ($rd_valid && $valid && $rd != 5'b0) || >>2$valid_load;
    
  • If we take the following example: m4_asm(LW, r15, r0, 00100).
    This instruction is supposed to do just: r15 [31:0] <= DMEM [(r0 + 00100)] [31:0]

    Due to our design, the DMEM address is generated by the ALU as: DMEM_addr = (rs1 + imm). Hence the ALU output (or the DMEM address) gets written to the destination register first and then two cycles later the actual data from the DMEM address gets written to the destination register.

  • In our implementation, since it takes two cycles for valid data to be fetched from the DMEM and written to the destination register, we squash the 2 instructions already in the pipe in the "shadow" of the load instruction.
    Hence writing this intermediate value to the destination register is not a concern.
    Nevertheless, to avoid this unnecessary RF write and get a cleaner implementation, we can deassert $rf_wr_en for these two cycles for a valid load instruction.

    $rf_wr_en = (!$valid_load && !>>1$valid_load) && ($rd_valid && ($rd != 5'b0) && $valid) || >>2$valid_load;
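
The difference between the original and the revised write enable can be seen in a small Python sketch. The function and parameter names are hypothetical stand-ins for the signals above; `valid_load_1_ago` and `valid_load_2_ago` model `>>1$valid_load` and `>>2$valid_load`.

```python
def rf_wr_en_original(rd_valid, valid, rd, valid_load_2_ago):
    """Original enable: any valid instruction with a non-zero rd writes,
    plus the load data write two cycles after a valid load."""
    return bool((rd_valid and valid and rd != 0) or valid_load_2_ago)


def rf_wr_en_revised(rd_valid, valid, rd,
                     valid_load, valid_load_1_ago, valid_load_2_ago):
    """Revised enable: the normal write is suppressed in the load's own
    cycle and in the cycle after it; the load data write is unchanged."""
    no_recent_load = not valid_load and not valid_load_1_ago
    normal_write = no_recent_load and rd_valid and rd != 0 and valid
    return bool(normal_write or valid_load_2_ago)


if __name__ == "__main__":
    # Cycle in which LW r15 is valid: original writes, revised does not.
    print(rf_wr_en_original(True, True, 15, False))            # prints True
    print(rf_wr_en_revised(True, True, 15, True, False, False))  # prints False
```

In both versions the write that happens two cycles after the load is still enabled, so the loaded data reaches the destination register.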
    

9.7.2 Issue #2

The instruction immediately following the LW instruction gets the wrong $src1_value and $src2_value

(NOTE: This is an actual BUG and breaks functionality)

  • This bug was found while checking whether the above issue causes a RAW hazard when the instruction immediately following the LW instruction reads the destination register of the LW instruction.

  • This happens because of an incorrect RF Read Bypass in the original implementation:

    $src1_value[31:0] = (>>1$rf_wr_index == $rf_rd_index1) && >>1$rf_wr_en
                        ? >>1$result : $rf_rd_data1 ;
    
    $src2_value[31:0] = (>>1$rf_wr_index == $rf_rd_index2) && >>1$rf_wr_en
                        ? >>1$result : $rf_rd_data2 ;
    
    • In this original code, the instruction immediately in the shadow of the LW instruction gets the wrong values for $src1_value, $src2_value which are the inputs to the ALU.

    • This is because we are not accounting for the fact that the data to be written to the RF could come from either the ALU ($result) or from the DMEM ($ld_data).
      $rf_wr_data[31:0] = >>2$valid_load ? >>2$ld_data : $result;

      But we are only considering the ALU output for RF Read during a RAW Hazard.

RF Read Bypass Bug
D9_Bug_Slide_49_RF_ReadBypass

  • FIX 1: During initial debugging, I came up with the following solution to the bug based on the simulation waveforms and the VIZ_JS debug prints.

    // Handling Read-After-Write Hazard
    $src1_value[31:0] = >>3$valid_load && (>>3$rf_wr_index == $rf_rd_index1) ? >>3$ld_data :
                        (>>1$rf_wr_index == $rf_rd_index1) && >>1$rf_wr_en ? >>1$result   :
                        $rf_rd_data1;
    
    $src2_value[31:0] = >>3$valid_load && (>>3$rf_wr_index == $rf_rd_index2) ? >>3$ld_data :
                        (>>1$rf_wr_index == $rf_rd_index2) && >>1$rf_wr_en ? >>1$result   :
                        $rf_rd_data2;
    
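  • FIX 2: Discussing the issue with Steve H. gave me a better understanding of it, and he suggested the following code change, which bypasses the data actually written to the RF ($rf_wr_data) instead of only the ALU output ($result):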

    // Handling Read-After-Write Hazard
    $src1_value[31:0] = (>>1$rf_wr_index == $rf_rd_index1) && >>1$rf_wr_en
                        ? >>1$rf_wr_data : $rf_rd_data1;
    
    $src2_value[31:0] = (>>1$rf_wr_index == $rf_rd_index2) && >>1$rf_wr_en
                        ? >>1$rf_wr_data : $rf_rd_data2;
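
To see why bypassing `$rf_wr_data` resolves the bug, here is a hedged Python sketch comparing the original bypass with the FIX 2 bypass. The dictionary keys mirror the signal names in this document, but the functions and the example values are made up for illustration. The scenario assumed is that the previous cycle is the one in which load data is written back to r15, while the ALU output in that cycle is unrelated.

```python
def rf_wr_data(valid_load_2_ago, ld_data_2_ago, result):
    """RF write data mux: load data when a load completes, else ALU result."""
    return ld_data_2_ago if valid_load_2_ago else result


def src_value_original(rd_index, rf_rd_data, prev):
    """Original bypass: always forwards the previous ALU result."""
    hit = prev["rf_wr_en"] and prev["rf_wr_index"] == rd_index
    return prev["result"] if hit else rf_rd_data


def src_value_fix2(rd_index, rf_rd_data, prev):
    """FIX 2 bypass: forwards whatever was actually written to the RF."""
    hit = prev["rf_wr_en"] and prev["rf_wr_index"] == rd_index
    return prev["rf_wr_data"] if hit else rf_rd_data


if __name__ == "__main__":
    # Previous cycle: load data 0xDEADBEEF is written to r15, while the
    # ALU output in that cycle is an unrelated value (0x4 here).
    prev = {
        "rf_wr_en": True,
        "rf_wr_index": 15,
        "result": 0x4,
        "rf_wr_data": rf_wr_data(True, 0xDEADBEEF, 0x4),
    }
    print(hex(src_value_original(15, 0x0, prev)))  # prints 0x4
    print(hex(src_value_fix2(15, 0x0, prev)))      # prints 0xdeadbeef
```

When no load is completing, `rf_wr_data` equals the ALU result, so the FIX 2 bypass behaves exactly like the original one for ordinary ALU instructions.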
    
