CG3207: RISC-V CPU on FPGA

Aug 2025 to Nov 2025

Role: Digital Design Engineer (Pair Project)

Designed and implemented a 32-bit RISC-V processor in Verilog as part of the CG3207 Computer Architecture course at NUS. The processor was deployed on a Nexys 4 FPGA board and follows a standard 5-stage pipeline: Instruction Fetch (IF), Instruction Decode (ID), Execute (EX), Memory (MEM), and Writeback (WB).

Pipeline Architecture

The processor uses a fully pipelined datapath with registers between all stages. A hardware hazard unit handles both forwarding and stalling. Forwarding resolves data hazards whenever the needed result already exists in a later stage, while load-use hazards and multi-cycle operations stall the pipeline. As a result, programs run correctly without manual NOP insertion.

  • Standard 5-stage pipeline: IF, ID, EX, MEM, WB
  • Pipeline registers between all stages
  • Hardware hazard detection and control
  • Forwarding paths for ALU and memory results
  • Automatic pipeline stalling for unresolved hazards
  • Program execution without manual NOP insertion
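
The forwarding and stall decisions described above can be sketched as a small behavioral model. This is not the project's Verilog: it is a minimal Python illustration of the textbook scheme, and the names StageRegs, forward_select, and must_stall, along with the field names, are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class StageRegs:
    """Hypothetical subset of the pipeline-register fields a hazard unit reads."""
    rd: int = 0              # destination register index
    reg_write: bool = False  # instruction writes the register file
    mem_read: bool = False   # instruction is a load


def forward_select(rs: int, ex_mem: StageRegs, mem_wb: StageRegs) -> str:
    """Pick the source of an EX-stage operand: 'EX_MEM', 'MEM_WB', or 'REG'.

    The younger result (EX/MEM) wins over the older one (MEM/WB).
    Register x0 is never forwarded because it always reads as zero.
    """
    if rs != 0 and ex_mem.reg_write and ex_mem.rd == rs:
        return "EX_MEM"
    if rs != 0 and mem_wb.reg_write and mem_wb.rd == rs:
        return "MEM_WB"
    return "REG"


def must_stall(rs1: int, rs2: int, id_ex: StageRegs, mul_busy: bool) -> bool:
    """Stall when forwarding cannot help.

    Load-use: the instruction in EX is a load whose destination is read by
    the instruction in ID, so the data is not available for one more cycle.
    Multi-cycle: the arithmetic unit reports busy.
    """
    load_use = id_ex.mem_read and id_ex.rd != 0 and id_ex.rd in (rs1, rs2)
    return load_use or mul_busy
```

In this sketch a load followed immediately by a dependent instruction stalls for one cycle, after which the loaded value is picked up through the MEM_WB forwarding path.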

Branch Prediction

Implemented dynamic branch prediction using a Branch History Table (BHT) with 12-bit indexing (4096 entries) and a Branch Target Buffer (BTB). Each BHT entry uses a 2-bit saturating counter to predict branch direction. Since only 12 bits of the PC are used for indexing, branches at different addresses can alias to the same entry. Predicted targets are supplied during the fetch stage. Mispredictions are detected in the execute stage, which flushes the pipeline and recovers to the correct path.

  • 2-bit saturating counter branch prediction
  • BHT indexed by lower 12 bits of PC (4096 entries)
  • Branch Target Buffer for predicted targets
  • Prediction performed in fetch stage
  • Misprediction detection in execute stage
  • Automatic pipeline flush on misprediction
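
The predictor's behavior can be illustrated with a short model. This is a sketch under assumptions, not the project's implementation: the class name BranchPredictor, the initial counter state (weakly not-taken), the dictionary used for the BTB, and the choice to index with the raw low 12 bits of the PC are all illustrative. The source does not state whether the word-offset bits are skipped.

```python
INDEX_BITS = 12
ENTRIES = 1 << INDEX_BITS  # 4096 entries


class BranchPredictor:
    """Behavioral model of a BHT of 2-bit saturating counters plus a BTB."""

    def __init__(self) -> None:
        # Counter states: 0-1 predict not taken, 2-3 predict taken.
        # Starting at 1 (weakly not taken) is an assumption.
        self.bht = [1] * ENTRIES
        # index -> predicted target; hardware would use a fixed-size array.
        self.btb = {}

    @staticmethod
    def index(pc: int) -> int:
        # Assumed literal reading of "lower 12 bits of PC".
        return pc & (ENTRIES - 1)

    def predict(self, pc: int):
        """Fetch stage: return (predicted_taken, next_pc)."""
        i = self.index(pc)
        taken = self.bht[i] >= 2 and i in self.btb
        return taken, (self.btb[i] if taken else pc + 4)

    def update(self, pc: int, taken: bool, target: int) -> None:
        """Execute stage: train on the resolved branch outcome."""
        i = self.index(pc)
        if taken:
            self.bht[i] = min(3, self.bht[i] + 1)  # saturate at 3
            self.btb[i] = target
        else:
            self.bht[i] = max(0, self.bht[i] - 1)  # saturate at 0


def mispredicted(predicted_next_pc: int, actual_next_pc: int) -> bool:
    """A mismatch in the execute stage means the pipeline must be flushed."""
    return predicted_next_pc != actual_next_pc
```

With 2-bit counters, a loop branch that is taken many times and then falls through once is mispredicted only at the exit, because a single not-taken outcome moves the counter from 3 to 2 and the next prediction is still taken. Two branches whose addresses differ by a multiple of 4096 bytes share one entry in this model, which is the aliasing mentioned above.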

Multi-Cycle Arithmetic Unit

Hardware multiplication is implemented using a dedicated multi-cycle execution unit with Booth's algorithm for signed multiplication. The processor stalls automatically while multi-cycle operations are in progress. Division instructions were not implemented.

  • Dedicated multi-cycle execution unit
  • Booth's algorithm for signed multiplication
  • Supports signed and unsigned multiplication
  • Processor stall control using Busy signal
  • Division instructions not implemented
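
For reference, signed multiplication with Booth's algorithm can be modeled one iteration per clock cycle. The sketch below shows the radix-2 form for 32-bit operands. The radix, the 32-cycle latency, and the names booth_multiply and to_signed are assumptions for illustration, and the unsigned path is not modeled.

```python
WIDTH = 32
MASK = (1 << WIDTH) - 1


def to_signed(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's-complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def booth_multiply(a: int, b: int):
    """Radix-2 Booth multiplication of two 32-bit two's-complement values.

    Returns (signed 64-bit product, cycles). Each loop iteration stands for
    one clock cycle during which the unit would assert its Busy signal.
    """
    m = to_signed(a, WIDTH)  # multiplicand
    acc = 0                  # upper half of the product (signed Python int)
    q = b & MASK             # multiplier, later the lower half of the product
    q_prev = 0               # the bit shifted out of q in the previous cycle
    cycles = 0
    for _ in range(WIDTH):
        pair = (q & 1, q_prev)
        if pair == (1, 0):    # start of a run of ones: subtract multiplicand
            acc -= m
        elif pair == (0, 1):  # end of a run of ones: add multiplicand
            acc += m
        # Arithmetic right shift of the combined {acc, q, q_prev} register.
        q_prev = q & 1
        q = (q >> 1) | ((acc & 1) << (WIDTH - 1))
        acc >>= 1             # Python's >> is arithmetic for negative ints
        cycles += 1
    return (acc << WIDTH) | q, cycles


if __name__ == "__main__":
    product, cycles = booth_multiply(-7 & MASK, 6)
    print(product, cycles)
```

In hardware, the accumulator would have a fixed width (typically with a guard bit to handle the most negative multiplicand), whereas this model relies on Python's unbounded integers.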

Processor Components

  • Register file with 32 general-purpose registers
  • ALU supporting arithmetic and logical operations
  • Control unit and instruction decoder
  • Hazard detection and forwarding unit
  • Branch prediction unit
  • Pipeline registers between all stages

Instruction Support

Supports the RV32I base instruction set along with multiplication instructions (division is not implemented). Assembly programs were written to verify the correctness of the pipeline, hazard handling, and branch prediction mechanisms.

Final Version