CG3207: RISC-V CPU on FPGA

Aug 2025 - Nov 2025

Role: Digital Design Engineer (Pair Project)

Designed and implemented a 32-bit RV32I processor in Verilog and deployed it on a Nexys 4 FPGA. The design follows a classical five-stage pipeline but was extended with dynamic branch prediction, hardware hazard resolution, and a multi-cycle execution unit. The goal was to achieve correct program execution without compiler-inserted NOPs while maintaining high instruction throughput.

Pipeline Architecture

The datapath is fully pipelined, with pipeline registers separating the IF, ID, EX, MEM, and WB stages. A dedicated hazard unit compares source and destination register indices across stages and dynamically selects forwarding paths from the EX/MEM and MEM/WB registers. Load-use hazards and multi-cycle operations assert a stall signal that freezes the upstream stages while downstream stages continue to drain, so no architectural state is corrupted.
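
The forwarding and stall decisions described above can be sketched as a small behavioural model. This is written in Python for illustration only and is not the project's Verilog; the names Latch, forward_select, load_use_stall, and the FWD_* encodings are hypothetical, and the priority rule (EX/MEM over MEM/WB) is the textbook convention, assumed here rather than taken from the original design.

```python
from dataclasses import dataclass

# Forwarding-mux select encodings (hypothetical names, not from the original RTL).
FWD_NONE = 0     # use the value read from the register file
FWD_EX_MEM = 1   # take the result held in the EX/MEM pipeline register
FWD_MEM_WB = 2   # take the write-back value held in the MEM/WB pipeline register


@dataclass
class Latch:
    """Subset of a pipeline register that the hazard unit inspects."""
    rd: int = 0              # destination register index
    reg_write: bool = False  # instruction will write the register file
    mem_read: bool = False   # instruction is a load


def forward_select(rs: int, ex_mem: Latch, mem_wb: Latch) -> int:
    """Choose the source for one EX-stage operand register `rs`."""
    # x0 is hard-wired to zero, so it is never forwarded.
    if rs == 0:
        return FWD_NONE
    # The younger producer (EX/MEM) takes priority over the older one (MEM/WB).
    if ex_mem.reg_write and ex_mem.rd == rs:
        return FWD_EX_MEM
    if mem_wb.reg_write and mem_wb.rd == rs:
        return FWD_MEM_WB
    return FWD_NONE


def load_use_stall(id_rs1: int, id_rs2: int, id_ex: Latch) -> bool:
    """Stall when the instruction in EX is a load whose result ID needs."""
    return id_ex.mem_read and id_ex.rd != 0 and id_ex.rd in (id_rs1, id_rs2)
```

In this sketch a load followed immediately by a dependent instruction raises load_use_stall for one cycle, after which the loaded value can be taken from the MEM/WB register.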

Dynamic Branch Prediction

Control hazards were mitigated using a Branch History Table (BHT) indexed by the lower bits of the program counter and a Branch Target Buffer (BTB) that supplies the predicted next address during the fetch stage. Each BHT entry holds a two-bit saturating counter, allowing the predictor to learn branch behaviour over time. Mispredictions are detected in the execute stage, triggering a pipeline flush and redirecting the PC to the correct target.
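
A minimal software model of such a predictor is sketched below, in Python rather than Verilog. The class name BranchPredictor, the 64-entry table size, the initial "weakly not-taken" state, the untagged BTB, and the choice to drop the two word-alignment bits before indexing are all assumptions made for illustration; the original design may differ on each point.

```python
class BranchPredictor:
    """Two-bit saturating-counter BHT paired with an untagged BTB (illustrative)."""

    def __init__(self, index_bits: int = 6):
        size = 1 << index_bits
        self.mask = size - 1
        self.counters = [1] * size    # 0..3; start at 1 = weakly not-taken
        self.targets = [None] * size  # last taken target seen for each entry

    def _index(self, pc: int) -> int:
        # Drop the two alignment bits, then keep the low index bits.
        return (pc >> 2) & self.mask

    def predict(self, pc: int) -> int:
        """Return the predicted next PC during fetch."""
        i = self._index(pc)
        if self.counters[i] >= 2 and self.targets[i] is not None:
            return self.targets[i]    # predict taken
        return pc + 4                 # predict not taken (fall through)

    def update(self, pc: int, taken: bool, target: int) -> None:
        """Train the entry once the branch resolves in the execute stage."""
        i = self._index(pc)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
            self.targets[i] = target
        else:
            self.counters[i] = max(0, self.counters[i] - 1)
```

With two bits of state, a loop branch that has saturated towards "taken" is still predicted taken after a single not-taken outcome, which is the usual motivation for preferring it over a one-bit scheme.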

Indexing with only the lower PC bits introduces aliasing, since branches whose addresses share those bits map to the same entry. This trade-off reduced memory usage while keeping prediction accuracy acceptable on the benchmark programs.
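
The aliasing effect can be seen with a short calculation. The sketch below assumes a hypothetical 64-entry table indexed by PC bits [7:2]; the actual table size and bit selection are not stated in this write-up, and the addresses are made up for the example.

```python
INDEX_BITS = 6  # hypothetical: 64 entries


def bht_index(pc: int) -> int:
    """Index from the low PC bits, skipping the two word-alignment bits."""
    return (pc >> 2) & ((1 << INDEX_BITS) - 1)


branch_a = 0x0000_0040
branch_b = 0x0000_0140  # differs from branch_a only above the indexed bits
branch_c = 0x0000_0044  # neighbouring instruction, differs inside the indexed bits

aliases = bht_index(branch_a) == bht_index(branch_b)   # the two share one counter
distinct = bht_index(branch_a) != bht_index(branch_c)  # these use separate counters

# Counter storage for this table size: 2 bits per entry.
counter_storage_bits = 2 * (1 << INDEX_BITS)
```

Here branch_a and branch_b train the same counter, so they can disturb each other's predictions, while each extra index bit that would separate them doubles the table storage.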

Multi-Cycle Execution Unit

Multiplication is implemented using a dedicated multi-cycle Booth unit operating alongside the main ALU. When a multiply instruction enters the execute stage, the pipeline asserts a busy signal that stalls instruction issue until the result is ready. This allows complex arithmetic without lengthening the critical path of the single-cycle ALU.
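
The arithmetic behind a Booth multiplier can be modelled in a few lines. The sketch below is a Python behavioural model of radix-2 Booth recoding that examines one multiplier bit per iteration, so a 32-bit multiply takes 32 modelled cycles. The radix, the cycle count, and the function names are assumptions for illustration; the write-up does not specify them. For clarity the model shifts the multiplicand left, whereas hardware would more commonly shift a partial-product register right.

```python
def to_signed(value: int, width: int) -> int:
    """Interpret the low `width` bits of `value` as a two's-complement number."""
    value &= (1 << width) - 1
    return value - (1 << width) if value & (1 << (width - 1)) else value


def booth_multiply(multiplicand: int, multiplier: int, width: int = 32):
    """Return (signed full product, cycles) using radix-2 Booth recoding."""
    m = to_signed(multiplicand, width)
    q = multiplier & ((1 << width) - 1)
    acc = 0
    prev_bit = 0   # the implicit bit to the right of the multiplier's LSB
    cycles = 0
    for i in range(width):
        bit = (q >> i) & 1
        if bit == 1 and prev_bit == 0:
            acc -= m << i   # start of a run of ones: subtract
        elif bit == 0 and prev_bit == 1:
            acc += m << i   # end of a run of ones: add
        # bit pairs 00 and 11 need no add or subtract, only the shift
        prev_bit = bit
        cycles += 1         # one multiplier bit examined per modelled cycle
    return acc, cycles


def low_word(product: int, width: int = 32) -> int:
    """Low `width` bits of the product, as a 32-bit multiply would write back."""
    return product & ((1 << width) - 1)
```

In a pipeline like the one described, the busy signal would stay asserted for the cycles counted here, and the result would be written back once the final iteration completes.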

Verification Strategy

Assembly test programs were written to stress data hazards, control hazards, and back-to-back dependent operations. Waveform inspection was used to confirm correct forwarding paths, stall timing, and pipeline flush behaviour. The processor successfully executed programs without manual scheduling, demonstrating correct hazard resolution.

Final Implementation