Designed and implemented a 32-bit RISC-V processor in Verilog as part of the CG3207 Computer Architecture course at NUS. The processor was deployed on a Nexys 4 FPGA and follows a standard 5-stage pipeline architecture: Instruction Fetch (IF), Instruction Decode (ID), Execute (EX), Memory (MEM), and Writeback (WB).
The processor uses a fully pipelined datapath with registers between all stages. A hardware hazard unit was implemented to handle both forwarding and stalling. Forwarding resolves data hazards whenever possible, while load-use hazards and multi-cycle operations trigger pipeline stalls, allowing programs to run correctly without manual NOP insertion.
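The two decisions the hazard unit makes can be sketched as a small software model. This is an illustrative Python sketch rather than the project's Verilog: the names `StageRegs`, `forward_select`, and `load_use_stall` are hypothetical, and the logic follows the textbook 5-stage scheme (forward from EX/MEM first, then MEM/WB; stall when a load in EX feeds the instruction in ID), which is assumed to match the design described above.

```python
from dataclasses import dataclass


@dataclass
class StageRegs:
    """Hypothetical subset of a pipeline register's control fields."""
    rd: int = 0              # destination register index
    reg_write: bool = False  # instruction writes the register file
    mem_read: bool = False   # instruction is a load


def forward_select(rs: int, ex_mem: StageRegs, mem_wb: StageRegs) -> str:
    """Choose the source of an EX-stage operand that reads register rs."""
    # x0 is hard-wired to zero in RISC-V, so it is never forwarded.
    if rs != 0 and ex_mem.reg_write and ex_mem.rd == rs:
        return "EX_MEM"   # newest result wins
    if rs != 0 and mem_wb.reg_write and mem_wb.rd == rs:
        return "MEM_WB"
    return "REGFILE"


def load_use_stall(id_rs1: int, id_rs2: int, id_ex: StageRegs) -> bool:
    """True when the instruction in ID needs a load result still in EX."""
    return id_ex.mem_read and id_ex.rd != 0 and id_ex.rd in (id_rs1, id_rs2)


# A load into x3 followed immediately by a use of x3 needs one stall cycle.
print(load_use_stall(3, 0, StageRegs(rd=3, reg_write=True, mem_read=True)))
```

Forwarding alone cannot cover the load-use case because the loaded value does not exist until the end of the memory stage, which is why that case is handled by a stall instead.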
Implemented dynamic branch prediction using a Branch History Table (BHT) with 12-bit indexing (4096 entries) and a Branch Target Buffer (BTB). Each BHT entry uses a 2-bit saturating counter to predict branch direction. Since only 12 bits of the PC are used for indexing, branches whose addresses share those 12 bits alias to the same entry and share its counter. Predicted targets are supplied during the fetch stage, and mispredictions are resolved in the execute stage, where they trigger a pipeline flush and recovery.
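The behaviour of the 2-bit saturating counters and the aliasing effect can be illustrated with a short Python model. This is a sketch under assumptions, not the project's Verilog: the class name `BranchHistoryTable` is hypothetical, the index is assumed to be taken from PC bits [13:2] (the write-up does not say which 12 bits are used), and the counters are assumed to start in the weakly-not-taken state.

```python
INDEX_BITS = 12  # 4096 entries, as stated in the description


class BranchHistoryTable:
    """Software model of a BHT made of 2-bit saturating counters."""

    def __init__(self) -> None:
        # Counter states: 0,1 predict not taken; 2,3 predict taken.
        # Starting at 1 (weakly not taken) is an assumption.
        self.counters = [1] * (1 << INDEX_BITS)

    @staticmethod
    def index(pc: int) -> int:
        # Assumed index: drop the two always-zero low bits of a
        # word-aligned PC, then keep the next 12 bits.
        return (pc >> 2) & ((1 << INDEX_BITS) - 1)

    def predict(self, pc: int) -> bool:
        return self.counters[self.index(pc)] >= 2

    def update(self, pc: int, taken: bool) -> None:
        i = self.index(pc)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)  # saturate at 3
        else:
            self.counters[i] = max(0, self.counters[i] - 1)  # saturate at 0


bht = BranchHistoryTable()
bht.update(0x100, taken=True)
bht.update(0x100, taken=True)
bht.update(0x100, taken=False)  # one not-taken outcome does not flip it
print(bht.predict(0x100))
print(BranchHistoryTable.index(0x100) == BranchHistoryTable.index(0x100 + (1 << 14)))
```

The last line shows two different addresses landing on the same entry, which is the aliasing that comes from indexing with only part of the PC.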
Hardware multiplication is implemented using a dedicated multi-cycle execution unit with Booth's algorithm for signed multiplication. The processor stalls automatically while multi-cycle operations are in progress. Division instructions were not implemented.
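For reference, the add/subtract-and-shift loop at the heart of Booth's algorithm can be modelled in a few lines of Python. This is a sketch of the classic radix-2 variant, one iteration per multiplier bit; the write-up does not state which radix or how many cycles the hardware unit uses, so both are assumptions here, and the function name `booth_multiply` is hypothetical. The accumulator is kept one bit wider than the operands so that the most negative multiplicand does not overflow.

```python
def booth_multiply(multiplicand: int, multiplier: int, bits: int = 32) -> int:
    """Signed multiply using radix-2 Booth recoding.

    Inputs are expected in the signed range of a `bits`-wide register.
    Returns the full signed 2*bits-wide product.
    """
    q_mask = (1 << bits) - 1
    a_bits = bits + 1                  # one guard bit in the accumulator
    a_mask = (1 << a_bits) - 1

    m = multiplicand & a_mask          # sign-extended multiplicand
    a = 0                              # accumulator (upper part of product)
    q = multiplier & q_mask            # multiplier (becomes lower part)
    q_prev = 0                         # the implicit bit right of Q

    for _ in range(bits):              # one iteration models one cycle
        pair = (q & 1, q_prev)
        if pair == (1, 0):             # start of a run of 1s: subtract
            a = (a - m) & a_mask
        elif pair == (0, 1):           # end of a run of 1s: add
            a = (a + m) & a_mask

        # Arithmetic right shift of the combined A:Q:q_prev register.
        q_prev = q & 1
        q = (q >> 1) | ((a & 1) << (bits - 1))
        sign = a >> (a_bits - 1)
        a = (a >> 1) | (sign << (a_bits - 1))

    product = ((a << bits) | q) & ((1 << (2 * bits)) - 1)
    if product >> (2 * bits - 1):      # reinterpret as signed
        product -= 1 << (2 * bits)
    return product


print(booth_multiply(7, -3))
```

In the RISC-V M extension, MUL returns the low 32 bits of such a product and MULH the high 32 bits of the signed product, so a unit built this way can serve both from one result.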
The processor supports the RV32I base instruction set along with multiplication instructions. Assembly programs were written to verify the correctness of the pipeline, hazard handling, and branch prediction mechanisms.