Designed and implemented a 32-bit RV32I processor in Verilog and deployed it on a Nexys 4 FPGA board. The design follows a classical five-stage pipeline, extended with dynamic branch prediction, hardware hazard resolution, and a multi-cycle execution unit. The goal was correct program execution without compiler-inserted NOPs while maintaining high instruction throughput.
The datapath is fully pipelined, with registers between the IF, ID, EX, MEM, and WB stages. A dedicated hazard unit compares source and destination register indices across pipeline stages and selects forwarding paths from the EX/MEM and MEM/WB pipeline registers. Load-use hazards and multi-cycle operations assert a stall signal that freezes the upstream stages while downstream stages drain, so no architectural state is corrupted.
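The forwarding and stall decisions described above can be sketched as a small behavioural model. This is an illustration in Python rather than the project's Verilog, and the names (StageRegs, forward_select, load_use_stall) and the select encoding are assumptions made for the example, not signals taken from the design.

```python
from typing import NamedTuple

# Forwarding-mux select codes (hypothetical encoding, not taken from the RTL).
FWD_NONE, FWD_EX_MEM, FWD_MEM_WB = 0, 1, 2


class StageRegs(NamedTuple):
    """Fields of one pipeline register that the hazard unit inspects."""
    rd: int = 0              # destination register index
    reg_write: bool = False  # instruction writes the register file
    mem_read: bool = False   # instruction is a load


def forward_select(rs: int, ex_mem: StageRegs, mem_wb: StageRegs) -> int:
    """Pick the source for an EX-stage operand that reads register rs."""
    # x0 is hard-wired to zero in RISC-V, so it is never forwarded.
    if rs != 0 and ex_mem.reg_write and ex_mem.rd == rs:
        return FWD_EX_MEM    # the most recent producer takes priority
    if rs != 0 and mem_wb.reg_write and mem_wb.rd == rs:
        return FWD_MEM_WB
    return FWD_NONE          # use the value read from the register file


def load_use_stall(id_rs1: int, id_rs2: int, id_ex: StageRegs) -> bool:
    """True when the instruction in ID needs the result of a load still in EX."""
    return bool(id_ex.mem_read and id_ex.rd != 0
                and id_ex.rd in (id_rs1, id_rs2))
```

Checking EX/MEM before MEM/WB matters when two in-flight instructions write the same register: the operand must come from the later of the two writers.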
Control hazards are mitigated with a Branch History Table (BHT) indexed by the lower bits of the program counter and a Branch Target Buffer (BTB) that supplies predicted next addresses during the fetch stage. Each BHT entry is a two-bit saturating counter, allowing the predictor to learn branch behaviour over time and tolerate an occasional atypical outcome. Mispredictions are detected in the execute stage, which flushes the wrong-path instructions from the pipeline and redirects the PC to the correct target.
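A minimal behavioural sketch of such a predictor follows. The table size (64 entries), the untagged BTB, and the weakly-not-taken reset state are assumptions chosen for illustration; the source does not specify them.

```python
class BranchPredictor:
    """Behavioural model: BHT of 2-bit saturating counters plus a BTB."""

    def __init__(self, index_bits: int = 6):  # 64 entries is an assumption
        self.mask = (1 << index_bits) - 1
        # Counter values 0 and 1 predict not taken; 2 and 3 predict taken.
        self.bht = [1] * (1 << index_bits)      # assumed reset: weakly not taken
        self.btb = [None] * (1 << index_bits)   # last seen target per entry

    def _index(self, pc: int) -> int:
        # RV32I instructions are 4-byte aligned, so skip the two zero bits.
        return (pc >> 2) & self.mask

    def predict(self, pc: int) -> int:
        """Return the predicted next PC used by the fetch stage."""
        i = self._index(pc)
        if self.bht[i] >= 2 and self.btb[i] is not None:
            return self.btb[i]
        return pc + 4

    def update(self, pc: int, taken: bool, target: int) -> None:
        """Train the entry once the branch resolves in the execute stage."""
        i = self._index(pc)
        if taken:
            self.bht[i] = min(self.bht[i] + 1, 3)  # saturate at strongly taken
            self.btb[i] = target
        else:
            self.bht[i] = max(self.bht[i] - 1, 0)  # saturate at strongly not taken
```

A misprediction in this model corresponds to predict(pc) differing from the next PC computed in the execute stage, at which point the fetch address would be corrected and update called with the actual outcome.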
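Indexing with only the lower PC bits introduces aliasing: branches whose addresses share the same low-order bits map to the same entry and can overwrite each other's history. This trade-off reduced memory usage while keeping prediction accuracy acceptable for the benchmark programs.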
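To make the aliasing concrete, the sketch below computes a table index from the low PC bits and checks whether two branch addresses collide. The 6-bit index width and the example addresses are hypothetical; the source does not give the actual table size.

```python
INDEX_BITS = 6  # hypothetical table size: 2**6 = 64 entries


def bht_index(pc: int, index_bits: int = INDEX_BITS) -> int:
    """Index from the low-order PC bits, skipping the two alignment bits."""
    return (pc >> 2) & ((1 << index_bits) - 1)


def aliases(pc_a: int, pc_b: int, index_bits: int = INDEX_BITS) -> bool:
    """True when two different branch addresses share one table entry."""
    return (pc_a != pc_b
            and bht_index(pc_a, index_bits) == bht_index(pc_b, index_bits))
```

With these assumed values, branches at 0x010 and 0x110 collide in a 64-entry table (both map to index 4) but not in a 128-entry one, which illustrates how each extra index bit trades memory for fewer collisions.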
Multiplication, which is defined by the RISC-V M extension rather than the base RV32I instruction set, is implemented in a dedicated multi-cycle Booth unit operating alongside the main ALU. When a multiply instruction enters the execute stage, the unit asserts a busy signal that stalls instruction issue until the result is ready. This supports complex arithmetic without lengthening the critical path of the single-cycle ALU.
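The sketch below models a Booth multiplier at the behavioural level, with one loop iteration standing in for one clock cycle during which the busy signal would be held high. It assumes a radix-2 recoding that retires one multiplier bit per cycle (32 cycles in total); the source does not state the radix or the latency of the actual unit, and the function names are illustrative.

```python
XLEN = 32
MASK = (1 << XLEN) - 1


def to_signed(x: int, bits: int = XLEN) -> int:
    """Interpret the low `bits` bits of x as a two's-complement value."""
    x &= (1 << bits) - 1
    return x - (1 << bits) if x >> (bits - 1) else x


def booth_multiply(multiplicand: int, multiplier: int):
    """Radix-2 Booth multiply of two signed 32-bit values.

    Returns (product, cycles). The accumulator is kept as an unbounded
    Python int, which sidesteps the extra guard bit real hardware needs.
    """
    m = to_signed(multiplicand)
    a = 0                   # accumulator: upper half of the product
    q = multiplier & MASK   # multiplier: becomes the lower half
    q_prev = 0              # bit shifted out of q on the previous cycle
    cycles = 0
    for _ in range(XLEN):
        pair = (q & 1, q_prev)
        if pair == (1, 0):      # start of a run of ones: subtract
            a -= m
        elif pair == (0, 1):    # end of a run of ones: add
            a += m
        # Arithmetic right shift of the combined a:q:q_prev register.
        q_prev = q & 1
        q = (q >> 1) | ((a & 1) << (XLEN - 1))
        a >>= 1
        cycles += 1
    return (a << XLEN) + q, cycles
```

For example, booth_multiply(3, -2) returns the signed 64-bit product -6 after 32 modelled cycles; a 32-bit multiply result would be the low 32 bits of that product.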
Assembly test programs were written to stress data hazards, control hazards, and back-to-back dependent operations. Waveform inspection was used to confirm correct forwarding paths, stall timing, and pipeline flush behaviour. The processor successfully executed programs without manual scheduling, demonstrating correct hazard resolution.