Pipeline Stages
Understand how pipelining improves CPU throughput by overlapping instruction execution across five classic stages: Fetch, Decode, Execute, Memory, and Write-back.
What is Pipelining?
Pipeline Processing
Interactive Pipeline Visualizer
Cycle 1: Pipeline State
| Instruction | IF | ID | EX | MEM | WB |
|---|---|---|---|---|---|
| I1 ADD R1, R2, R3 | I1 | - | - | - | - |
| I2 SUB R4, R1, R5 | - | - | - | - | - |
| I3 LW R6, 0(R7) | - | - | - | - | - |
| I4 AND R8, R9, R10 | - | - | - | - | - |
| I5 OR R11, R12, R13 | - | - | - | - | - |
5-Stage Pipeline Breakdown
IF: Fetch the instruction from memory using the PC
ID: Decode the instruction and read the registers
EX: Execute the operation in the ALU
MEM: Access data memory (load/store)
WB: Write the result back to the register file
Pipeline Registers
Speedup Formula & Analysis
For a k-stage pipeline executing n instructions:
Speedup = (n × k) / (k + n − 1)
Speedup approaches k (number of stages) as n → ∞
For n = 5 instructions and k = 5 stages:
Total cycles: 5 + 5 − 1 = 9
Actual speedup: (5 × 5) / 9 ≈ 2.78x
Ideal speedup (k stages): 5.00x
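The figures above can be reproduced in a few lines. This is a minimal sketch, assuming a hazard-free pipeline in which every stage takes one cycle; the function names `pipeline_cycles`, `pipeline_speedup`, and `pipeline_efficiency` are illustrative and do not come from the original page.

```python
def pipeline_cycles(n: int, k: int) -> int:
    """Cycles needed to run n instructions on a k-stage pipeline without stalls."""
    # The first instruction takes k cycles; each later one finishes one cycle after.
    return k + n - 1


def pipeline_speedup(n: int, k: int) -> float:
    """Speedup over an unpipelined design that spends k cycles per instruction."""
    return (n * k) / pipeline_cycles(n, k)


def pipeline_efficiency(n: int, k: int) -> float:
    """Fraction of the ideal k-fold speedup that is actually achieved."""
    return pipeline_speedup(n, k) / k


k = 5  # assumed stage count, matching the five-stage pipeline in the text
for n in (1, 5, 10, 50, 100, 1000):
    print(f"n={n:>4}: cycles={pipeline_cycles(n, k):>4}, "
          f"speedup={pipeline_speedup(n, k):.2f}x, "
          f"efficiency={pipeline_efficiency(n, k):.1%}")
```

Sweeping n this way shows the fill/drain overhead of k − 1 cycles being amortized as the instruction count grows.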
Amdahl's Law & Pipelining
| n (instructions) | Ideal Speedup (k = 5) | Actual Speedup | Efficiency |
|---|---|---|---|
| 1 | 5.0x | 1.00x | 20.0% |
| 5 | 5.0x | 2.78x | 55.6% |
| 10 | 5.0x | 3.57x | 71.4% |
| 50 | 5.0x | 4.63x | 92.6% |
| 100 | 5.0x | 4.81x | 96.2% |
| 1000 | 5.0x | 4.98x | 99.6% |
Code Example: Pipeline Simulation
```python
# Classic 5-stage RISC pipeline simulation (no hazards modeled)
stages = ["IF", "ID", "EX", "MEM", "WB"]
instructions = ["ADD R1,R2,R3", "SUB R4,R1,R5", "LW R6,0(R7)", "AND R8,R9,R10"]

n = len(instructions)
k = len(stages)
total_cycles = n + k - 1  # k cycles for the first instruction, one more per extra instruction

# Print which stage each in-flight instruction occupies in every cycle
for cycle in range(1, total_cycles + 1):
    print(f"Cycle {cycle}: ", end="")
    for i, inst in enumerate(instructions):
        stage_idx = cycle - i - 1  # instruction i enters IF in cycle i + 1
        if 0 <= stage_idx < k:
            print(f"[{stages[stage_idx]}] {inst} ", end="")
    print()

# Speedup calculation
speedup = (n * k) / total_cycles
print(f"\nSpeedup: {speedup:.2f}x (ideal: {k}x)")
```
Pipeline Hazards Overview
Types of Hazards
Interview Questions
What is pipelining and how does it improve performance?
Pipelining overlaps execution of multiple instructions by dividing the datapath into stages. Each stage works on a different instruction simultaneously. It improves throughput (instructions per cycle) but not single-instruction latency. Speedup = (n × k) / (k + n − 1) for n instructions and k stages.
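The difference between latency and throughput can be made concrete with a rough sketch. It assumes one-cycle stages and no stalls; `completion_cycle`, `latency`, and `throughput_ipc` are hypothetical helper names introduced only for illustration.

```python
def completion_cycle(i: int, k: int) -> int:
    """Cycle in which the i-th instruction (0-based) leaves the last stage."""
    # Instruction i enters the pipeline in cycle i + 1 and occupies k cycles.
    return i + k


def latency(i: int, k: int) -> int:
    """Cycles instruction i spends in the pipeline; always k."""
    issue_cycle = i + 1
    return completion_cycle(i, k) - issue_cycle + 1


def throughput_ipc(n: int, k: int) -> float:
    """Average instructions completed per cycle over a run of n instructions."""
    return n / completion_cycle(n - 1, k)


k = 5  # assumed stage count
for n in (1, 10, 1000):
    print(f"n={n}: latency={latency(n - 1, k)} cycles, "
          f"IPC={throughput_ipc(n, k):.3f}")
```

Latency stays at k cycles for every instruction, while the average IPC climbs toward 1 as the run gets longer.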
Explain the 5 stages of a classic RISC pipeline.
(1) IF: Fetch instruction from memory using PC address. (2) ID: Decode instruction and read register operands. (3) EX: Execute ALU operation or calculate address. (4) MEM: Access data memory for load/store. (5) WB: Write result back to register file.
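To see what each stage contributes, the sketch below pushes one instruction at a time through all five steps. It is a toy model, not a real datapath: instructions run back to back with no overlap, and the register values, memory contents, tuple encoding, and the `run_one` helper are all made up for illustration.

```python
# Toy machine state (all values are hypothetical).
regs = {f"R{i}": 0 for i in range(16)}
regs.update({"R2": 7, "R3": 5, "R7": 100})
data_mem = {100: 42}
instr_mem = [("ADD", "R1", "R2", "R3"),   # R1 = R2 + R3
             ("LW", "R6", 0, "R7")]       # R6 = data_mem[R7 + 0]


def run_one(pc: int) -> int:
    """Push a single instruction through IF, ID, EX, MEM, WB; return the next PC."""
    # IF: fetch the instruction addressed by the PC.
    instr = instr_mem[pc]

    # ID: decode the opcode and read the source operands.
    op, rd = instr[0], instr[1]
    if op == "ADD":
        a, b = regs[instr[2]], regs[instr[3]]
    elif op == "LW":
        a, b = regs[instr[3]], instr[2]   # base register value, immediate offset
    else:
        raise ValueError(f"unsupported opcode: {op}")

    # EX: the ALU adds in both cases (sum for ADD, effective address for LW).
    alu_out = a + b

    # MEM: only loads touch data memory; ADD passes the ALU result through.
    result = data_mem[alu_out] if op == "LW" else alu_out

    # WB: write the result to the destination register.
    regs[rd] = result
    return pc + 1


pc = 0
while pc < len(instr_mem):
    pc = run_one(pc)
print(regs["R1"], regs["R6"])  # prints: 12 42
```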
Why can't pipelining achieve ideal speedup?
Actual speedup falls short of the ideal (equal to the number of stages k) because of: pipeline fill/drain latency at the start and end, uneven stage delays (the clock is limited by the slowest stage), hazards that require stalls, and dependencies between instructions. Only for a hazard-free pipeline with balanced stages does the speedup approach k as the instruction count n → ∞.
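These limits can be put into rough numbers. The sketch below is a simplified timing model under stated assumptions: the stage delays are invented values in nanoseconds, pipeline register overhead is ignored, and stalls are counted as a total number of extra cycles. The function names are illustrative.

```python
def unpipelined_time(n: int, stage_delays_ns: list[float]) -> float:
    """Each instruction passes through every stage before the next one starts."""
    return n * sum(stage_delays_ns)


def pipelined_time(n: int, stage_delays_ns: list[float], stall_cycles: int = 0) -> float:
    """Fill/drain cycles plus stalls, all clocked at the slowest stage's delay."""
    k = len(stage_delays_ns)
    clock_ns = max(stage_delays_ns)          # the slowest stage sets the clock period
    return (k + n - 1 + stall_cycles) * clock_ns


def speedup(n: int, stage_delays_ns: list[float], stall_cycles: int = 0) -> float:
    return unpipelined_time(n, stage_delays_ns) / pipelined_time(n, stage_delays_ns, stall_cycles)


n = 1000
balanced = [2, 2, 2, 2, 2]   # hypothetical delays: every stage takes 2 ns
uneven = [2, 1, 2, 3, 1]     # hypothetical delays: MEM is the slow stage

print(f"balanced, no stalls: {speedup(n, balanced):.2f}x")
print(f"uneven, no stalls:   {speedup(n, uneven):.2f}x")
print(f"uneven, 200 stalls:  {speedup(n, uneven, stall_cycles=200):.2f}x")
```

With balanced stages the model reduces to (n × k) / (k + n − 1); uneven delays and stall cycles each pull the result further below k.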
What are pipeline registers and why are they needed?
Pipeline registers (IF/ID, ID/EX, EX/MEM, MEM/WB) sit between stages to hold intermediate data. Each cycle, the result of one stage is latched into the next pipeline register. This allows all stages to operate in parallel on different instructions, since each stage reads from its input register and writes to its output register.
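The latching behaviour can be sketched as a shift through four named registers. This is a deliberate simplification: real pipeline registers carry operands, ALU results, and control signals, whereas here each one holds only an instruction label. The `simulate` function and its snapshot format are assumptions made for illustration.

```python
REGISTERS = ["IF/ID", "ID/EX", "EX/MEM", "MEM/WB"]


def simulate(program: list[str]) -> list[dict]:
    """Return one snapshot per cycle of what each pipeline register holds."""
    latches = {name: None for name in REGISTERS}   # None models an empty slot (bubble)
    pending = list(program)
    history = []
    while pending or any(value is not None for value in latches.values()):
        # Clock edge: every register latches its upstream neighbour's value.
        # Updating from the back of the pipeline avoids overwriting a value
        # before the next register has captured it.
        for i in range(len(REGISTERS) - 1, 0, -1):
            latches[REGISTERS[i]] = latches[REGISTERS[i - 1]]
        # The IF stage writes the newly fetched instruction into IF/ID.
        latches["IF/ID"] = pending.pop(0) if pending else None
        history.append(dict(latches))
    return history


for cycle, snapshot in enumerate(simulate(["ADD", "SUB", "LW"]), start=1):
    print(f"end of cycle {cycle}: {snapshot}")
```

Each snapshot shows the register contents at the end of a cycle, so the number of snapshots equals the total cycle count k + n − 1, and the final one is empty because the last instruction has completed WB.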