📊 Pipeline Stages

Understand how pipelining improves CPU throughput by overlapping instruction execution across five classic stages: Fetch, Decode, Execute, Memory, and Write-back.

What is Pipelining?

ℹ️

Pipeline Processing

Pipelining is a technique where multiple instructions are overlapped in execution. While one instruction is being decoded, the next is being fetched. This doesn't reduce latency of a single instruction but increases throughput — more instructions completed per unit time.

Fetch → Decode → Execute → Memory → Write-back

Interactive Pipeline Visualizer

💡

Live Demonstration

The table below shows the pipeline state at cycle 1 for a five-instruction program. In each later cycle, every in-flight instruction advances one stage and the next instruction is fetched, so the pipeline fills over the first five cycles and drains over the last four.

Cycle 1 — Pipeline State

Instruction	IF	ID	EX	MEM	WB
I1 ADD R1, R2, R3	I1	—	—	—	—
I2 SUB R4, R1, R5	—	—	—	—	—
I3 LW R6, 0(R7)	—	—	—	—	—
I4 AND R8, R9, R10	—	—	—	—	—
I5 OR R11, R12, R13	—	—	—	—	—

5-Stage Pipeline Breakdown

📡 IF: Fetch instruction from memory using PC
🧩 ID: Decode instruction, read registers
⚙️ EX: Execute operation in ALU
💾 MEM: Access data memory (load/store)
📝 WB: Write result back to register

ℹ️

Pipeline Registers

Between each stage are pipeline registers (IF/ID, ID/EX, EX/MEM, MEM/WB) that hold intermediate results. These registers isolate stages so each can work independently on different instructions in the same cycle.
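The latching behavior described above can be sketched in a few lines of Python. This is a minimal illustration, not a hardware model: the helper names `step` and `run` are hypothetical, each pipeline register holds only an instruction label rather than real intermediate data, and stalls are ignored.

```python
# Hypothetical sketch of pipeline registers; names are illustrative only.
REGS = ["IF/ID", "ID/EX", "EX/MEM", "MEM/WB"]

def step(regs, fetched):
    """Model one clock edge: each register latches the value from upstream.

    `fetched` is the instruction in IF during this cycle (or None).
    Returns the new register contents and the instruction that
    completed WB during this cycle (or None).
    """
    retired = regs["MEM/WB"]  # WB reads MEM/WB during the cycle
    new_regs = {
        "IF/ID": fetched,
        "ID/EX": regs["IF/ID"],
        "EX/MEM": regs["ID/EX"],
        "MEM/WB": regs["EX/MEM"],
    }
    return new_regs, retired

def run(program):
    """Return a dict mapping each instruction to the cycle in which it finishes WB."""
    regs = {name: None for name in REGS}
    pending = list(program)
    retire_cycle = {}
    cycle = 0
    # Keep clocking until nothing is left to fetch and all registers are empty.
    while pending or any(v is not None for v in regs.values()):
        cycle += 1
        fetched = pending.pop(0) if pending else None
        regs, retired = step(regs, fetched)
        if retired is not None:
            retire_cycle[retired] = cycle
    return retire_cycle

print(run(["I1", "I2", "I3", "I4"]))
```

Because every register is updated from the old contents of its upstream neighbor, all stages appear to advance simultaneously, which is the isolation property the pipeline registers provide.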

Speedup Formula & Analysis

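For a k-stage pipeline executing n instructions, an unpipelined design needs n × k cycles, while the pipeline needs k cycles for the first instruction plus one cycle for each of the remaining n − 1:

Speedup = (n × k) / (k + n − 1)

Speedup approaches k (the number of stages) as n → ∞, assuming equal stage delays and no stalls.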

For n = 5 instructions and k = 5 stages:

Total Cycles: 5 + 5 − 1 = 9

Actual Speedup: 25 / 9 ≈ 2.78x

Ideal (k stages): 5.00x
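The worked example above can be checked with a short script. This is a sketch under the same assumptions as the formula (equal stage delays, no stalls); the function names are illustrative, and efficiency is taken here to mean speedup divided by k.

```python
def pipeline_cycles(n, k):
    """Cycles to run n instructions on a k-stage pipeline with no stalls."""
    return k + n - 1

def speedup(n, k):
    """Speedup over an unpipelined design that needs k cycles per instruction."""
    return (n * k) / pipeline_cycles(n, k)

def efficiency(n, k):
    """Fraction of the ideal speedup k that is achieved (assumed definition)."""
    return speedup(n, k) / k

k = 5
for n in (1, 5, 10, 50, 100, 1000):
    print(
        f"n={n:>4}  cycles={pipeline_cycles(n, k):>4}  "
        f"speedup={speedup(n, k):.2f}x  efficiency={efficiency(n, k):.1%}"
    )
```

As n grows, the k − 1 fill cycles are amortized over more instructions, so the printed speedup climbs toward k.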

⚠️

Stage Balance & Clock Period

The pipeline clock period is set by the slowest stage. If EX takes 2 ns and all other stages take 1 ns, the clock period must be at least 2 ns, so every stage effectively takes 2 ns per cycle. Balanced stages maximize throughput.
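The effect of the 2 ns EX stage can be estimated with a small calculation. The sketch below makes several simplifying assumptions that are not from the text: the unpipelined design takes the sum of all stage delays per instruction, pipeline registers add no delay, and there are no stalls. The "balanced" variant is a hypothetical design with the same total delay split evenly.

```python
# Stage delays in nanoseconds, taken from the example above.
stage_delays_ns = {"IF": 1.0, "ID": 1.0, "EX": 2.0, "MEM": 1.0, "WB": 1.0}

def pipelined_time(n, delays):
    """Total time for n instructions; the clock is set by the slowest stage."""
    clock = max(delays.values())
    k = len(delays)
    return (k + n - 1) * clock

def unpipelined_time(n, delays):
    """Total time if each instruction passes through all stages before the next starts."""
    return n * sum(delays.values())

def timing_speedup(n, delays):
    return unpipelined_time(n, delays) / pipelined_time(n, delays)

# Hypothetical balanced design: same 6 ns of total work, split evenly.
balanced_ns = {stage: 1.2 for stage in stage_delays_ns}

n = 1000
print(f"Unbalanced (EX = 2 ns): {timing_speedup(n, stage_delays_ns):.2f}x")
print(f"Balanced (1.2 ns each): {timing_speedup(n, balanced_ns):.2f}x")
```

Under these assumptions the unbalanced pipeline reaches only about 3x for large n, while the balanced one gets close to the ideal 5x.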

n (instructions)	Ideal Speedup (k = 5)	Actual Speedup	Efficiency (actual / ideal)
1	5.0x	1.00x	20.0%
5	5.0x	2.78x	55.6%
10	5.0x	3.57x	71.4%
50	5.0x	4.63x	92.6%
100	5.0x	4.81x	96.2%
1000	5.0x	4.98x	99.6%

Code Example: Pipeline Simulation

python

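# Classic 5-stage RISC pipeline simulation (no hazards or stalls modeled)
stages = ["IF", "ID", "EX", "MEM", "WB"]
instructions = ["ADD R1,R2,R3", "SUB R4,R1,R5", "LW R6,0(R7)", "AND R8,R9,R10"]

n = len(instructions)     # number of instructions
k = len(stages)           # number of pipeline stages
total_cycles = n + k - 1  # k cycles for the first instruction, one more per remaining instruction

# Print which stage each in-flight instruction occupies in every cycle
for cycle in range(1, total_cycles + 1):
    print(f"Cycle {cycle}: ", end="")
    for i, inst in enumerate(instructions):
        stage_idx = cycle - i - 1  # instruction i enters IF in cycle i + 1
        if 0 <= stage_idx < k:
            print(f"[{stages[stage_idx]}] {inst}  ", end="")
    print()

# Speedup over an unpipelined design that takes k cycles per instruction
speedup = (n * k) / total_cycles
print(f"\nSpeedup: {speedup:.2f}x (ideal: {k}x)")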

Pipeline Hazards Overview

⚠️

Types of Hazards

Pipelining introduces hazards that can stall the pipeline: Structural hazards (resource conflicts), Data hazards (instruction dependencies), and Control hazards (branches). These are covered in detail in the Data Hazards section.
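As a small illustration of a data hazard, the sketch below scans the five-instruction program from the visualizer for read-after-write dependencies. It is a simplified, hypothetical checker: `parse` assumes the first register in an instruction is the destination (true for ADD, SUB, LW, AND and OR, but not for stores), and `window` is an assumed parameter for how many following instructions to check; the right value depends on the pipeline's forwarding and register-file design.

```python
import re

def parse(inst):
    """Return (destination, sources), assuming the first register is the destination."""
    regs = re.findall(r"R\d+", inst)
    return regs[0], set(regs[1:])

def raw_hazards(program, window=2):
    """List (producer_index, consumer_index, register) read-after-write pairs.

    `window` is a hypothetical knob: how many following instructions
    are close enough to read a register before it has been written back.
    """
    hazards = []
    for i, inst in enumerate(program):
        dest, _ = parse(inst)
        for j in range(i + 1, min(i + 1 + window, len(program))):
            _, sources = parse(program[j])
            if dest in sources:
                hazards.append((i, j, dest))
    return hazards

program = [
    "ADD R1, R2, R3",
    "SUB R4, R1, R5",
    "LW R6, 0(R7)",
    "AND R8, R9, R10",
    "OR R11, R12, R13",
]

for producer, consumer, reg in raw_hazards(program):
    print(f"I{consumer + 1} reads {reg} written by I{producer + 1}")
```

For this program the only dependency found is I2 (SUB) reading R1, which I1 (ADD) has not yet written back when I2 reaches the decode stage.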

Interview Questions

What is pipelining and how does it improve performance?

Pipelining overlaps execution of multiple instructions by dividing the datapath into stages. Each stage works on a different instruction simultaneously. It improves throughput (instructions completed per unit time) but not single-instruction latency. With balanced stages and no stalls, Speedup = (n × k) / (k + n − 1) for n instructions and k stages.

Explain the 5 stages of a classic RISC pipeline.

(1) IF: Fetch instruction from memory using PC address. (2) ID: Decode instruction and read register operands. (3) EX: Execute ALU operation or calculate address. (4) MEM: Access data memory for load/store. (5) WB: Write result back to register file.

Why can't pipelining achieve ideal speedup?

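The ideal speedup equals the number of stages k, but real pipelines fall short of it because of pipeline fill/drain latency at the start and end, uneven stage delays (the clock is limited by the slowest stage), and hazards caused by resource conflicts, instruction dependencies, and branches that force stalls. Only with balanced stages and no stalls does speedup approach k as instruction count → ∞.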

What are pipeline registers and why are they needed?

Pipeline registers (IF/ID, ID/EX, EX/MEM, MEM/WB) sit between stages to hold intermediate data. Each cycle, the result of one stage is latched into the next pipeline register. This allows all stages to operate in parallel on different instructions, since each stage reads from its input register and writes to its output register.