⏱️ Instruction Execution
How the CPU fetches, decodes, and executes instructions step by step through the instruction cycle.
The Instruction Cycle
Fundamental CPU Operation
Cycle States in Detail
Instruction Fetch (IF): PC sends address to MAR → memory read → instruction → MDR → IR. PC is incremented by 4.
Instruction Decode (ID): Control unit decodes the opcode and funct fields. The register file reads rs and rt. The immediate is sign-extended if needed.
Execute (EX): ALU performs the operation: arithmetic, logical, address calculation, or branch-target computation. Condition flags are set.
Memory Access (MEM): For loads: MAR ← address, memory read → MDR. For stores: MAR ← address, MDR ← data, memory write. Skipped for ALU ops.
Write-Back (WB): Result is written to the register file (rd for R-type, rt for I-type). For loads: MDR → register. PC is updated (or loaded with the branch target).
Addressing Mode Impact on Execution
| Addressing Mode | Extra Cycles | Explanation |
|---|---|---|
| Immediate | 0 | Operand is in instruction register; no extra access |
| Register | 0 | Operand in register file; accessed during ID |
| Register Indirect | 1 | Need memory access to get operand after address calculation |
| Indirect | 2 | Memory access to get address, then another to get operand |
| Indexed | 1 | Base + index calculation in EX, then memory access in MEM |
Multiple Bus Architecture
CPU Internal Buses for Execution
During a single cycle, the register file reads rs onto Bus A and rt onto Bus B simultaneously. The ALU computes the result while the next instruction is being fetched over the instruction bus. This parallelism is key to single-cycle-per-instruction execution in RISC designs.
CPU Simulator (C Code)
```c
#include <stdint.h>

typedef struct {
    uint32_t regs[32];
    uint32_t pc;
    uint32_t memory[4096];   // word-addressed backing store (16 KiB)
    uint32_t ir;             // instruction register
    uint32_t mar;            // memory address register
    uint32_t mdr;            // memory data register
} CPU;

void fetch(CPU *cpu) {
    cpu->mar = cpu->pc;                     // address out
    cpu->mdr = cpu->memory[cpu->mar >> 2];  // memory read
    cpu->ir = cpu->mdr;                     // load IR
    cpu->pc += 4;                           // increment PC
}

void execute(CPU *cpu) {
    uint8_t opcode = (cpu->ir >> 26) & 0x3F;
    uint8_t rs = (cpu->ir >> 21) & 0x1F;
    uint8_t rt = (cpu->ir >> 16) & 0x1F;
    uint8_t rd = (cpu->ir >> 11) & 0x1F;
    uint8_t funct = cpu->ir & 0x3F;
    int16_t imm = (int16_t)(cpu->ir & 0xFFFF);  // sign-extended immediate

    if (opcode == 0) {  // R-type
        switch (funct) {
            case 0x20: cpu->regs[rd] = cpu->regs[rs] + cpu->regs[rt]; break;  // ADD
            case 0x22: cpu->regs[rd] = cpu->regs[rs] - cpu->regs[rt]; break;  // SUB
            case 0x24: cpu->regs[rd] = cpu->regs[rs] & cpu->regs[rt]; break;  // AND
            case 0x25: cpu->regs[rd] = cpu->regs[rs] | cpu->regs[rt]; break;  // OR
        }
    } else if (opcode == 0x23) {  // LW: rt <- memory[rs + imm]
        uint32_t addr = cpu->regs[rs] + imm;
        cpu->regs[rt] = cpu->memory[addr >> 2];
    } else if (opcode == 0x2B) {  // SW: memory[rs + imm] <- rt
        uint32_t addr = cpu->regs[rs] + imm;
        cpu->memory[addr >> 2] = cpu->regs[rt];
    }
    cpu->regs[0] = 0;  // $zero is hardwired to 0
}

void run(CPU *cpu) {
    while (cpu->pc < sizeof(cpu->memory)) {  // sizeof is in bytes; PC steps by 4
        fetch(cpu);
        execute(cpu);
    }
}
```

Interview Questions
Explain the Fetch-Decode-Execute cycle in detail.
The instruction cycle has 5 stages: 1) IF: Instruction is fetched from memory address in PC into IR, PC increments. 2) ID: Control unit decodes the opcode, reads registers from register file. 3) EX: ALU performs the operation. 4) MEM: Data memory accessed for loads/stores. 5) WB: Results written back to register file. Each stage takes one clock cycle in a standard 5-stage pipeline.
How does the addressing mode affect the instruction execution cycle?
The addressing mode determines what happens in the EX and MEM stages. Immediate mode skips MEM (operand in instruction). Register mode skips MEM. Register indirect needs address calculation in EX + memory access in MEM. Indirect needs two memory accesses (one for address, one for operand). More complex modes add extra cycles or pipeline stages.
What happens in the CPU during each clock cycle of instruction execution?
In single-cycle implementations, one instruction completes per cycle but the cycle time is long (worst-case path). In multi-cycle, each state (T1-T6) takes one cycle: T1: MAR←PC, memory read. T2: MDR→IR. T3: decode, increment PC. T4: ALU execute. T5: memory access if needed. T6: write back. Multi-cycle allows different instructions to take different numbers of cycles.
How do modern CPUs execute instructions out of order?
Out-of-order execution decodes instructions into μops, places them in a reorder buffer (ROB), and dispatches them to functional units when operands are ready (register renaming avoids false dependencies). Results are written to the ROB and committed in program order to maintain precise exceptions. This extracts ILP beyond what in-order execution can achieve.