⏱️ Instruction Execution
How the CPU fetches, decodes, and executes instructions step by step through the instruction cycle.
The Instruction Cycle
Fundamental CPU Operation
Cycle States in Detail
Instruction Fetch (IF): PC sends address to MAR → memory read → instruction → MDR → IR. PC is incremented by 4.
Instruction Decode (ID): Control unit decodes the opcode and funct fields. The register file reads rs and rt. The immediate is sign-extended if needed.
Execute (EX): ALU performs the operation: arithmetic, logical, address calculation, or branch-target computation. Condition flags are set.
Memory Access (MEM): For loads: MAR ← address, memory read → MDR. For stores: MAR ← address, MDR ← data, memory write. Skipped for ALU ops.
Write-Back (WB): Result is written to the register file (rd for R-type, rt for I-type). For loads: MDR → register. PC is updated (or loaded with the branch target).
Addressing Mode Impact on Execution
| Addressing Mode | Extra Cycles | Explanation |
|---|---|---|
| Immediate | 0 | Operand is in instruction register; no extra access |
| Register | 0 | Operand in register file; accessed during ID |
| Register Indirect | 1 | Need memory access to get operand after address calculation |
| Indirect | 2 | Memory access to get address, then another to get operand |
| Indexed | 1 | Base + index calculation in EX, then memory access in MEM |
Multiple Bus Architecture
CPU Internal Buses for Execution
During a single cycle, the register file reads rs onto Bus A and rt onto Bus B simultaneously. The ALU computes the result while the next instruction is being fetched over the instruction bus. This parallelism is key to single-cycle-per-instruction execution in RISC designs.
CPU Simulator (C Code)
```c
#include <stdint.h>

typedef struct {
    uint32_t regs[32];
    uint32_t pc;
    uint32_t memory[4096];   // word-addressed backing store (16 KiB)
    uint32_t ir;             // instruction register
    uint32_t mar;            // memory address register
    uint32_t mdr;            // memory data register
} CPU;

void fetch(CPU *cpu) {
    cpu->mar = cpu->pc;                     // address out
    cpu->mdr = cpu->memory[cpu->mar >> 2];  // memory read
    cpu->ir = cpu->mdr;                     // load IR
    cpu->pc += 4;                           // increment PC
}

void execute(CPU *cpu) {
    uint8_t opcode = (cpu->ir >> 26) & 0x3F;
    uint8_t rs = (cpu->ir >> 21) & 0x1F;
    uint8_t rt = (cpu->ir >> 16) & 0x1F;
    uint8_t rd = (cpu->ir >> 11) & 0x1F;
    uint8_t funct = cpu->ir & 0x3F;
    int16_t imm = (int16_t)(cpu->ir & 0xFFFF);  // sign-extended immediate

    if (opcode == 0) {  // R-type
        switch (funct) {
            case 0x20: cpu->regs[rd] = cpu->regs[rs] + cpu->regs[rt]; break;  // ADD
            case 0x22: cpu->regs[rd] = cpu->regs[rs] - cpu->regs[rt]; break;  // SUB
            case 0x24: cpu->regs[rd] = cpu->regs[rs] & cpu->regs[rt]; break;  // AND
            case 0x25: cpu->regs[rd] = cpu->regs[rs] | cpu->regs[rt]; break;  // OR
        }
    } else if (opcode == 0x23) {  // LW: rt <- memory[rs + imm]
        uint32_t addr = cpu->regs[rs] + imm;
        cpu->regs[rt] = cpu->memory[addr >> 2];
    } else if (opcode == 0x2B) {  // SW: memory[rs + imm] <- rt
        uint32_t addr = cpu->regs[rs] + imm;
        cpu->memory[addr >> 2] = cpu->regs[rt];
    }
    cpu->regs[0] = 0;  // $zero is hardwired to 0
}

void run(CPU *cpu) {
    while (cpu->pc < sizeof(cpu->memory)) {  // sizeof is in bytes; PC steps by 4
        fetch(cpu);
        execute(cpu);
    }
}
```

Interview Questions
Explain the Fetch-Decode-Execute cycle in detail.
The instruction cycle has 5 stages: 1) IF: Instruction is fetched from memory address in PC into IR, PC increments. 2) ID: Control unit decodes the opcode, reads registers from register file. 3) EX: ALU performs the operation. 4) MEM: Data memory accessed for loads/stores. 5) WB: Results written back to register file. Each stage takes one clock cycle in a standard 5-stage pipeline.
How does the addressing mode affect the instruction execution cycle?
The addressing mode determines what happens in the EX and MEM stages. Immediate mode skips MEM (operand in instruction). Register mode skips MEM. Register indirect needs address calculation in EX + memory access in MEM. Indirect needs two memory accesses (one for address, one for operand). More complex modes add extra cycles or pipeline stages.
What happens in the CPU during each clock cycle of instruction execution?
In single-cycle implementations, one instruction completes per cycle but the cycle time is long (worst-case path). In multi-cycle, each state (T1-T6) takes one cycle: T1: MAR←PC, memory read. T2: MDR→IR. T3: decode, increment PC. T4: ALU execute. T5: memory access if needed. T6: write back. Multi-cycle allows different instructions to take different numbers of cycles.
How do modern CPUs execute instructions out of order?
Out-of-order execution decodes instructions into μops, places them in a reorder buffer (ROB), and dispatches them to functional units when operands are ready (register renaming avoids false dependencies). Results are written to the ROB and committed in program order to maintain precise exceptions. This extracts ILP beyond what in-order execution can achieve.