📈 Performance Metrics
Understand how computer performance is measured: clock rate, CPI, MIPS, Amdahl's Law, and benchmarking methodologies.
Key Performance Metrics
What Determines Performance?
Clock Rate (f)
Number of clock cycles per second (GHz). Higher clock rate = potentially faster execution, but also more power and heat.
CPI (Cycles Per Instruction)
Average number of clock cycles needed to execute one instruction. Lower CPI = better performance.
Instruction Count (IC)
Total number of instructions executed by a program. Determined by program, compiler, and ISA.
MIPS / MFLOPS
MIPS = millions of instructions per second. MFLOPS = millions of floating-point operations per second.
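Because CPI is an average, it depends on how often each kind of instruction runs. The sketch below computes a weighted average CPI and the resulting MIPS rating (clock rate divided by CPI × 10^6). The instruction classes, their fractions, their cycle counts, and the 2 GHz clock are all hypothetical values chosen for illustration.

```python
# Hypothetical instruction mix: class -> (fraction of instructions, cycles each).
instruction_mix = {
    "alu":    (0.50, 1),
    "load":   (0.20, 4),
    "store":  (0.10, 3),
    "branch": (0.20, 2),
}


def average_cpi(mix):
    """Weighted average CPI: sum of (fraction * cycles) over all classes."""
    total_fraction = sum(fraction for fraction, _ in mix.values())
    if abs(total_fraction - 1.0) > 1e-9:
        raise ValueError("instruction fractions must sum to 1")
    return sum(fraction * cycles for fraction, cycles in mix.values())


def mips_rating(clock_rate_hz, cpi):
    """MIPS = clock rate / (CPI * 10^6)."""
    return clock_rate_hz / (cpi * 1e6)


clock_rate_hz = 2.0e9  # assumed 2 GHz clock
cpi = average_cpi(instruction_mix)
print(f"Average CPI: {cpi:.2f}")
print(f"MIPS: {mips_rating(clock_rate_hz, cpi):.1f}")
```

Changing the mix (for example, more loads) changes the average CPI, and with it the MIPS rating, even though the hardware is unchanged.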
CPU Time Formula
The Fundamental Equation
CPU Time = IC × CPI × Clock Cycle Time = (IC × CPI) / Clock Rate
Reduce IC
Better compilers, optimized algorithms, efficient ISA
Reduce CPI
Pipeline optimization, superscalar execution, higher cache hit rates
Increase Clock Rate
Smaller transistors, deeper pipelines, better cooling
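To see how the three levers compare, the following sketch applies each one in isolation to the same baseline, using CPU Time = IC × CPI / Clock Rate. The baseline numbers and the size of each improvement are hypothetical. In real designs the levers interact (a deeper pipeline can raise the clock rate but also raise CPI), which this sketch deliberately ignores.

```python
def cpu_time(instruction_count, cpi, clock_rate_hz):
    """CPU time in seconds = IC * CPI / clock rate."""
    return instruction_count * cpi / clock_rate_hz


# Hypothetical baseline and three variants that each change one factor.
baseline = {"instruction_count": 2_000_000_000, "cpi": 2.0, "clock_rate_hz": 2.0e9}
variants = {
    "baseline": baseline,
    "20% fewer instructions": {**baseline, "instruction_count": 1_600_000_000},
    "CPI 2.0 -> 1.5": {**baseline, "cpi": 1.5},
    "clock 2.0 -> 2.5 GHz": {**baseline, "clock_rate_hz": 2.5e9},
}

base_time = cpu_time(**baseline)
for name, params in variants.items():
    seconds = cpu_time(**params)
    print(f"{name:<24} {seconds:.2f} s  (speedup {base_time / seconds:.2f}x)")
```

Because the three factors multiply, a 25% improvement in any one of them yields the same reduction in CPU time, provided the other two stay fixed.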
Amdahl's Law
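Speedup = 1 / ((1 − P) + P / S), where P is the fraction of execution time that benefits from the improvement and S is the speedup of that fraction.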
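If only 25% of the code can be improved (P = 0.25), the maximum speedup is 1.33×, no matter how large S is.
With P = 0.5, even if the improved portion becomes infinitely fast, total speedup caps at 2×.
A 4× speedup requires more than three-quarters of the code to be parallelizable; at exactly P = 0.75, 4× is only the limit as S grows without bound.
With 90% parallelization (P = 0.9), the theoretical maximum speedup is 10×.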
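The limits quoted above follow from Amdahl's Law, Speedup = 1 / ((1 − P) + P / S), by letting S grow without bound, which leaves 1 / (1 − P). The sketch below tabulates the speedup for a few values of S next to that limit. The function names and the chosen values of S are illustrative assumptions.

```python
def amdahl_speedup(p, s):
    """Overall speedup when fraction p of execution time is sped up by factor s."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    if s <= 0:
        raise ValueError("s must be positive")
    return 1.0 / ((1.0 - p) + p / s)


def amdahl_limit(p):
    """Upper bound on speedup as s grows without bound: 1 / (1 - p)."""
    return float("inf") if p == 1.0 else 1.0 / (1.0 - p)


for p in (0.25, 0.50, 0.75, 0.90):
    row = ", ".join(f"S={s}: {amdahl_speedup(p, s):.2f}x" for s in (2, 8, 64))
    print(f"P={p:.2f} -> {row}, limit {amdahl_limit(p):.2f}x")
```

Even at S = 64 each row stays below its limit, which is why shrinking the unimproved fraction (1 − P) often matters more than raising S.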
MIPS, MFLOPS, and GFLOPS
MIPS
Million Instructions Per Second
Depends on instruction mix. RISC CPUs often post higher MIPS ratings, but that does not always mean faster execution, because they may need more instructions to do the same work.
MFLOPS
Million Floating-Point Ops / Sec
Better for scientific computing. Does not account for precision differences.
GFLOPS
Giga Floating-Point Ops / Sec
Modern GPUs achieve TFLOPS. Peak vs sustained GFLOPS differ significantly.
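A small numeric sketch shows why a higher MIPS rating does not guarantee a faster machine. Both machines below are hypothetical: they run the same program at the same clock rate, but machine A needs three times as many (simpler) instructions as machine B.

```python
def cpu_time(instruction_count, cpi, clock_rate_hz):
    """CPU time in seconds = IC * CPI / clock rate."""
    return instruction_count * cpi / clock_rate_hz


def mips(instruction_count, seconds):
    """MIPS = instruction count / (execution time * 10^6)."""
    return instruction_count / (seconds * 1e6)


# Hypothetical machines executing the same program compiled for two ISAs.
machines = {
    "A (simple instructions)": {
        "instruction_count": 3_000_000_000, "cpi": 1.0, "clock_rate_hz": 2.0e9,
    },
    "B (complex instructions)": {
        "instruction_count": 1_000_000_000, "cpi": 2.5, "clock_rate_hz": 2.0e9,
    },
}

results = {}
for name, params in machines.items():
    seconds = cpu_time(**params)
    results[name] = (mips(params["instruction_count"], seconds), seconds)
    print(f"{name:<26} MIPS: {results[name][0]:7.1f}  time: {seconds:.2f} s")
```

With these assumed numbers, machine A reports the higher MIPS rating yet takes longer to finish, because MIPS ignores how much work each instruction does.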
Benchmarking
Why Benchmark?
SPEC (Standard Performance Evaluation Corp)
Suite of CPU-intensive benchmarks (SPECint, SPECfp). Measures performance under realistic workloads. Results reported as SPECratio (normalized to reference machine).
Dhrystone
Synthetic benchmark focused on integer operations. Measures MIPS (DMIPS). Criticized for being small enough to fit entirely in modern caches, which inflates results relative to real workloads.
Whetstone
Synthetic benchmark for floating-point performance. Includes trigonometric functions, array operations, and conditionals. Used historically for MFLOPS ratings.
Linpack
Solves dense systems of linear equations. Core benchmark for TOP500 supercomputer ranking. Measures GFLOPS for matrix operations.
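As a rough illustration of normalized scoring, the sketch below computes a SPECratio per benchmark (reference machine time divided by measured time) and summarizes the ratios with a geometric mean, which is the averaging method SPEC CPU suites use for their overall scores. The benchmark names and timings are hypothetical, not real SPEC data.

```python
import math


def spec_ratio(reference_seconds, measured_seconds):
    """Ratio > 1 means the measured machine is faster than the reference."""
    return reference_seconds / measured_seconds


def geometric_mean(values):
    """n-th root of the product, computed via logs for numerical stability."""
    values = list(values)
    if not values or any(v <= 0 for v in values):
        raise ValueError("geometric mean needs at least one positive value")
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Hypothetical benchmarks: name -> (reference time, measured time) in seconds.
timings = {
    "bench_a": (1000.0, 250.0),
    "bench_b": (800.0, 400.0),
    "bench_c": (1200.0, 150.0),
}

ratios = {name: spec_ratio(ref, measured) for name, (ref, measured) in timings.items()}
score = geometric_mean(ratios.values())

for name, ratio in ratios.items():
    print(f"{name}: ratio {ratio:.2f}")
print(f"Overall score (geometric mean): {score:.2f}")
```

A geometric mean keeps the relative ranking of two machines independent of which reference machine is used, which an arithmetic mean of ratios does not.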
Code Example: Performance Calculation
python
# Calculate CPU performance metrics
# Given:
instruction_count = 1_000_000_000 # 1 billion instructions
avg_cpi = 2.5 # cycles per instruction
clock_rate = 3.0e9 # 3.0 GHz
# CPU Time = IC × CPI × Cycle Time
cycle_time = 1.0 / clock_rate # seconds per cycle
cpu_time = instruction_count * avg_cpi * cycle_time
print(f"CPU Time: {cpu_time:.4f} seconds")
# MIPS = IC / (CPU Time * 10^6)
mips = instruction_count / (cpu_time * 1e6)
print(f"MIPS: {mips:.2f}")
# Amdahl's Law
P = 0.75 # 75% of code is parallelizable
S = 8 # 8x speedup on parallel portion
speedup = 1 / ((1 - P) + P / S)
print(f"Amdahl Speedup ({P*100}% parallel, {S}x speedup): {speedup:.2f}x")
Interview Questions
Explain the CPU performance equation and how to improve performance.
CPU Time = IC × CPI × Clock Cycle Time. Performance can be improved by: 1) Reducing IC (better compilers, algorithms), 2) Reducing CPI (pipelining, caching, branch prediction), 3) Increasing clock rate (smaller transistors, better cooling). Each approach has trade-offs — higher clock rate increases power consumption and heat.
What is the significance of Amdahl's Law in parallel computing?
Amdahl's Law shows that the speedup from parallelization is fundamentally limited by the sequential portion of a program. If 10% of a task is sequential, the maximum speedup is 10× regardless of how many processors are added. This motivates focusing on reducing sequential bottlenecks, not just adding more parallel hardware.
Why is MIPS considered a flawed performance metric?
MIPS varies with instruction mix — a CPU might achieve high MIPS on one program but lower on another due to different instruction types. Also, RISC CPUs often have higher MIPS than CISC CPUs but may need more instructions per program. MIPS can be misleading when comparing across different ISAs. SPEC benchmarks provide more reliable comparisons.
What is the difference between peak and sustained performance?
Peak performance is the theoretical maximum throughput under ideal conditions (e.g., all functional units active, no cache misses, perfect parallelism). Sustained performance is what a system actually delivers under real workloads with memory latency, branch mispredictions, and resource contention. The ratio (sustained/peak) is the efficiency. It is often quoted as roughly 30-70%, though it varies widely with the workload and the system.
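The efficiency ratio can be made concrete with a short calculation. The sketch below estimates peak GFLOPS as cores × clock rate (GHz) × floating-point operations per cycle per core, then divides a measured figure by it. The core count, clock rate, operations per cycle, and sustained figure are all hypothetical values, not measurements of any real processor.

```python
def peak_gflops(cores, clock_rate_ghz, flops_per_cycle_per_core):
    """Theoretical peak: every core issues its maximum FLOPs on every cycle."""
    return cores * clock_rate_ghz * flops_per_cycle_per_core


def efficiency(sustained_gflops, peak):
    """Fraction of theoretical peak actually delivered by a workload."""
    if peak <= 0:
        raise ValueError("peak must be positive")
    return sustained_gflops / peak


# Hypothetical processor and a hypothetical measured result.
peak = peak_gflops(cores=8, clock_rate_ghz=3.0, flops_per_cycle_per_core=16)
sustained = 192.0  # assumed GFLOPS measured on some real workload

print(f"Peak:       {peak:.1f} GFLOPS")
print(f"Sustained:  {sustained:.1f} GFLOPS")
print(f"Efficiency: {efficiency(sustained, peak):.0%}")
```

The gap between the two numbers is where memory latency, branch mispredictions, and resource contention show up.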