
📈 Performance Metrics

Understand how computer performance is measured: clock rate, CPI, MIPS, Amdahl's Law, and benchmarking methodologies.

Key Performance Metrics

ℹ️

What Determines Performance?

CPU performance depends on three factors: instruction count (IC), cycles per instruction (CPI), and clock cycle time. The fundamental equation: CPU Time = IC × CPI × Clock Cycle Time.

Clock Rate (f)

Number of clock cycles per second (GHz). Higher clock rate = potentially faster execution, but also more power and heat.

CPI (Cycles Per Instruction)

Average number of clock cycles needed to execute one instruction. Lower CPI = better performance.

Instruction Count (IC)

Total number of instructions executed by a program. Determined by program, compiler, and ISA.

MIPS / MFLOPS

MIPS = millions of instructions per second. MFLOPS = millions of floating-point operations per second.

CPU Time Formula

💡

The Fundamental Equation

CPU Time = IC × CPI × Clock Cycle Time (where Clock Cycle Time = 1 / Clock Rate). This equation shows that performance improvements can come from reducing any of the three factors.
CPU Time = IC × CPI × (1 / f)

Reduce IC

Better compilers, optimized algorithms, efficient ISA

Reduce CPI

Pipeline optimization, superscalar execution, cache hits

Increase Clock Rate

Smaller transistors, deeper pipelines, better cooling
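These three levers interact: a deeper pipeline can raise the clock rate but also raise CPI through extra stalls. A minimal sketch (with made-up numbers for two hypothetical designs) shows how the CPU time equation decides the trade-off:

```python
# Hypothetical design comparison: the deeper pipeline raises the clock
# rate but also raises CPI (more stall cycles); CPU Time picks the winner.
def cpu_time(ic, cpi, clock_rate_hz):
    """CPU Time = IC x CPI x (1 / f)."""
    return ic * cpi * (1.0 / clock_rate_hz)

ic = 1_000_000_000                       # same program on both designs
base = cpu_time(ic, 2.0, 2.0e9)         # baseline: CPI 2.0 at 2 GHz
deep = cpu_time(ic, 2.6, 3.0e9)         # deeper pipeline: CPI 2.6 at 3 GHz

print(f"Baseline:      {base:.3f} s")
print(f"Deep pipeline: {deep:.3f} s (speedup {base / deep:.2f}x)")
```

Here the 50% clock-rate gain outweighs the 30% CPI penalty, but the margin is thin; the equation makes that explicit.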

Interactive CPI Calculator

💡

Try It Yourself

Enter instruction count, average CPI, and clock rate to compute CPU time and MIPS.
Sample output: Total Cycles = 2,500,000; CPU Time = 0.001250 s; MIPS = 800.00
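The calculator's sample figures can be reproduced directly; the inputs below (IC = 1,000,000, CPI = 2.5, 2 GHz clock) are inferred to be consistent with that output:

```python
# Reproduce the sample calculator output (inputs inferred from the figures).
ic = 1_000_000
cpi = 2.5
f = 2.0e9                               # 2 GHz in Hz

total_cycles = ic * cpi                 # IC x CPI
cpu_time = total_cycles / f             # cycles x (1 / f)
mips = ic / (cpu_time * 1e6)

print(f"Total cycles: {total_cycles:,.0f}")
print(f"CPU time:     {cpu_time:.6f} s")
print(f"MIPS:         {mips:.2f}")
```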

Amdahl's Law

ℹ️

Amdahl's Law

Amdahl's Law states that the speedup of a system is limited by the fraction of the task that cannot be parallelized or improved. Speedup = 1 / ((1 - P) + P/S), where P is the proportion that can be improved, and S is the speedup factor of that portion.
Speedup = 1 / ((1 - P) + P / S)
25% parallel → max 1.33×

If only 25% of code can be improved, max speedup is 1.33× regardless of S.

50% parallel → max 2×

Even if you make the improved portion infinitely fast, total speedup caps at 2×.

75% parallel → max 4×

Three-quarters must be parallelizable to achieve 4× speedup.

90% parallel → max 10×

With 90% parallelization, theoretical max speedup is 10×.
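The maxima above all come from the limit of Amdahl's formula as S → ∞, which is 1 / (1 − P). A short sketch verifies each case:

```python
def amdahl(P, S):
    """Overall speedup when fraction P is sped up by factor S."""
    return 1.0 / ((1.0 - P) + P / S)

# As S -> infinity, P/S -> 0, so the limit is 1 / (1 - P):
# the unimproved (sequential) fraction dominates.
for P in (0.25, 0.50, 0.75, 0.90):
    limit = 1.0 / (1.0 - P)
    print(f"P = {P:.0%}: max speedup {limit:.2f}x "
          f"(with S = 10: {amdahl(P, 10):.2f}x)")
```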

Interactive Amdahl's Law Calculator

💡

Visualize Speedup

Adjust the parallelizable fraction (P) and speedup factor (S) to see the overall system speedup.
Sample output: Parallelizable Portion = 60%; Overall Speedup = 1.82×; Normalized Execution Time = 55.0%
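The sample reading above is consistent with a speedup factor of S = 4 on the parallel portion (an assumption inferred from the 1.82× figure):

```python
# Reproduce the sample calculator reading (S = 4 inferred from the output).
P = 0.60                                # parallelizable fraction
S = 4.0                                 # speedup factor on that fraction

normalized_time = (1 - P) + P / S       # fraction of original run time
speedup = 1 / normalized_time

print(f"Overall speedup:           {speedup:.2f}x")
print(f"Normalized execution time: {normalized_time:.1%}")
```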

MIPS, MFLOPS, and GFLOPS

MIPS

Million Instructions Per Second

MIPS = IC / (CPU Time × 10⁶)

Depends on instruction mix. RISC CPUs often post higher MIPS ratings but are not always faster.

MFLOPS

Million Floating-Point Ops / Sec

MFLOPS = FP Ops / (Time × 10⁶)

Better for scientific computing. Does not account for precision differences.

GFLOPS

Giga Floating-Point Ops / Sec

GFLOPS = FP Ops / (Time × 10⁹)

Modern GPUs achieve TFLOPS. Peak vs sustained GFLOPS differ significantly.
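MIPS, MFLOPS, and GFLOPS are all the same ratio at different scales: an operation count divided by time. A quick sketch with illustrative (made-up) numbers:

```python
# FLOPS ratings are operation counts over time, scaled by powers of ten.
fp_ops = 4.0e10        # floating-point operations performed (illustrative)
seconds = 5.0          # measured wall-clock time (illustrative)

flops = fp_ops / seconds
print(f"MFLOPS: {flops / 1e6:,.0f}")
print(f"GFLOPS: {flops / 1e9:.1f}")
```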

Benchmarking

ℹ️

Why Benchmark?

Benchmarks are standardized programs used to compare computer performance. Good benchmarks reflect real-world workloads and produce reproducible, comparable results across different systems.

SPEC (Standard Performance Evaluation Corp)

Suite of CPU-intensive benchmarks (SPECint, SPECfp). Measures performance under realistic workloads. Results reported as SPECratio (normalized to reference machine).

Dhrystone

Synthetic benchmark focused on integer operations. Measures MIPS (DMIPS). Criticized for being small enough to fit entirely in modern caches, inflating results.

Whetstone

Synthetic benchmark for floating-point performance. Includes trigonometric functions, array operations, and conditionals. Used historically for MFLOPS ratings.

Linpack

Solves dense systems of linear equations. Core benchmark for TOP500 supercomputer ranking. Measures GFLOPS for matrix operations.
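SPEC reports each benchmark as a SPECratio (reference machine's time divided by the measured time) and combines them with a geometric mean. A minimal sketch, with made-up benchmark names and times rather than real SPEC data:

```python
import math

# SPECratio = reference time / measured time; the composite score is the
# geometric mean of the per-benchmark ratios. All figures are illustrative.
ref_times = {"bench_a": 900.0, "bench_b": 1200.0, "bench_c": 600.0}
our_times = {"bench_a": 300.0, "bench_b":  400.0, "bench_c": 300.0}

ratios = [ref_times[b] / our_times[b] for b in ref_times]
composite = math.prod(ratios) ** (1 / len(ratios))

print(f"SPECratios: {[round(r, 2) for r in ratios]}")
print(f"Composite (geometric mean): {composite:.2f}")
```

The geometric mean is used because it is independent of which machine serves as the reference, unlike an arithmetic mean of ratios.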

Code Example: Performance Calculation

python

# Calculate CPU performance metrics

# Given:
instruction_count = 1_000_000_000   # 1 billion instructions
avg_cpi = 2.5                       # cycles per instruction
clock_rate = 3.0e9                  # 3.0 GHz

# CPU Time = IC × CPI × Cycle Time
cycle_time = 1.0 / clock_rate       # seconds per cycle
cpu_time = instruction_count * avg_cpi * cycle_time

print(f"CPU Time: {cpu_time:.4f} seconds")

# MIPS = IC / (CPU Time * 10^6)
mips = instruction_count / (cpu_time * 1e6)
print(f"MIPS: {mips:.2f}")

# Amdahl's Law
P = 0.75  # 75% of code is parallelizable
S = 8     # 8x speedup on parallel portion
speedup = 1 / ((1 - P) + P / S)
print(f"Amdahl Speedup ({P:.0%} parallel, {S}x speedup): {speedup:.2f}x")

Interview Questions

Explain the CPU performance equation and how to improve performance.

CPU Time = IC × CPI × Clock Cycle Time. Performance can be improved by: 1) Reducing IC (better compilers, algorithms), 2) Reducing CPI (pipelining, caching, branch prediction), 3) Increasing clock rate (smaller transistors, better cooling). Each approach has trade-offs — higher clock rate increases power consumption and heat.

What is the significance of Amdahl's Law in parallel computing?

Amdahl's Law shows that the speedup from parallelization is fundamentally limited by the sequential portion of a program. If 10% of a task is sequential, the maximum speedup is 10× regardless of how many processors are added. This motivates focusing on reducing sequential bottlenecks, not just adding more parallel hardware.

Why is MIPS considered a flawed performance metric?

MIPS varies with instruction mix — a CPU might achieve high MIPS on one program but lower on another due to different instruction types. Also, RISC CPUs often have higher MIPS than CISC CPUs but may need more instructions per program. MIPS can be misleading when comparing across different ISAs. SPEC benchmarks provide more reliable comparisons.
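The ISA trap is easiest to see with numbers. In this sketch (all figures made up), the "RISC-style" machine posts far higher MIPS, yet the "CISC-style" machine finishes the same task sooner because it executes fewer instructions:

```python
# Illustrative numbers only: higher MIPS does not imply lower CPU time
# when the two machines execute different instruction counts for one task.
def run_stats(ic, cpi, f):
    """Return (CPU time in seconds, MIPS) for one machine."""
    t = ic * cpi / f
    return t, ic / (t * 1e6)

risc_time, risc_mips = run_stats(ic=2.5e9, cpi=1.0, f=2.0e9)
cisc_time, cisc_mips = run_stats(ic=0.8e9, cpi=2.5, f=2.0e9)

print(f"RISC-style: {risc_mips:.0f} MIPS, {risc_time:.2f} s")
print(f"CISC-style: {cisc_mips:.0f} MIPS, {cisc_time:.2f} s")
```

The RISC-style machine wins on MIPS, the CISC-style machine wins on CPU time, and only CPU time answers "which ran the program faster?"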

What is the difference between peak and sustained performance?

Peak performance is the theoretical maximum throughput under ideal conditions (e.g., all functional units active, no cache misses, perfect parallelism). Sustained performance is what a system actually delivers under real workloads with memory latency, branch mispredictions, and resource contention. The ratio (sustained/peak) is the efficiency, typically 30-70% for most systems.