🗂️ Memory Hierarchy
The memory hierarchy organizes storage from fast/expensive (registers) to slow/cheap (tape). Understanding this tradeoff is fundamental to computer architecture.
The Hierarchy Pyramid
[Interactive pyramid diagram — Memory Hierarchy: Speed vs Size]
Locality of Reference: Why the Hierarchy Works
Temporal Locality
Recently accessed data tends to be accessed again soon.
```c
/* temporal locality: the same element arr[k] is reused every iteration */
for (int i = 0; i < 1000; i++) sum += arr[k]; // arr[k] reused 1000x
```
Spatial Locality
Data near recently accessed data is likely to be accessed.
```c
/* spatial locality: adjacent elements are touched one after another */
for (int i = 0; i < 1000; i++) sum += arr[i]; // sequential access pattern
```
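How a cache line turns spatial locality into hits can be sketched with a toy model. The line size (16 bytes), element size (4 bytes), and access count are illustrative assumptions, mirroring the sequential `arr[i]` loop:

```python
# Toy model: sequential accesses hit because a miss fills a whole cache line.
# Assumptions (illustrative): 16-byte lines, 4-byte elements, 1000 accesses.
LINE_BYTES = 16
ELEM_BYTES = 4
N = 1000

cached_lines = set()
hits = misses = 0
for i in range(N):
    addr = i * ELEM_BYTES
    line = addr // LINE_BYTES          # which cache line holds this element
    if line in cached_lines:
        hits += 1                      # a neighbor already pulled the line in
    else:
        misses += 1
        cached_lines.add(line)         # the miss fills the whole line

print(f"hits={hits} misses={misses}")  # hits=750 misses=250
```

With four elements per line, only the first access to each line misses, so spatial locality alone yields a 75% hit rate here.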
Access Time Comparison
| Level | Access Time | Typical Size | Managed By | Latency (CPU cycles) |
|---|---|---|---|---|
| Register | 0.3 ns | ~1 KB | Compiler | 1 cycle |
| L1 Cache | 1 ns | ~128 KB | Hardware | 3-4 cycles |
| L2 Cache | 5 ns | ~1 MB | Hardware | 10-15 cycles |
| L3 Cache | 15 ns | ~16 MB | Hardware | 30-50 cycles |
| RAM (DRAM) | 60 ns | ~32 GB | OS + Hardware | 150-300 cycles |
| SSD | 100 µs | ~2 TB | OS | ~300K cycles |
| HDD | 10 ms | ~10 TB | OS | ~30M cycles |
| Tape | 60 s | ~100 TB | Operator | ~180B cycles |
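As a sanity check, the cycle column can be reproduced from the access times, assuming a 3 GHz clock (an assumption; one cycle ≈ 0.33 ns):

```python
# Convert access times to CPU cycles at an assumed 3 GHz clock.
CLOCK_HZ = 3e9

def to_cycles(seconds):
    return seconds * CLOCK_HZ

for name, access_time in [("L1 cache", 1e-9), ("DRAM", 60e-9),
                          ("SSD", 100e-6), ("HDD", 10e-3)]:
    print(f"{name:8s} ~ {to_cycles(access_time):,.0f} cycles")
```

This yields ~3 cycles for L1, ~180 for DRAM, ~300K for SSD, and ~30M for HDD, consistent with the table.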
Mind the Gap
Note the jump from DRAM to SSD: roughly 60 ns to 100 µs, a ~1,700× increase — the largest single gap in the hierarchy. Crossing it (a page fault or file read) costs hundreds of thousands of cycles.
Average Memory Access Time
AMAT = hit_time + (miss_rate × miss_penalty)
For example, with hit_time = 2 cycles, miss_penalty = 100 cycles, and a 95% hit rate: AMAT = 2 + 0.05 × 100 = 7.00 cycles — a reasonable figure, and one that improves directly with hit rate.
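The formula is a one-liner in code; a minimal sketch, with parameter values mirroring the example above:

```python
def amat(hit_time, miss_penalty, hit_rate):
    """Average Memory Access Time = hit_time + miss_rate * miss_penalty."""
    return hit_time + (1 - hit_rate) * miss_penalty

print(round(amat(2, 100, 0.95), 2))  # 7.0 cycles
```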
Code Example: Memory Access Profiler
```python
# Simulate memory hierarchy access with a simple hit/miss model
class MemoryAccess:
    def __init__(self, hit_time=2, miss_penalty=100):
        self.hit_time = hit_time
        self.miss_penalty = miss_penalty
        self.hits = 0
        self.misses = 0
        self.cached = set()  # addresses currently resident in the cache

    def access(self, address):
        """Simulate a memory access; return its cost in cycles."""
        if self._is_cached(address):
            self.hits += 1
            return self.hit_time
        else:
            self.misses += 1
            self._cache_block(address)  # a miss brings the block in
            return self.hit_time + self.miss_penalty

    def avg_access_time(self):
        """Measured AMAT: hit_time + miss_rate * miss_penalty."""
        rate = self.hits / (self.hits + self.misses + 1e-9)
        return self.hit_time + (1 - rate) * self.miss_penalty

    def _is_cached(self, addr):
        return addr in self.cached

    def _cache_block(self, addr):
        self.cached.add(addr)
```
Interview Questions
Explain the memory hierarchy and why it exists.
The memory hierarchy exists because no single storage technology can be simultaneously fast, large, and cheap. Registers are fastest but tiny (~KB); tape is largest but ~60s access. The hierarchy arranges memory in levels: faster/smaller near CPU, slower/larger further away. It exploits locality to give the illusion of fast, large storage at reasonable cost.
What is temporal vs spatial locality? Give examples.
Temporal locality: recently accessed data is accessed again soon (e.g., a loop counter variable reused each iteration). Spatial locality: data near recently accessed data is accessed soon (e.g., iterating through an array sequentially). Cache exploits temporal by keeping data in cache, and spatial by loading cache lines (blocks of adjacent data).
How does the memory hierarchy improve system performance?
It exploits locality: the CPU spends 90%+ of time accessing the fastest levels (registers/cache). Cache hit rates of 95-99% mean most accesses are served in 1-15 cycles rather than the 100+ cycles for RAM. This gives the performance of fast memory with the capacity of slow memory at a reasonable cost.
What happens if the hit rate drops from 99% to 90%?
Average access time = hit_time + (miss_rate × miss_penalty). With hit_time=2, miss_penalty=100: at 99% hit rate → 2 + 0.01×100 = 3 cycles. At 90% → 2 + 0.10×100 = 12 cycles (4× slower). This shows why even small miss rate increases have outsized impact.
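The sensitivity described in this answer can be checked with a quick sweep (same hit_time = 2 and miss_penalty = 100 as above):

```python
# Sweep hit rate to show the outsized impact of misses on AMAT.
HIT_TIME, MISS_PENALTY = 2, 100
for hit_rate in (0.99, 0.95, 0.90, 0.80):
    t = HIT_TIME + (1 - hit_rate) * MISS_PENALTY
    print(f"hit rate {hit_rate:.0%}: {t:.0f} cycles")
```

Dropping from 99% to 90% quadruples the average cost (3 → 12 cycles), and 80% more than septuples it (22 cycles).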