
🗂️ Memory Hierarchy

The memory hierarchy organizes storage from fast/expensive (registers) to slow/cheap (tape). Understanding this tradeoff is fundamental to computer architecture.

The Hierarchy Pyramid

ℹ️

Core Principle

Faster memory costs more per bit. The hierarchy exploits locality: keep frequently used data in fast, small memory and the rest in slow, large memory.

Memory Hierarchy: Speed vs Size (fastest and most expensive per bit at the top)

- Registers — ~1 KB · 0.3 ns · $$$$$/bit
- L1 Cache — ~128 KB · 1 ns · $$$$/bit
- L2 Cache — ~1 MB · 5 ns · $$$/bit
- L3 Cache — ~16 MB · 15 ns · $$/bit
- RAM — ~32 GB · 60 ns · $/bit
- SSD — ~2 TB · 100 µs · ¢/bit
- HDD — ~10 TB · 10 ms · ¢/bit
- Tape — ~100 TB · 60 s · ¢/bit

Locality of Reference

💡

Why Hierarchy Works

Programs spend 90% of their time accessing only 10% of their code/data. The hierarchy exploits this by keeping the hot 10% in fast memory.
🔄

Temporal Locality

Recently accessed data tends to be accessed again soon.

```c
for (int i = 0; i < 1000; i++) sum += arr[k];  // the same element arr[k] is reused 1000x
```
↔️

Spatial Locality

Data near recently accessed data is likely to be accessed.

```c
for (int i = 0; i < 1000; i++) sum += arr[i];  // sequential access pattern
```
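A toy model makes both effects measurable: memory is fetched in fixed-size lines, so a sequential (spatially local) sweep hits the line that a previous miss loaded, while a stride of one full line misses every time. The 8-word line size, the unbounded cache, and the access patterns below are illustrative assumptions:

```python
LINE = 8  # words per cache line (illustrative assumption)

def hit_rate(addresses):
    """Fraction of accesses whose cache line was already loaded."""
    cached_lines = set()  # unbounded cache: every loaded line stays resident
    hits = 0
    for addr in addresses:
        line = addr // LINE
        if line in cached_lines:
            hits += 1
        else:
            cached_lines.add(line)  # a miss loads the whole line
    return hits / len(addresses)

sequential = list(range(1000))             # arr[i]-style sequential sweep
strided = [i * LINE for i in range(1000)]  # jump a full line every access
print(hit_rate(sequential))  # 0.875 — 7 of every 8 accesses hit
print(hit_rate(strided))     # 0.0 — every access touches a new line
```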

Access Time Comparison

| Level | Access Time | Typical Size | Managed By | CPU Latency |
|---|---|---|---|---|
| Register | 0.3 ns | ~1 KB | Compiler | 1 cycle |
| L1 Cache | 1 ns | ~128 KB | Hardware | 3-4 cycles |
| L2 Cache | 5 ns | ~1 MB | Hardware | 10-15 cycles |
| L3 Cache | 15 ns | ~16 MB | Hardware | 30-50 cycles |
| RAM (DRAM) | 60 ns | ~32 GB | OS + Hardware | 150-300 cycles |
| SSD | 100 µs | ~2 TB | OS | ~300K cycles |
| HDD | 10 ms | ~10 TB | OS | ~30M cycles |
| Tape | 60 s | ~100 TB | Operator | ~180B cycles |
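The last column follows mechanically from the access times. A quick sanity check, assuming a 3 GHz clock (~0.33 ns per cycle — the clock speed is an illustrative assumption, not part of the table):

```python
CLOCK_HZ = 3e9  # assumed 3 GHz core clock

def to_cycles(seconds):
    """Convert an access latency in seconds to (rounded) CPU cycles."""
    return round(seconds * CLOCK_HZ)

print(to_cycles(60e-9))   # RAM: 180 cycles
print(to_cycles(100e-6))  # SSD: 300,000 cycles
print(to_cycles(10e-3))   # HDD: 30,000,000 cycles
```

These land inside the table's ranges: ~150-300 cycles for DRAM, ~300K for SSD, ~30M for HDD.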
⚠️

Mind the Gap

The speed gap between CPU registers and HDD is 7+ orders of magnitude. A single HDD access costs ~30 million CPU cycles — equivalent to millions of operations.

Worked Example: Average Access Time

ℹ️

Formula

Average Access Time = Hit Time + (Miss Rate × Miss Penalty). A higher hit rate dramatically improves performance.
With the example values used on this page (hit time 2 cycles, miss rate 5%, miss penalty 100 cycles):

Average Memory Access Time = 2 + (0.05 × 100) = 7.00 cycles

A reasonable access time; raising the hit rate is the most effective way to improve it.
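The formula is a one-liner in code; a minimal sketch that reproduces the example's result:

```python
def amat(hit_time, miss_rate, miss_penalty):
    """Average Memory Access Time in cycles."""
    return hit_time + miss_rate * miss_penalty

print(amat(2, 0.05, 100))  # 7.0 cycles, matching the example above
```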

Code Example: Memory Access Profiler

```python
# Simulate memory hierarchy access
class MemoryAccess:
    def __init__(self, hit_time=2, miss_penalty=100):
        self.hit_time = hit_time
        self.miss_penalty = miss_penalty
        self.hits = 0
        self.misses = 0

    def access(self, address):
        """Simulate a memory access, returning its cost in cycles."""
        if self._is_cached(address):
            self.hits += 1
            return self.hit_time
        else:
            self.misses += 1
            self._cache_block(address)
            return self.hit_time + self.miss_penalty

    def avg_access_time(self):
        total = self.hits + self.misses
        if total == 0:
            return self.hit_time
        miss_rate = self.misses / total
        return self.hit_time + miss_rate * self.miss_penalty

    def _is_cached(self, addr):
        return hash(addr) % 100 < 90  # statistical model: ~90% hit rate

    def _cache_block(self, addr):
        pass  # no-op: residency is modeled statistically in _is_cached


# Example: 1000 accesses under the ~90% statistical model
sim = MemoryAccess()
for addr in range(1000):
    sim.access(addr)
print(sim.hits, sim.misses, round(sim.avg_access_time(), 2))  # 900 100 12.0
```

Interview Questions

Explain the memory hierarchy and why it exists.

The memory hierarchy exists because no single storage technology can be simultaneously fast, large, and cheap. Registers are fastest but tiny (~KB); tape is largest but ~60s access. The hierarchy arranges memory in levels: faster/smaller near CPU, slower/larger further away. It exploits locality to give the illusion of fast, large storage at reasonable cost.

What is temporal vs spatial locality? Give examples.

Temporal locality: recently accessed data is accessed again soon (e.g., a loop counter variable reused each iteration). Spatial locality: data near recently accessed data is accessed soon (e.g., iterating through an array sequentially). Cache exploits temporal by keeping data in cache, and spatial by loading cache lines (blocks of adjacent data).

How does the memory hierarchy improve system performance?

It exploits locality: the CPU spends 90%+ of time accessing the fastest levels (registers/cache). Cache hit rates of 95-99% mean most accesses are served in 1-15 cycles rather than the 100+ cycles for RAM. This gives the performance of fast memory with the capacity of slow memory at a reasonable cost.

What happens if the hit rate drops from 99% to 90%?

Average access time = hit_time + (miss_rate × miss_penalty). With hit_time=2, miss_penalty=100: at 99% hit rate → 2 + 0.01×100 = 3 cycles. At 90% → 2 + 0.10×100 = 12 cycles (4× slower). This shows why even small miss rate increases have outsized impact.
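The arithmetic in that answer can be checked with a short sweep over hit rates, reusing the answer's hit_time = 2 and miss_penalty = 100 (the intermediate 97% and 95% points are added here for illustration):

```python
# Sensitivity of AMAT to hit rate, using the answer's numbers
# (hit_time = 2 cycles, miss_penalty = 100 cycles).
def amat(hit_rate, hit_time=2, miss_penalty=100):
    return hit_time + (1 - hit_rate) * miss_penalty

for rate in (0.99, 0.97, 0.95, 0.90):
    print(f"{rate:.0%} hits -> {amat(rate):.1f} cycles")
```

Each percentage point of hit rate lost adds a full miss penalty's worth of expected cost per access, which is why the 99% → 90% drop quadruples the average.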