🗂️ Memory Hierarchy
The memory hierarchy organizes storage from fast/expensive (registers) to slow/cheap (tape). Understanding this tradeoff is fundamental to computer architecture.
The Hierarchy Pyramid
[Interactive pyramid diagram — Memory Hierarchy: Speed vs Size]
Locality of Reference: Why the Hierarchy Works
Temporal Locality
Recently accessed data tends to be accessed again soon.
```c
/* temporal locality: the same element arr[k] is reused every iteration */
for (int i = 0; i < 1000; i++) sum += arr[k]; // arr[k] reused 1000x
```
Spatial Locality
Data near recently accessed data is likely to be accessed.
```c
/* spatial locality: adjacent elements are touched one after another */
for (int i = 0; i < 1000; i++) sum += arr[i]; // sequential access pattern
```
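How a cache line turns spatial locality into hits can be sketched with a toy model. The line size (16 bytes), element size (4 bytes), and access count are illustrative assumptions, mirroring the sequential `arr[i]` loop:

```python
# Toy model: sequential accesses hit because a miss fills a whole cache line.
# Assumptions (illustrative): 16-byte lines, 4-byte elements, 1000 accesses.
LINE_BYTES = 16
ELEM_BYTES = 4
N = 1000

cached_lines = set()
hits = misses = 0
for i in range(N):
    addr = i * ELEM_BYTES
    line = addr // LINE_BYTES          # which cache line holds this element
    if line in cached_lines:
        hits += 1                      # a neighbor already pulled the line in
    else:
        misses += 1
        cached_lines.add(line)         # the miss fills the whole line

print(f"hits={hits} misses={misses}")  # hits=750 misses=250
```

With four elements per line, only the first access to each line misses, so spatial locality alone yields a 75% hit rate here.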
Access Time Comparison
| Level | Access Time | Typical Size | Managed By | Latency (CPU cycles) |
|---|---|---|---|---|
| Register | 0.3 ns | ~1 KB | Compiler | 1 cycle |
| L1 Cache | 1 ns | ~128 KB | Hardware | 3-4 cycles |
| L2 Cache | 5 ns | ~1 MB | Hardware | 10-15 cycles |
| L3 Cache | 15 ns | ~16 MB | Hardware | 30-50 cycles |
| RAM (DRAM) | 60 ns | ~32 GB | OS + Hardware | 150-300 cycles |
| SSD | 100 µs | ~2 TB | OS | ~300K cycles |
| HDD | 10 ms | ~10 TB | OS | ~30M cycles |
| Tape | 60 s | ~100 TB | Operator | ~180B cycles |
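As a sanity check, the cycle column can be reproduced from the access times, assuming a 3 GHz clock (an assumption; one cycle ≈ 0.33 ns):

```python
# Convert access times to CPU cycles at an assumed 3 GHz clock.
CLOCK_HZ = 3e9

def to_cycles(seconds):
    return seconds * CLOCK_HZ

for name, access_time in [("L1 cache", 1e-9), ("DRAM", 60e-9),
                          ("SSD", 100e-6), ("HDD", 10e-3)]:
    print(f"{name:8s} ~ {to_cycles(access_time):,.0f} cycles")
```

This yields ~3 cycles for L1, ~180 for DRAM, ~300K for SSD, and ~30M for HDD, consistent with the table.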
Mind the Gap
Note the jump from DRAM to SSD: roughly 60 ns to 100 µs, a ~1,700× increase — the largest single gap in the hierarchy. Crossing it (a page fault or file read) costs hundreds of thousands of cycles.
Average Memory Access Time
AMAT = hit_time + (miss_rate × miss_penalty)
For example, with hit_time = 2 cycles, miss_penalty = 100 cycles, and a 95% hit rate: AMAT = 2 + 0.05 × 100 = 7.00 cycles — a reasonable figure, and one that improves directly with hit rate.
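The formula is a one-liner in code; a minimal sketch, with parameter values mirroring the example above:

```python
def amat(hit_time, miss_penalty, hit_rate):
    """Average Memory Access Time = hit_time + miss_rate * miss_penalty."""
    return hit_time + (1 - hit_rate) * miss_penalty

print(round(amat(2, 100, 0.95), 2))  # 7.0 cycles
```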
Code Example: Memory Access Profiler
```python
# Simulate memory hierarchy access with a simple hit/miss model
class MemoryAccess:
    def __init__(self, hit_time=2, miss_penalty=100):
        self.hit_time = hit_time
        self.miss_penalty = miss_penalty
        self.hits = 0
        self.misses = 0
        self.cached = set()  # addresses currently resident in the cache

    def access(self, address):
        """Simulate a memory access; return its cost in cycles."""
        if self._is_cached(address):
            self.hits += 1
            return self.hit_time
        else:
            self.misses += 1
            self._cache_block(address)  # a miss brings the block in
            return self.hit_time + self.miss_penalty

    def avg_access_time(self):
        """Measured AMAT: hit_time + miss_rate * miss_penalty."""
        rate = self.hits / (self.hits + self.misses + 1e-9)
        return self.hit_time + (1 - rate) * self.miss_penalty

    def _is_cached(self, addr):
        return addr in self.cached

    def _cache_block(self, addr):
        self.cached.add(addr)
```
Interview Questions
Explain the memory hierarchy and why it exists.
The memory hierarchy exists because no single storage technology can be simultaneously fast, large, and cheap. Registers are fastest but tiny (~KB); tape is largest but ~60s access. The hierarchy arranges memory in levels: faster/smaller near CPU, slower/larger further away. It exploits locality to give the illusion of fast, large storage at reasonable cost.
What is temporal vs spatial locality? Give examples.
Temporal locality: recently accessed data is accessed again soon (e.g., a loop counter variable reused each iteration). Spatial locality: data near recently accessed data is accessed soon (e.g., iterating through an array sequentially). Cache exploits temporal by keeping data in cache, and spatial by loading cache lines (blocks of adjacent data).
How does the memory hierarchy improve system performance?
It exploits locality: the CPU spends 90%+ of time accessing the fastest levels (registers/cache). Cache hit rates of 95-99% mean most accesses are served in 1-15 cycles rather than the 100+ cycles for RAM. This gives the performance of fast memory with the capacity of slow memory at a reasonable cost.
What happens if the hit rate drops from 99% to 90%?
Average access time = hit_time + (miss_rate × miss_penalty). With hit_time=2, miss_penalty=100: at 99% hit rate → 2 + 0.01×100 = 3 cycles. At 90% → 2 + 0.10×100 = 12 cycles (4× slower). This shows why even small miss rate increases have outsized impact.
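The sensitivity described in this answer can be checked with a quick sweep (same hit_time = 2 and miss_penalty = 100 as above):

```python
# Sweep hit rate to show the outsized impact of misses on AMAT.
HIT_TIME, MISS_PENALTY = 2, 100
for hit_rate in (0.99, 0.95, 0.90, 0.80):
    t = HIT_TIME + (1 - hit_rate) * MISS_PENALTY
    print(f"hit rate {hit_rate:.0%}: {t:.0f} cycles")
```

Dropping from 99% to 90% quadruples the average cost (3 → 12 cycles), and 80% more than septuples it (22 cycles).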