⚙️ Multiprocessors
Systems with multiple processors working together. Explore tight vs loose coupling, SMP, NUMA, Amdahl's Law, and parallel performance analysis.
Tightly vs Loosely Coupled Systems
Tightly Coupled
All processors share one physical memory, accessed through a common bus or crossbar (the SMP model). Communication is fast, but contention on the shared interconnect limits scalability.
Loosely Coupled
Each processor has its own private memory; processors cooperate by passing messages over a network (the cluster model). Scales to many nodes, but communication is slower and must be programmed explicitly.
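To make the loosely coupled style concrete, here is a minimal message-passing sketch using MPI (an illustration we've added; the original section shows only shared-memory code):

```c
// Loosely coupled: no shared memory - data moves only via explicit messages
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);  // this process's ID

    if (rank == 0) {
        int data = 42;
        // Send a copy of the data to process 1 over the interconnect
        MPI_Send(&data, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        int data;
        MPI_Recv(&data, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("Process 1 received %d\n", data);
    }

    MPI_Finalize();
    return 0;
}
```
Contrast this with the OpenMP example later in this section, where threads simply read and write the same array.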
Symmetric Multiprocessing (SMP)
[Diagram: SMP system - identical processors connected to one shared memory over a common bus, all with uniform access latency]
Non-Uniform Memory Access (NUMA)
[Diagram: NUMA system - processor nodes, each with local memory, linked by an interconnect]
Local Access
Memory attached to the same processor node. Fast (e.g., 100ns latency). No interconnect overhead.
Remote Access
Memory attached to another processor node. Slower (e.g., 200-300ns). Traverses the interconnect.
```c
// NUMA-aware programming (Linux, libnuma - link with -lnuma)
#include <numa.h>

void numa_example(void) {
    if (numa_available() < 0)
        return; // kernel built without NUMA support

    // Allocate 1 MB on the node the calling thread is running on
    // - local allocation means fast access, no interconnect hops
    size_t size = 1024 * 1024;
    void *mem = numa_alloc_local(size);

    // Pin the calling thread to CPU 0 so its memory stays local
    struct bitmask *mask = numa_allocate_cpumask();
    numa_bitmask_setbit(mask, 0);    // CPU 0
    numa_sched_setaffinity(0, mask); // pid 0 = calling thread

    numa_free_cpumask(mask);
    numa_free(mem, size);
}
```
Amdahl's Law
Speedup = 1 / (P + (1 - P) / N)
P = serial fraction, N = number of processors
Key Insight
As N → ∞, speedup approaches 1/P: the serial fraction, not the core count, sets the ceiling, as the table below shows.
| Serial % | 2 cores | 4 cores | 8 cores | 16 cores | ∞ cores |
|---|---|---|---|---|---|
| 1% | 1.98x | 3.88x | 7.48x | 13.91x | 100.0x |
| 5% | 1.90x | 3.48x | 5.93x | 9.14x | 20.0x |
| 10% | 1.82x | 3.08x | 4.71x | 6.40x | 10.0x |
| 25% | 1.60x | 2.29x | 2.91x | 3.37x | 4.0x |
| 50% | 1.33x | 1.60x | 1.78x | 1.88x | 2.0x |
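The table is easy to reproduce by evaluating the formula directly; here is a minimal C sketch (the helper name `amdahl_speedup` is ours, for illustration):

```c
#include <stdio.h>

// Amdahl's Law: Speedup = 1 / (P + (1 - P) / N)
double amdahl_speedup(double serial, int n) {
    return 1.0 / (serial + (1.0 - serial) / n);
}

int main(void) {
    double serial[] = {0.01, 0.05, 0.10, 0.25, 0.50};
    int cores[] = {2, 4, 8, 16};

    for (int i = 0; i < 5; i++) {
        printf("%2.0f%% serial:", serial[i] * 100);
        for (int j = 0; j < 4; j++)
            printf(" %5.2fx @%2d", amdahl_speedup(serial[i], cores[j]), cores[j]);
        printf("  (limit %.0fx)\n", 1.0 / serial[i]); // N -> infinity gives 1/P
    }
    return 0;
}
```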
Parallel Speedup & Efficiency
Key Metrics
Speedup S = T(1) / T(N), the single-processor time divided by the N-processor time. Efficiency E = S / N, how much of each processor's capacity the speedup actually uses.
Linear Speedup
S = N: doubling the processors halves the runtime. The ideal case; efficiency stays at 1.
Sub-linear
S < N: the common case, caused by serial sections, communication, synchronization, and load imbalance.
Super-linear
S > N: rare; can happen when the combined caches of N processors hold a working set that a single processor's cache could not.
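A small worked example of the two metrics (the timings below are made-up placeholders, not measurements):

```c
#include <stdio.h>

int main(void) {
    double t1 = 10.0; // hypothetical runtime on 1 processor (seconds)
    double tn = 3.2;  // hypothetical runtime on 4 processors
    int n = 4;

    double speedup = t1 / tn;        // S = T(1) / T(N) = 3.13x -> sub-linear
    double efficiency = speedup / n; // E = S / N, about 0.78

    printf("Speedup: %.2fx, Efficiency: %.0f%%\n", speedup, efficiency * 100);
    return 0;
}
```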
Code Example: OpenMP Parallelism
```c
// OpenMP: shared-memory multiprocessing (compile with -fopenmp)
#include <omp.h>
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    int n = 1000000;
    double sum = 0.0;
    double *arr = malloc(n * sizeof(double));

    // Initialize
    for (int i = 0; i < n; i++) arr[i] = 1.0;

    // Parallel reduction with 4 threads: reduction(+:sum) gives each
    // thread a private partial sum and merges them at the end,
    // avoiding a data race on sum
    #pragma omp parallel for num_threads(4) reduction(+:sum)
    for (int i = 0; i < n; i++) {
        sum += arr[i]; // each thread handles ~250K elements
    }

    printf("Sum = %f\n", sum);
    printf("Max threads available: %d\n", omp_get_max_threads());

    free(arr);
    return 0;
}
```
Multiprocessor Challenges
Cache Coherence
Multiple caches may hold stale copies of shared data. Solution: snooping protocols (MESI) or directory-based protocols.
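Coherence traffic costs show up even when threads never touch the same variable. The sketch below (our illustration, not from the original) shows false sharing: two counters on one cache line ping-pong between caches as each write invalidates the other core's copy under a MESI-style protocol:

```c
#include <omp.h>

// Both counters fit in one 64-byte cache line, so every write by one
// thread invalidates the line in the other thread's cache (MESI-style)
struct { long a, b; } hot;

// Padding gives each counter its own line and eliminates the bouncing
// (count on cool.a / cool.b instead to see the difference)
struct { long a; char pad[56]; long b; } cool;

void count(void) {
    #pragma omp parallel sections num_threads(2)
    {
        #pragma omp section
        for (long i = 0; i < 100000000; i++) hot.a++; // slow: line bounces
        #pragma omp section
        for (long i = 0; i < 100000000; i++) hot.b++;
    }
}
```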
Memory Consistency
Order in which memory operations appear to different processors. Models: Sequential Consistency, Release Consistency.
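The standard way to see the difference between models is the store-buffering litmus test; here is a C11 atomics sketch (our illustration):

```c
#include <stdatomic.h>

atomic_int x, y;   // both start at 0
int r1, r2;

void cpu0(void) {  // runs on one processor
    atomic_store_explicit(&x, 1, memory_order_relaxed);
    r1 = atomic_load_explicit(&y, memory_order_relaxed);
}

void cpu1(void) {  // runs on another processor
    atomic_store_explicit(&y, 1, memory_order_relaxed);
    r2 = atomic_load_explicit(&x, memory_order_relaxed);
}

// Under Sequential Consistency, r1 == 0 && r2 == 0 is impossible: some
// store must come first in the single global order. With relaxed
// ordering, both loads may see 0. Making all four operations
// memory_order_seq_cst restores the sequentially consistent result.
```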
Synchronization
Locks, semaphores, barriers needed to coordinate access. Can become a bottleneck at scale.
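A sketch of how a single lock becomes that bottleneck (our illustration, using OpenMP to match the example above):

```c
#include <omp.h>

long counter = 0;

void increment_all(int n) {
    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
        // Every iteration serializes on the same lock: past a few
        // threads, they mostly wait here instead of computing
        #pragma omp critical
        counter++;
    }
}
```
A reduction(+:counter) clause would remove the shared lock entirely, which is why the earlier OpenMP example scales.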
Load Balancing
Uneven work distribution leaves some processors idle. Dynamic scheduling helps but adds overhead.
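When iteration costs vary, OpenMP's dynamic schedule illustrates exactly this trade-off (a sketch with an assumed chunk size of 64):

```c
#include <omp.h>

void process(double *work, int n) {
    // schedule(dynamic, 64): threads grab 64-iteration chunks on demand,
    // so fast threads take extra chunks instead of idling - better
    // balance, at the cost of per-chunk scheduling overhead
    #pragma omp parallel for schedule(dynamic, 64)
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < i % 1000; j++) // per-iteration cost varies
            work[i] += 1.0;
    }
}
```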
Interview Questions
What is the difference between tightly coupled and loosely coupled multiprocessors?
Tightly coupled systems share a single memory accessed equally by all processors via a bus/crossbar (SMP). Fast communication but limited scalability. Loosely coupled systems have private memory per processor, communicating via message passing (clusters). High scalability but slower communication and explicit programming required.
Explain Amdahl's Law and its implications.
Amdahl's Law: Speedup = 1/(P + (1-P)/N) where P is the serial fraction. As N → ∞, max speedup → 1/P. Even 10% serial code limits speedup to 10x regardless of cores. The implication: identify and optimize the serial bottleneck before adding more processors.
What is SMP and how does it differ from NUMA?
SMP (Symmetric Multiprocessing): all processors share a single memory with uniform access latency via a shared bus. NUMA (Non-Uniform Memory Access): each processor has local memory (fast) and can access remote memory (slower). SMP is simpler but doesn't scale beyond 8-16 processors. NUMA scales to hundreds.
What factors cause sub-linear speedup in parallel systems?
Sub-linear speedup is caused by: (1) Serial sections — Amdahl's Law limits apply. (2) Communication overhead — processors spend time sending/receiving data. (3) Synchronization overhead — locks and barriers add waiting time. (4) Load imbalance — some processors finish early and idle. (5) Resource contention — bus/memory conflicts increase with more processors.