Flynn's Taxonomy
Michael Flynn proposed a classification of computer architectures based on the number of instruction streams and data streams. The four categories define how parallelism is organized in a system.
The Taxonomy
Flynn's 2×2 classification matrix crosses the number of instruction streams (single or multiple) with the number of data streams (single or multiple), yielding four categories: SISD, SIMD, MISD, and MIMD.
SISD – Single Instruction, Single Data
One instruction at a time on one data item
A single processor executes one instruction stream, operating on one data stream at a time. This is the classic von Neumann model. Most early computers and simple microcontrollers are SISD.
```c
// SISD: one addition at a time
int a[100], b[100], c[100];
for (int i = 0; i < 100; i++) {
    c[i] = a[i] + b[i];  // one element per iteration
}
```
SIMD – Single Instruction, Multiple Data
Data Parallelism
Scalar (SISD)
```assembly
// One at a time
c[0] = a[0] + b[0]
c[1] = a[1] + b[1]
c[2] = a[2] + b[2]
c[3] = a[3] + b[3]
```
Vector (SIMD)
```assembly
// All at once!
VADD c, a, b    ; c[0..3] = a[0..3] + b[0..3]
                ; Single instruction, 4 operations
```
```c
// SIMD in C using SSE intrinsics
// Assumes n is a multiple of 4 and a, b, c are 16-byte aligned
// (required by _mm_load_ps/_mm_store_ps; use _mm_loadu_ps otherwise)
#include <xmmintrin.h>

void vector_add_sse(float *a, float *b, float *c, int n) {
    for (int i = 0; i < n; i += 4) {
        __m128 va = _mm_load_ps(&a[i]);   // load 4 floats
        __m128 vb = _mm_load_ps(&b[i]);   // load 4 floats
        __m128 vc = _mm_add_ps(va, vb);   // 4 adds in 1 instruction
        _mm_store_ps(&c[i], vc);          // store 4 results
    }
}
```
MISD – Multiple Instruction, Single Data
MISD
Different instructions on the same data
Multiple processors execute different instructions on the same data stream. This is the least common category. Practical applications are rare – mainly fault-tolerant systems, where multiple processors independently process the same input and vote on the result.
MISD in Practice
A commonly cited example is the Space Shuttle's flight-control system, in which redundant computers processed the same sensor inputs and their outputs were compared by voting.
MIMD – Multiple Instruction, Multiple Data
MIMD
Different instructions on different data
Each processor executes its own instruction stream on its own data. This is the most flexible and most common parallel architecture. Multi-core CPUs, clusters, and distributed systems are all MIMD.
Shared Memory MIMD
All processors share a single address space. Communication via shared variables. Examples: SMP, multi-core CPUs.
Distributed Memory MIMD
Each processor has private memory. Communication via message passing. Examples: clusters, supercomputers.
Code Example: SIMD vs Scalar Performance
```c
// Performance comparison: scalar vs. SIMD vector addition
// Timings are illustrative; compile without auto-vectorization
// (e.g. -O0 or -fno-tree-vectorize) to see the difference.
#include <stdio.h>
#include <time.h>
#include <xmmintrin.h>

#define N 1000000
static float A[N] __attribute__((aligned(16)));
static float B[N] __attribute__((aligned(16)));
static float C[N] __attribute__((aligned(16)));

int main(void) {
    clock_t start = clock();
    for (int i = 0; i < N; i++) {
        C[i] = A[i] + B[i];                    // scalar: one add per iteration
    }
    double scalar_time = (double)(clock() - start);

    start = clock();
    for (int i = 0; i < N; i += 4) {
        _mm_store_ps(&C[i],
            _mm_add_ps(_mm_load_ps(&A[i]),
                       _mm_load_ps(&B[i])));   // SIMD: 4x fewer iterations
    }
    double simd_time = (double)(clock() - start);

    printf("Speedup: %.1fx\n", scalar_time / simd_time);
    return 0;
}
```
Interview Questions
What are the four categories of Flynn's Taxonomy?
SISD (Single Instruction, Single Data) – traditional single-core CPU. SIMD (Single Instruction, Multiple Data) – vector processors, GPUs. MISD (Multiple Instruction, Single Data) – fault-tolerant systems. MIMD (Multiple Instruction, Multiple Data) – multi-core CPUs, clusters.
Why is MISD rarely used in practice?
MISD requires multiple processors to execute different instructions on the same data simultaneously. This is inherently inefficient because most algorithms don't benefit from different operations on the same data. The main use case is fault-tolerant systems where redundant processors run the same program on the same input and vote on the result.
What is the difference between SIMD and MIMD?
SIMD executes the same instruction on multiple data elements in lockstep – ideal for data-parallel workloads. MIMD allows each processor to execute different instructions on different data – more flexible but requires more complex synchronization. SIMD is simpler hardware; MIMD is more general-purpose.
How do SIMD extensions like SSE and AVX improve performance?
SIMD extensions add wide registers (128-bit SSE, 256-bit AVX, 512-bit AVX-512) that hold multiple data elements. A single instruction operates on all elements simultaneously. For example, one AVX instruction can do 8 single-precision additions, providing up to 8x speedup for vectorizable code.