🔢 Floating Point Arithmetic

IEEE 754 is the standard for representing and computing with real numbers in binary. It defines single precision (32-bit) and double precision (64-bit) formats.

IEEE 754 Single Precision Format

ℹ️ The Standard

IEEE 754 is the universal standard for floating-point computation. Single precision uses 32 bits: 1 sign bit, 8 exponent bits (biased by 127), and 23 mantissa bits (with implicit leading 1 for normalized numbers).

32-bit Floating Point Layout

Field                     Width     Bits
Sign (S)                  1 bit     bit 31
Exponent (E)              8 bits    bits 30-23
Mantissa / Fraction (M)   23 bits   bits 22-0

Value = (-1)^S × 2^(E-127) × 1.M  |  Bias = 127
Range: ±1.18×10⁻³⁸ to ±3.4×10³⁸  |  Precision: ~7 decimal digits
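
To make the layout concrete, here is a minimal sketch (the helper name unpack_fields is made up for this example) that pulls the three fields out of the raw 32-bit pattern with shifts and masks:

python

import struct

def unpack_fields(x: float):
    """Extract sign, biased exponent, and mantissa bits from a 32-bit float."""
    bits = struct.unpack('>I', struct.pack('>f', x))[0]   # raw 32-bit pattern
    sign     = (bits >> 31) & 0x1        # bit 31
    exponent = (bits >> 23) & 0xFF       # bits 30-23 (biased by 127)
    mantissa = bits & 0x7FFFFF           # bits 22-0 (fraction; hidden 1 not stored)
    return sign, exponent, mantissa

print(unpack_fields(42.5))   # (0, 132, 2752512): (-1)^0 × 2^(132-127) × 1.0101010... = 42.5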

IEEE 754 Conversion Example: 42.5

Full 32-bit binary: 0 10000100 01010100000000000000000  (sign | exponent | mantissa)
Hex: 0x422A0000
Sign = 0, Stored exponent = 132, Actual exponent = 132 - 127 = 5
Normalized representation: (-1)^0 × 2^(5) × 1.01010100... = 42.5
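
Going the other way, a short sketch with Python's struct module decodes the hex pattern above back into the value it encodes:

python

import struct

# Reinterpret the 32-bit pattern 0x422A0000 as a single-precision float
value = struct.unpack('>f', (0x422A0000).to_bytes(4, 'big'))[0]
print(value)   # 42.5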

Special Values

Value        Hex          Sign   Exp   Mantissa     Meaning
+0           0x00000000   0      00    000...000    Zero
-0           0x80000000   1      00    000...000    Zero
+∞           0x7F800000   0      FF    000...000    Infinity
-∞           0xFF800000   1      FF    000...000    Infinity
NaN          0x7FC00000   0/1    FF    100...000    Not a Number
Max Normal   0x7F7FFFFF   0      FE    111...111    Normal
Min Normal   0x00800000   0      01    000...000    Normal
Denorm Min   0x00000001   0      00    000...001    Denormalized
⚠️ Special Case Rules

  • Zero (E=0, M=0): Signed zero exists (+0 and -0)
  • Infinity (E=255, M=0): Represents overflow or division by zero
  • NaN (E=255, M≠0): Result of invalid operations (0/0, ∞-∞)
  • Denormalized (E=0, M≠0): Gradual underflow; implicit leading 0 instead of 1
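
These special cases can be inspected directly from Python; the sketch below (using only the standard math and struct modules, with an illustrative helper bits_of) packs each value into single precision and prints its bit pattern:

python

import math
import struct

def bits_of(x: float) -> str:
    """Hex of the 32-bit single-precision pattern of x (illustrative helper)."""
    return '0x' + struct.pack('>f', x).hex().upper()

print(bits_of(0.0), bits_of(-0.0))              # 0x00000000 0x80000000 (signed zeros)
print(bits_of(math.inf), bits_of(-math.inf))    # 0x7F800000 0xFF800000
print(bits_of(float('nan')))                    # a NaN pattern, typically 0x7FC00000
print(math.isnan(math.inf - math.inf))          # True: ∞ - ∞ is an invalid operation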

Floating Point Addition

Adding two floating-point numbers requires aligning exponents, adding significands, then normalizing the result.

Addition Steps

1. Align: shift the significand of the number with the smaller exponent right until both exponents match
2. Add: add the significands (including the hidden leading 1)
3. Normalize: shift the result and adjust the exponent until it has the 1.M form
4. Round: round the result to fit the 23-bit mantissa
5. Check: check for exponent overflow or underflow
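
As a rough illustration of these five steps, here is a simplified sketch of a single-precision adder (it only handles positive, normalized inputs and truncates instead of rounding; the name fp_add_single is made up for this example):

python

import struct

def fp_add_single(a: float, b: float) -> float:
    """Simplified IEEE 754 single-precision addition (positive, normalized inputs only)."""
    def fields(x):
        bits = struct.unpack('>I', struct.pack('>f', x))[0]
        # biased exponent and significand with the hidden leading 1 restored
        return (bits >> 23) & 0xFF, (bits & 0x7FFFFF) | (1 << 23)

    ea, ma = fields(a)
    eb, mb = fields(b)

    # Step 1: Align - shift the significand with the smaller exponent to the right
    if ea < eb:
        ea, ma, eb, mb = eb, mb, ea, ma
    mb >>= (ea - eb)

    # Step 2: Add significands
    e, m = ea, ma + mb

    # Step 3: Normalize - restore the 1.M form, adjusting the exponent
    while m >= (1 << 24):
        m >>= 1          # Step 4 (rounding) is skipped: bits shifted out are truncated
        e += 1

    # Step 5: overflow/underflow checks are omitted in this sketch
    bits = (e << 23) | (m & 0x7FFFFF)
    return struct.unpack('>f', struct.pack('>I', bits))[0]

print(fp_add_single(42.5, 3.75))   # 46.25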

Example: 42.5 + 3.75

42.5 = 1.010101 × 2⁵ → E=132(5+127), M=0101010...

3.75 = 1.111 × 2¹ → align: shift right 4 → 0.0001111 × 2⁵

Sum: 1.010101 + 0.0001111 = 1.0111001 × 2⁵

Result: 0 10000100 0111001000... = 46.25 (hex: 0x42390000)
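
For a quick sanity check, packing the native result with struct reproduces the bit pattern on the last line above:

python

import struct

print('0x' + struct.pack('>f', 42.5 + 3.75).hex().upper())   # 0x42390000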

Code Example

python

import struct

def float_to_ieee754(f: float) -> dict:
    """Convert a Python float to IEEE 754 single precision representation."""
    bits = struct.pack('>f', f)
    binary = ''.join(f'{b:08b}' for b in bits)
    hex_str = bits.hex().upper()
    
    sign = int(binary[0])
    exp = int(binary[1:9], 2)
    mant = binary[9:]
    
    if exp == 0 and int(mant, 2) == 0:
        desc = "Zero"
    elif exp == 255 and int(mant, 2) == 0:
        desc = "Infinity"
    elif exp == 255:
        desc = "NaN"
    elif exp == 0:
        desc = f"Denormalized: (-1)^{sign} × 2^{-126} × 0.{mant}"
    else:
        desc = f"Normal: (-1)^{sign} × 2^{exp-127} × 1.{mant}"
    
    return {
        "sign": sign, "exponent": exp, "mantissa": mant,
        "binary": binary, "hex": f"0x{hex_str}", "description": desc
    }

# Example
print(float_to_ieee754(-3.14))
# Output: sign=1, exponent=128, mantissa=100100011110...

Interview Questions

What is the IEEE 754 bias and why is it used?

The exponent field in IEEE 754 uses a bias of 127 (single precision) or 1023 (double precision) so that both negative and positive exponents can be stored as an unsigned value, with no separate exponent sign bit. Because the stored exponent grows with the true exponent, same-signed floating-point numbers can be compared by treating their bit patterns as integers. The actual exponent is the stored exponent minus the bias.
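
A small sketch of that comparison property (the helper as_uint32 is just for illustration): for positive floats, the raw bit patterns sort in the same order as the values.

python

import struct

def as_uint32(x: float) -> int:
    """Raw single-precision bit pattern of x as an unsigned integer (illustrative helper)."""
    return struct.unpack('>I', struct.pack('>f', x))[0]

# Larger positive float -> larger bit pattern, so integer comparison works
print(as_uint32(1.5) < as_uint32(2.5) < as_uint32(1e20))   # True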

Explain the difference between normalized and denormalized numbers.

Normalized numbers have an implicit leading 1 before the binary point (1.M), while denormalized numbers have an implicit leading 0 (0.M). Denormals allow gradual underflow to zero, filling the gap between the smallest normal number and zero. They trade precision for the ability to represent extremely small values.
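
Gradual underflow can be observed by round-tripping small values through single precision; a brief sketch (printed digits may vary slightly with formatting):

python

import struct

def to_single(x: float) -> float:
    """Round-trip x through 32-bit single precision (illustrative helper)."""
    return struct.unpack('>f', struct.pack('>f', x))[0]

print(to_single(1.18e-38))   # near the smallest normal single (~2^-126)
print(to_single(1e-40))      # still nonzero: stored as a denormalized number
print(to_single(1e-46))      # 0.0: below the smallest denormal (~1.4e-45), underflows to zero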

What causes floating point precision errors and how can they be mitigated?

Floating point errors occur because not all decimal fractions have exact binary representations (e.g., 0.1). Operations accumulate rounding errors. Mitigation: use double precision, avoid equality comparisons, use epsilon tolerances, consider decimal arithmetic (e.g., Python's Decimal module) for financial calculations.
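
The classic demonstration, as a short sketch using only the standard library:

python

from decimal import Decimal
import math

print(0.1 + 0.2)                        # 0.30000000000000004: 0.1 has no exact binary form
print(0.1 + 0.2 == 0.3)                 # False: avoid equality comparisons on floats
print(math.isclose(0.1 + 0.2, 0.3))     # True: compare with a tolerance instead
print(Decimal('0.1') + Decimal('0.2'))  # 0.3: exact decimal arithmetic, e.g. for money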