🔢 Floating Point Arithmetic
The IEEE 754 standard defines how real numbers are represented and computed in binary, most commonly as single precision (32-bit) and double precision (64-bit) formats.
IEEE 754 Single Precision Format
The Standard
32-bit Floating Point Layout

| Field | Bits | Role |
|---|---|---|
| Sign (S) | 1 | 0 = positive, 1 = negative |
| Exponent (E) | 8 | Stored with a bias of 127 |
| Mantissa (M) | 23 | Fraction bits of the significand |

Value of a normal number: (-1)^S × 1.M × 2^(E-127)

Range: ±1.18×10⁻³⁸ to ±3.4×10³⁸ | Precision: ~7 decimal digits
Special Values
| Value | Hex | Sign | Exp (hex) | Mantissa (binary) | Meaning |
|---|---|---|---|---|---|
| +0 | 0x00000000 | 0 | 00 | 000...000 | Zero |
| -0 | 0x80000000 | 1 | 00 | 000...000 | Zero |
| +∞ | 0x7F800000 | 0 | FF | 000...000 | Infinity |
| -∞ | 0xFF800000 | 1 | FF | 000...000 | Infinity |
| NaN | 0x7FC00000 | 0/1 | FF | 100...000 | Not a Number |
| Max Normal | 0x7F7FFFFF | 0 | FE | 111...111 | Normal |
| Min Normal | 0x00800000 | 0 | 01 | 000...000 | Normal |
| Denorm Min | 0x00000001 | 0 | 00 | 000...001 | Denormalized |
Special Case Rules
- Zero (E=0, M=0): Signed zero exists (+0 and -0)
- Infinity (E=255, M=0): Represents overflow or division by zero
- NaN (E=255, M≠0): Result of invalid operations (0/0, ∞-∞)
- Denormalized (E=0, M≠0): Gradual underflow; implicit leading 0 instead of 1, with the exponent fixed at -126 (see the sketch below)
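These bit patterns are easy to check by reinterpreting raw bits as floats. A minimal sketch using Python's struct module (the bits_to_float helper here is invented for illustration):

```python
import math
import struct

def bits_to_float(pattern: int) -> float:
    """Reinterpret a 32-bit integer bit pattern as an IEEE 754 single."""
    return struct.unpack('>f', pattern.to_bytes(4, 'big'))[0]

print(bits_to_float(0x80000000))              # -0.0 (signed zero)
print(0.0 == bits_to_float(0x80000000))       # True: +0 and -0 compare equal
print(bits_to_float(0x7F800000))              # inf
print(math.isnan(bits_to_float(0x7FC00000)))  # True (invalid-operation result)
print(bits_to_float(0x00800000))              # 1.1754943508222875e-38 (min normal)
print(bits_to_float(0x00000001))              # 1.401298464324817e-45 (denorm min)
```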
Floating Point Addition
Adding two floating-point numbers requires aligning exponents, adding significands, then normalizing the result.
Addition Steps
1. Align: shift the significand of the operand with the smaller exponent right until the exponents match
2. Add: add the significands (including the hidden leading 1)
3. Normalize: shift the result and adjust the exponent until the significand has the form 1.M
4. Round: round the result to fit the 23-bit mantissa
5. Check: check for exponent overflow or underflow (a bit-level sketch of all five steps follows this list)
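The five steps can be traced directly on the bit patterns. The sketch below is illustrative only: fp_add and f32_bits are helpers invented for this example, it handles only positive normal inputs, and it truncates instead of rounding:

```python
import struct

def f32_bits(f: float) -> int:
    """Raw 32-bit pattern of a float (big-endian)."""
    return int.from_bytes(struct.pack('>f', f), 'big')

def fp_add(a: float, b: float) -> float:
    """Simplified single-precision addition for positive normal inputs."""
    ea, eb = (f32_bits(a) >> 23) & 0xFF, (f32_bits(b) >> 23) & 0xFF
    ma = (f32_bits(a) & 0x7FFFFF) | 0x800000   # restore hidden leading 1
    mb = (f32_bits(b) & 0x7FFFFF) | 0x800000
    if ea < eb:                                # make a the larger operand
        ea, eb, ma, mb = eb, ea, mb, ma
    mb >>= ea - eb                             # 1. Align
    m, e = ma + mb, ea                         # 2. Add
    while m >= 1 << 24:                        # 3. Normalize
        m >>= 1
        e += 1
    # 4. Round is plain truncation here; 5. overflow/underflow left unchecked
    bits = (e << 23) | (m & 0x7FFFFF)
    return struct.unpack('>f', bits.to_bytes(4, 'big'))[0]

print(fp_add(42.5, 3.75))  # 46.25, matching the worked example below
```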
Example: 42.5 + 3.75
42.5 = 1.010101 × 2⁵ → E = 132 (5+127), M = 0101010...
3.75 = 1.111 × 2¹ → align: shift right 4 → 0.0001111 × 2⁵
Sum: 1.010101 + 0.0001111 = 1.0111001 × 2⁵
Result: 0 10000100 0111001000... = 46.25 (hex: 0x42390000)
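The same result can be confirmed directly, including the hex pattern:

```python
import struct

result = 42.5 + 3.75
print(result)                                   # 46.25
print(struct.pack('>f', result).hex().upper())  # 42390000
```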
Code Example
```python
import struct

def float_to_ieee754(f: float) -> dict:
    """Convert a Python float to its IEEE 754 single-precision representation."""
    bits = struct.pack('>f', f)
    binary = ''.join(f'{b:08b}' for b in bits)
    hex_str = bits.hex().upper()
    sign = int(binary[0])
    exp = int(binary[1:9], 2)
    mant = binary[9:]
    if exp == 0 and int(mant, 2) == 0:
        desc = "Zero"
    elif exp == 255 and int(mant, 2) == 0:
        desc = "Infinity"
    elif exp == 255:
        desc = "NaN"
    elif exp == 0:
        desc = f"Denormalized: (-1)^{sign} × 2^-126 × 0.{mant}"
    else:
        desc = f"Normal: (-1)^{sign} × 2^{exp - 127} × 1.{mant}"
    return {
        "sign": sign, "exponent": exp, "mantissa": mant,
        "binary": binary, "hex": f"0x{hex_str}", "description": desc
    }

# Example
print(float_to_ieee754(-3.14))
# Output: sign=1, exponent=128, mantissa=100100011110...
```

Interview Questions
What is the IEEE 754 bias and why is it used?
The exponent field in IEEE 754 stores the actual exponent plus a bias of 127 (single) or 1023 (double), so both positive and negative exponents can be represented without a separate sign bit. Because biased exponents are ordered, same-signed floating-point numbers can be compared as raw integer bit patterns. The actual exponent is stored_exponent - bias.
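As a quick illustration, the biased exponent can be pulled out of the bit pattern and un-biased (unbiased_exponent is a name made up for this sketch):

```python
import struct

def unbiased_exponent(f: float) -> int:
    """Extract the 8-bit exponent field and remove the bias of 127."""
    bits = int.from_bytes(struct.pack('>f', f), 'big')
    return ((bits >> 23) & 0xFF) - 127

print(unbiased_exponent(42.5))     # 5   (42.5 = 1.010101 × 2^5)
print(unbiased_exponent(0.15625))  # -3  (0.15625 = 1.01 × 2^-3)
```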
Explain the difference between normalized and denormalized numbers.
Normalized numbers have an implicit leading 1 before the binary point (1.M), while denormalized numbers have an implicit leading 0 (0.M). Denormals allow gradual underflow to zero, filling the gap between the smallest normal number and zero. They trade precision for the ability to represent extremely small values.
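Gradual underflow is easy to observe; a short double-precision illustration using sys.float_info:

```python
import sys

min_normal = sys.float_info.min   # 2.2250738585072014e-308, smallest normal double
print(min_normal / 2)             # 1.1125369292536007e-308: denormal, not zero
print(min_normal / 2 > 0)         # True: gradual underflow fills the gap to zero
print(5e-324 / 2)                 # 0.0: below the smallest denormal, rounds to zero
```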
What causes floating point precision errors and how can they be mitigated?
Floating point errors occur because not all decimal fractions have exact binary representations (e.g., 0.1). Operations accumulate rounding errors. Mitigation: use double precision, avoid equality comparisons, use epsilon tolerances, consider decimal arithmetic (e.g., Python's Decimal module) for financial calculations.
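These effects are easy to reproduce; a short sketch of the failure mode and two common mitigations:

```python
import math
from decimal import Decimal

print(0.1 + 0.2)                        # 0.30000000000000004: 0.1 has no exact binary form
print(0.1 + 0.2 == 0.3)                 # False: avoid == on floats
print(math.isclose(0.1 + 0.2, 0.3))     # True: epsilon-tolerance comparison
print(Decimal('0.1') + Decimal('0.2'))  # 0.3: exact decimal arithmetic
```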