NumPy

Numerical computing with fast array operations.

Arrays & Shapes

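NumPy's ndarray is a homogeneous n-dimensional array: every element has the same dtype. The shape tuple gives the size of each dimension. Reshaping changes the shape, not the underlying data, and returns a view instead of a copy whenever the memory layout allows it.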

import numpy as np

# Creating arrays
a = np.array([1, 2, 3])
b = np.zeros((2, 3))        # 2x3 matrix of zeros
c = np.ones((3, 2))         # 3x2 matrix of ones
d = np.arange(0, 10, 2)     # [0, 2, 4, 6, 8]
e = np.linspace(0, 1, 5)    # [0.0, 0.25, 0.5, 0.75, 1.0]

# Shape and reshape
arr = np.array([[1, 2, 3], [4, 5, 6]])
print(arr.shape)            # (2, 3)
flat = arr.reshape(-1)      # [1, 2, 3, 4, 5, 6]
matrix = flat.reshape(2, 3) # back to (2, 3)
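
Whether a reshape copies data can be checked directly. The sketch below is illustrative only (the variable names are not from the original); it uses `np.shares_memory` to show one case where reshape returns a view and one where it has to copy.

```python
import numpy as np

arr = np.arange(6).reshape(2, 3)

# Reshaping a contiguous array returns a view: no data is copied.
flat = arr.reshape(-1)
shares_view = np.shares_memory(arr, flat)      # True

# Writing through the view changes the original array.
flat[0] = 99
first_element = arr[0, 0]                      # 99

# A transposed array is not contiguous in row-major order, so flattening
# it forces NumPy to make a copy.
flat_t = arr.T.reshape(-1)
shares_copy = np.shares_memory(arr, flat_t)    # False
```

When a guaranteed copy is needed, calling `.copy()` on the result avoids accidental aliasing.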

Broadcasting

Broadcasting allows arithmetic between arrays of different but compatible shapes. NumPy virtually stretches size-1 dimensions so the operation runs element-wise without materializing expanded copies of the smaller array.

import numpy as np

# Broadcasting in action
arr = np.array([[1, 2, 3], [4, 5, 6]])
row = np.array([10, 20, 30])

result = arr + row
print(result)
# [[11 22 33]
#  [14 25 36]]

# Scale entire matrix by scalar
scaled = arr * 2
# [[ 2  4  6]
#  [ 8 10 12]]

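# Broadcasting rules:
# 1. Align shapes on their trailing (rightmost) dimensions; missing leading dimensions count as 1
# 2. Dimensions of size 1 are stretched to match the other array
# 3. If two aligned dimensions differ and neither is 1, a ValueError is raised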
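
The three rules can be seen in a small sketch. The arrays here are made up for illustration, and `np.broadcast_shapes` assumes NumPy 1.20 or newer.

```python
import numpy as np

col = np.array([[0], [10], [20]])   # shape (3, 1)
row = np.array([1, 2, 3])           # shape (3,), treated as (1, 3)

# (3, 1) with (1, 3): each size-1 dimension is stretched, giving (3, 3).
grid = col + row
# [[ 1  2  3]
#  [11 12 13]
#  [21 22 23]]

# np.broadcast_shapes reports the result shape without doing any arithmetic.
shape = np.broadcast_shapes((3, 1), (3,))   # (3, 3)

# Trailing dimensions 3 and 2 differ and neither is 1, so this fails.
try:
    np.ones((2, 3)) + np.ones(2)
    mismatch_error = None
except ValueError as exc:
    mismatch_error = str(exc)
```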

Array Operations & Linear Algebra

NumPy provides vectorized math operations (element-wise add, multiply, etc.) and a comprehensive linear algebra module for matrix multiplication, decompositions, and eigenvalues.

import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])

# Element-wise
print(a + b)
print(a * b)
print(np.sqrt(a))

# Matrix multiplication
print(a @ b)           # Python 3.5+
print(np.dot(a, b))    # Same result for 2-D arrays

# Linear algebra
det = np.linalg.det(a)
inv = np.linalg.inv(a)
eigvals, eigvecs = np.linalg.eig(a)

# Stats on arrays
print(a.mean())        # 2.5
print(a.sum(axis=0))   # [4, 6]
print(a.std())         # 1.118...
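
Results from `np.linalg` are floating-point approximations, so it can help to verify them numerically. The following is a minimal sketch using the same matrix as above; `rhs` is a hypothetical right-hand side added for illustration.

```python
import numpy as np

a = np.array([[1.0, 2.0], [3.0, 4.0]])
rhs = np.array([5.0, 6.0])   # hypothetical right-hand side

# The inverse times the original matrix should be the identity, up to rounding.
inv = np.linalg.inv(a)
is_identity = np.allclose(a @ inv, np.eye(2))

# To solve a @ x = rhs, np.linalg.solve avoids forming the inverse explicitly.
x = np.linalg.solve(a, rhs)
solves_system = np.allclose(a @ x, rhs)

# Each eigenpair satisfies a @ v = lambda * v; eigenvectors are the columns.
eigvals, eigvecs = np.linalg.eig(a)
eig_ok = np.allclose(a @ eigvecs, eigvecs * eigvals)
```

Comparing with `np.allclose` instead of `==` tolerates the small rounding errors these routines produce.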

Random Number Generation

The random module supports sampling from many probability distributions and is essential for simulations, bootstrapping, and initializing ML model weights.

import numpy as np

rng = np.random.default_rng(42)  # seed for reproducibility

ints = rng.integers(0, 100, 10)   # 10 random ints in [0, 100)
floats = rng.random((3, 3))        # 3x3 uniform [0, 1)
normal = rng.normal(0, 1, 1000)    # 1000 N(0, 1) samples
choice = rng.choice(["A", "B", "C"], size=5, p=[0.2, 0.3, 0.5])
shuffle = rng.permutation(np.arange(10))

print(f"Mean: {normal.mean():.3f}, Std: {normal.std():.3f}")
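
As one example of the bootstrapping use case mentioned above, here is a rough sketch of a percentile bootstrap for the mean. The data is synthetic and the resample count is an arbitrary assumption, not a recommendation.

```python
import numpy as np

rng = np.random.default_rng(42)
data = rng.normal(loc=5.0, scale=2.0, size=200)   # synthetic sample, illustration only

# Bootstrap: resample the data with replacement and recompute the mean each time.
n_resamples = 1000
idx = rng.integers(0, data.size, size=(n_resamples, data.size))
boot_means = data[idx].mean(axis=1)

# A 95% percentile interval for the mean.
low, high = np.percentile(boot_means, [2.5, 97.5])
print(f"Sample mean: {data.mean():.3f}, 95% CI: [{low:.3f}, {high:.3f}]")

# Re-creating a generator with the same seed reproduces the same stream.
same_stream = np.array_equal(
    np.random.default_rng(7).random(3),
    np.random.default_rng(7).random(3),
)
```

Drawing all resample indices in one `rng.integers` call keeps the loop inside NumPy instead of Python.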

Performance vs Python Lists

import numpy as np
import time

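n = 10_000_000
py_list = list(range(n))
np_arr = np.arange(n, dtype=np.int64)  # int64 so squares up to ~1e14 cannot overflow

# perf_counter is a monotonic, high-resolution clock meant for timing
t0 = time.perf_counter()
py_result = [x**2 for x in py_list]
t_py = time.perf_counter() - t0

t0 = time.perf_counter()
np_result = np_arr ** 2
t_np = time.perf_counter() - t0

print(f"Python list: {t_py:.3f}s")
print(f"NumPy array: {t_np:.3f}s")
print(f"Speedup: {t_py / t_np:.1f}x")
# Typical output: NumPy roughly 20-50x faster; the exact ratio depends on hardware and NumPy version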

Python Lists

✅ Flexible, mixed types

⚠️ Slow loops, high overhead

NumPy Arrays

✅ Vectorized, C-speed, contiguous memory

⚠️ Homogeneous dtype, fixed size
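
The overhead difference can be made concrete by comparing memory use. This is a rough sketch: the list figure depends on the Python implementation (the numbers assume CPython), and the array figure counts only the data buffer.

```python
import sys
import numpy as np

n = 1_000
py_list = [float(i) for i in range(n)]
np_arr = np.arange(n, dtype=np.float64)

# A list stores pointers; each element is a separate Python float object.
list_bytes = sys.getsizeof(py_list) + sum(sys.getsizeof(x) for x in py_list)

# An array stores the raw 8-byte values back to back in one buffer.
array_bytes = np_arr.nbytes          # n * 8

print(f"list: ~{list_bytes} bytes, array data: {array_bytes} bytes")
print(np_arr.flags["C_CONTIGUOUS"])  # True
```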

Interview Questions

Q: What is broadcasting in NumPy?

Broadcasting allows arithmetic between arrays of different shapes by automatically expanding smaller arrays to match the larger one's shape. Rules: align trailing dimensions, stretch size-1 dimensions, error on mismatch.

Q: How is a NumPy array different from a Python list?

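NumPy arrays are homogeneous (one dtype), stored in a contiguous memory block, and support vectorized operations. Python lists store pointers to individually allocated objects, so they can mix types but are slower for numerical work. NumPy arrays also have a fixed size: operations such as `np.append` return a new array instead of growing the original in place.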
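
A short sketch of what homogeneity and fixed size mean in practice. The values are arbitrary and chosen only to make the coercion visible.

```python
import numpy as np

# Mixed input is coerced to a single dtype; here the ints are upcast to float.
mixed = np.array([1, 2.5, 3])
mixed_dtype = mixed.dtype                     # float64

# Assigning a float into an integer array truncates it to fit the dtype.
ints = np.array([1, 2, 3])
ints[0] = 7.9
first = ints[0]                               # 7

# np.append builds a new array; the original keeps its size and buffer.
grown = np.append(ints, 4)
same_buffer = np.shares_memory(ints, grown)   # False
original_len = ints.size                      # still 3
```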

Q: Explain the difference between np.dot, np.matmul, and the @ operator.

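All three perform matrix multiplication, and for 1-D and 2-D arrays they give the same result. They differ elsewhere: `np.dot` accepts scalars and, for arrays with more than two dimensions, contracts the last axis of the first argument with the second-to-last axis of the second; `np.matmul` rejects scalars and treats the leading dimensions as a batch, broadcasting them. The `@` operator (Python 3.5+) is the operator form of `np.matmul` and is usually preferred for readability.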
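
The difference only shows up beyond 2-D, so a small sketch with stacked matrices may help. The shapes here are arbitrary examples.

```python
import numpy as np

a = np.ones((2, 3, 4))
b = np.ones((2, 4, 5))

# matmul / @ treat the leading axis as a batch: two separate (3, 4) @ (4, 5) products.
batched = a @ b
batched_shape = batched.shape            # (2, 3, 5)

# dot contracts the last axis of `a` with the second-to-last axis of `b`
# for every pair of leading indices, so the result has four dimensions.
dotted_shape = np.dot(a, b).shape        # (2, 3, 2, 5)

# dot accepts scalars; matmul does not.
scaled = np.dot(3, np.array([1, 2]))     # array([3, 6])
try:
    np.matmul(3, np.array([1, 2]))
    matmul_scalar_ok = True
except (TypeError, ValueError):
    matmul_scalar_ok = False
```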

Q: How do you handle missing data in NumPy?

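NumPy has no dedicated missing-value type; the usual convention is `np.nan`, which is a floating-point value, so integer arrays must be converted to float before they can hold it. Functions like `np.nansum`, `np.nanmean`, and `np.nanstd` ignore NaN values. Use `np.isnan(arr)` to detect them and boolean indexing or `np.where` to drop or replace them.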
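
The functions named in this answer can be combined as in the sketch below. The sample array and the fill value of 0.0 are arbitrary choices for illustration.

```python
import numpy as np

arr = np.array([1.0, np.nan, 3.0, np.nan, 5.0])

mask = np.isnan(arr)                 # [False  True False  True False]

plain_mean = arr.mean()              # nan: a single NaN poisons the result
safe_mean = np.nanmean(arr)          # 3.0, NaNs ignored

dropped = arr[~mask]                 # [1. 3. 5.]
filled = np.where(mask, 0.0, arr)    # [1. 0. 3. 0. 5.]

# NaN is never equal to itself, so == cannot be used to find it.
eq_finds_nothing = not (arr == np.nan).any()
```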