NumPy
Numerical computing with fast array operations.
Arrays & Shapes
NumPy's ndarray is a homogeneous n-dimensional array. The shape tuple gives the size of each dimension. Reshaping changes the shape and returns a view of the same data, without copying, whenever the memory layout allows it.
import numpy as np
# Creating arrays
a = np.array([1, 2, 3])
b = np.zeros((2, 3))         # 2x3 matrix of zeros
c = np.ones((3, 2))          # 3x2 matrix of ones
d = np.arange(0, 10, 2)      # [0, 2, 4, 6, 8]
e = np.linspace(0, 1, 5)     # [0.0, 0.25, 0.5, 0.75, 1.0]
# Shape and reshape
arr = np.array([[1, 2, 3], [4, 5, 6]])
print(arr.shape)             # (2, 3)
flat = arr.reshape(-1)       # [1, 2, 3, 4, 5, 6]
matrix = flat.reshape(2, 3)  # back to (2, 3)
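The claim that reshaping avoids copying "when possible" can be checked directly. The following is a minimal sketch using `np.shares_memory`; the variable names are illustrative only. It shows a reshape that returns a view and one that has to copy because the source is not C-contiguous.

```python
import numpy as np

arr = np.arange(6).reshape(2, 3)   # C-contiguous 2x3 array

# Reshaping a contiguous array returns a view: no data is copied.
flat = arr.reshape(-1)
shares_view = np.shares_memory(arr, flat)

# Writing through the view changes the original array too.
flat[0] = 99
first_element = arr[0, 0]

# A transposed array is not C-contiguous, so flattening it in C order copies.
flat_t = arr.T.reshape(-1)
shares_copy = np.shares_memory(arr, flat_t)

print(shares_view, first_element, shares_copy)  # True 99 False
```

Because a reshaped view aliases the original data, call `.copy()` when an independent array is needed.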
Broadcasting
Broadcasting allows arithmetic between arrays of different shapes. NumPy virtually stretches size-1 dimensions to match the other operand, so the operation runs element-wise without materializing expanded copies of the inputs.
import numpy as np
# Broadcasting in action
arr = np.array([[1, 2, 3], [4, 5, 6]])
row = np.array([10, 20, 30])
result = arr + row
print(result)
# [[11 22 33]
#  [14 25 36]]
# Scale entire matrix by scalar
scaled = arr * 2
# [[ 2  4  6]
#  [ 8 10 12]]
# Broadcasting rules:
# 1. Align trailing dimensions
# 2. Dimensions of size 1 are stretched
# 3. If dimensions differ and != 1, error
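The three rules above can be exercised on a few more shapes. This sketch assumes NumPy 1.20 or newer, where `np.broadcast_shapes` is available; the arrays are made up for illustration.

```python
import numpy as np

arr = np.ones((2, 3))
col = np.array([[10.0], [20.0]])            # shape (2, 1)

# (2, 3) + (2, 1): the size-1 axis of col is stretched to length 3.
result = arr + col
print(result.shape)                          # (2, 3)

# np.broadcast_shapes reports the result shape without doing any arithmetic.
print(np.broadcast_shapes((2, 3), (3,)))     # (2, 3)
print(np.broadcast_shapes((4, 1), (1, 5)))   # (4, 5)

# Mismatched trailing dimensions (3 vs 2, neither is 1) raise ValueError.
try:
    arr + np.ones(2)
    failed = False
except ValueError:
    failed = True
print(failed)                                # True
```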
Array Operations & Linear Algebra
NumPy provides vectorized math operations (element-wise add, multiply, etc.) and a comprehensive linear algebra module for matrix multiplication, decompositions, and eigenvalues.
import numpy as np
a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])
# Element-wise
print(a + b)
print(a * b)
print(np.sqrt(a))
# Matrix multiplication
print(a @ b)          # Python 3.5+
print(np.dot(a, b))   # Equivalent
# Linear algebra
det = np.linalg.det(a)
inv = np.linalg.inv(a)
eigvals, eigvecs = np.linalg.eig(a)
# Stats on arrays
print(a.mean())       # 2.5
print(a.sum(axis=0))  # [4, 6]
print(a.std())        # 1.118...
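A common use of the linear algebra module is solving a linear system. The sketch below reuses the matrix `a` from above with a made-up right-hand side `rhs` (an assumption for illustration). It uses `np.linalg.solve`, which is generally preferable to forming the inverse explicitly.

```python
import numpy as np

a = np.array([[1.0, 2.0], [3.0, 4.0]])
rhs = np.array([5.0, 6.0])        # hypothetical right-hand side

# Solve a @ x = rhs directly instead of computing inv(a) @ rhs.
x = np.linalg.solve(a, rhs)
print(x)                           # approximately [-4.   4.5]

# Floating-point results should be compared with a tolerance.
residual_ok = np.allclose(a @ x, rhs)
inverse_ok = np.allclose(a @ np.linalg.inv(a), np.eye(2))
det = np.linalg.det(a)             # approximately -2.0
print(residual_ok, inverse_ok)     # True True
```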
Random Number Generation
The random module supports sampling from many probability distributions and is essential for simulations, bootstrapping, and initializing ML model weights.
import numpy as np
rng = np.random.default_rng(42) # seed for reproducibility
ints = rng.integers(0, 100, 10) # 10 random ints in [0, 100)
floats = rng.random((3, 3)) # 3x3 uniform [0, 1)
normal = rng.normal(0, 1, 1000) # 1000 N(0, 1) samples
choice = rng.choice(["A", "B", "C"], size=5, p=[0.2, 0.3, 0.5])
shuffle = rng.permutation(np.arange(10))
print(f"Mean: {normal.mean():.3f}, Std: {normal.std():.3f}")
Performance vs Python Lists
import numpy as np
import time
n = 10_000_000
py_list = list(range(n))
np_arr = np.arange(n)
t0 = time.perf_counter()  # perf_counter is a monotonic, high-resolution timer
py_result = [x**2 for x in py_list]
t_py = time.perf_counter() - t0
t0 = time.perf_counter()
np_result = np_arr ** 2
t_np = time.perf_counter() - t0
print(f"Python list: {t_py:.3f}s")
print(f"NumPy array: {t_np:.3f}s")
print(f"Speedup: {t_py / t_np:.1f}x")
# Typical output: NumPy 20-50x faster
Python Lists
✅ Flexible, mixed types
⚠️ Slow loops, high overhead
NumPy Arrays
✅ Homogeneous, contiguous
✅ Vectorized, C-speed
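A single timed run like the one above is sensitive to noise. As a rough sketch, the same comparison can be repeated with the standard-library `timeit` module, taking the best of several runs. The array size and repeat counts here are arbitrary choices, and the measured speedup will vary by machine.

```python
import timeit

import numpy as np

n = 100_000
py_list = list(range(n))
np_arr = np.arange(n, dtype=np.int64)

# Run each statement 10 times per measurement and keep the best of 3.
t_py = min(timeit.repeat(lambda: [x**2 for x in py_list], number=10, repeat=3))
t_np = min(timeit.repeat(lambda: np_arr ** 2, number=10, repeat=3))

print(f"Python list: {t_py:.4f}s")
print(f"NumPy array: {t_np:.4f}s")
print(f"Speedup: {t_py / t_np:.1f}x")
```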
Interview Questions
Q: What is broadcasting in NumPy?
Broadcasting allows arithmetic between arrays of different shapes by automatically expanding smaller arrays to match the larger one's shape. Rules: align trailing dimensions, stretch size-1 dimensions, error on mismatch.
Q: How is a NumPy array different from a Python list?
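NumPy arrays are homogeneous (one dtype for all elements), stored in a single block of memory, and support vectorized operations. Python lists store pointers to separately allocated objects, so they can mix types but are slower for numerical work. NumPy arrays also have a fixed size once created: functions such as `np.append` return a new array rather than growing the original.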
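A short sketch of these differences, with illustrative values: mixed numeric input is coerced to one dtype, `*` means different things for lists and arrays, and "appending" produces a new array.

```python
import numpy as np

mixed_list = [1, 2.5, 3]

# Homogeneous: ints and floats are coerced to a single dtype.
arr = np.array(mixed_list)
print(arr.dtype)                  # float64

# Vectorized arithmetic vs. list repetition.
doubled = arr * 2                 # element-wise: [2. 5. 6.]
repeated = mixed_list * 2         # list of length 6

# Fixed size: np.append returns a new array, the original is unchanged.
grown = np.append(arr, 4.0)
print(arr.size, grown.size)       # 3 4
```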
Q: Explain the difference between np.dot, np.matmul, and the @ operator.
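All three perform matrix multiplication and give the same result for 1-D and 2-D arrays. They differ elsewhere: `np.dot` accepts scalars and, for N-D inputs, sums over the last axis of the first array and the second-to-last axis of the second, while `np.matmul` rejects scalars and treats N-D inputs as stacks of matrices, broadcasting over the leading axes. The `@` operator (Python 3.5+) follows `np.matmul` semantics and is preferred for clarity.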
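The equivalence holds for 2-D inputs, but the functions diverge for higher-dimensional arrays and scalars. The following sketch, with arbitrary shapes chosen for illustration, compares the result shapes.

```python
import numpy as np

m = np.array([[1, 2], [3, 4]])

# 2-D inputs: @, np.matmul and np.dot agree.
same_2d = np.array_equal(m @ m, np.dot(m, m))

# 3-D inputs: matmul treats them as a stack of matrices...
a = np.ones((2, 3, 4))
b = np.ones((2, 4, 5))
matmul_shape = np.matmul(a, b).shape   # (2, 3, 5)

# ...while dot pairs every matrix in a with every matrix in b.
dot_shape = np.dot(a, b).shape         # (2, 3, 2, 5)

# Scalars: np.dot multiplies element-wise, np.matmul refuses.
scaled = np.dot(m, 2)
try:
    np.matmul(m, 2)
    matmul_scalar_ok = True
except (TypeError, ValueError):
    matmul_scalar_ok = False

print(same_2d, matmul_shape, dot_shape, matmul_scalar_ok)
```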
Q: How do you handle missing data in NumPy?
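NumPy has no dedicated missing-value type; by convention, `np.nan` marks missing values in floating-point arrays (integer arrays cannot hold NaN). Functions like `np.nansum`, `np.nanmean`, and `np.nanstd` ignore NaN values. Use `np.isnan(arr)` to detect them and boolean indexing or `np.where` to drop or replace them. Masked arrays (`np.ma`) are an alternative for non-float data.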
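A minimal sketch of these tools on a small made-up array:

```python
import numpy as np

arr = np.array([1.0, np.nan, 3.0, np.nan, 5.0])

# Ordinary reductions propagate NaN; the nan* variants skip it.
plain_mean = arr.mean()               # nan
safe_mean = np.nanmean(arr)           # 3.0
safe_sum = np.nansum(arr)             # 9.0

# Detect missing entries with a boolean mask.
mask = np.isnan(arr)

# Drop them with boolean indexing, or replace them with np.where.
dropped = arr[~mask]                  # [1. 3. 5.]
filled = np.where(mask, 0.0, arr)     # [1. 0. 3. 0. 5.]

print(plain_mean, safe_mean, safe_sum)
```

Note that `np.nan == np.nan` is `False`, which is why detection goes through `np.isnan` rather than an equality comparison.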