CampusFlow

Context-Free Grammar & PDA

Context-free languages, parse trees, and pushdown automata

CFG Definition

A context-free grammar is a 4-tuple G = (V, Σ, R, S) where:

  • V — Finite set of non-terminal symbols (variables)
  • Σ — Finite set of terminal symbols (alphabet)
  • R — Finite set of production rules A → α
  • S — Start symbol (S ∈ V)

Examples

Pushdown Automata (PDA)

A PDA is a finite automaton augmented with a stack. It consists of a 7-tuple (Q, Σ, Γ, δ, q₀, Z₀, F) where the transition function δ reads input and stack top to determine the next state and stack operation. PDAs recognize exactly the context-free languages, and every CFG can be converted to an equivalent PDA (and vice versa).

δ(q₁, a, X) = (q₂, γ) means:
In state q₁, reading input 'a' with stack top X,
transition to q₂ and replace X with γ.

Chomsky Normal Form (CNF)

A CFG is in Chomsky Normal Form if every production is of the form:

A → BC   (two non-terminals)
A → a      (single terminal)

Conversion steps to CNF:

  1. Eliminate ε-productions (A → ε)
  2. Eliminate unit productions (A → B)
  3. Replace terminals in mixed rules: A → aB becomes A → Xₐ, Xₐ → a
  4. Break long productions: A → BCD becomes A → BZ, Z → CD

Derivation Steps

A derivation replaces non-terminals using production rules until only terminals remain. Common derivation strategies:

Leftmost Derivation

Always replace the leftmost non-terminal first

Rightmost Derivation

Always replace the rightmost non-terminal first

A grammar is ambiguous if a string has more than one parse tree (equivalently, more than one leftmost derivation).

Key Equivalence

CFGs and PDAs are equivalent in expressive power: for every CFG there exists a PDA that accepts exactly the strings generated by the grammar, and for every PDA there exists a CFG that generates the language accepted by the PDA. This is a foundational result in formal language theory.

Interview Questions

Q1: What is a context-free grammar and how does it relate to pushdown automata?

A CFG is a 4-tuple (V, Σ, R, S) where V is non-terminals, Σ is terminals, R is production rules, and S is the start symbol. CFGs generate context-free languages, which are exactly the languages recognized by pushdown automata (PDAs). This equivalence means that for every CFG there exists a PDA that recognizes the same language, and vice versa.

Q2: Explain Chomsky Normal Form (CNF) and why it's useful.

A CFG is in CNF if every production is of the form A → BC or A → a (where A, B, C are non-terminals and a is a terminal). CNF is useful because: (1) it limits the parse tree structure making analysis easier, (2) the CYK parsing algorithm requires CNF, (3) it provides an upper bound on derivation length — a string of length n requires exactly 2n-1 derivation steps in any parse tree.

Q3: What is the difference between a parse tree and a derivation?

A derivation is a sequence of rule applications that transforms the start symbol into a terminal string. A parse tree is a graphical representation of a derivation where each internal node is a non-terminal, each leaf is a terminal, and children of a node correspond to the right-hand side of a production. A single parse tree can correspond to multiple derivations (e.g., leftmost vs. rightmost), but for unambiguous grammars, each string has exactly one parse tree.