Module V·Article II·~5 min read
Context-Free Languages and Grammars
Automata and Formal Languages
Turn this article into a podcast
Pick voices, format, length — AI generates the audio
Context-Free Grammars
CFG and CFL
A context-free grammar G = (V, Σ, R, S): V — nonterminals, Σ — terminals, R — rules of the form A → α (A∈V, α∈(V∪Σ)*), S — start nonterminal.
Language L(G) = {w ∈ Σ*: S ⟹* w}.
Example: Language {aⁿbⁿ: n≥0}: S → ε | aSb.
Chomsky normal form: A → BC or A → a. Every CFG is equivalent to a grammar in CNF.
Pushdown Automata
PDA = (Q, Σ, Γ, δ, q₀, Z₀, F): a stack (pushdown store) with alphabet Γ is added.
Transitions: δ(q, a, Z) = {(p, γ)}: read a, remove Z, push γ.
Theorem: CFL = languages recognized by PDA.
CFLs are closed under: union, concatenation, Kleene star. Not closed under: intersection, complement (in general).
Chomsky Hierarchy
Type 0 (no restrictions) → recursively enumerable languages, Turing machines. Type 1 (context-sensitive) → LBA (linear-bounded automata). Type 2 (context-free) → PDA. Type 3 (regular) → DFA.
Each type is strictly contained in the previous one.
Algorithmic Problems
CFL: emptiness, finiteness, membership (CYK algorithm O(n³|G|)) — decidable. Equivalence of CFGs — undecidable. Intersection of two CFLs is not always a CFL.
Normal Forms of CFG and CYK Algorithm
Chomsky normal form (CNF): Each rule has the form A → BC or A → a. Any CFG can be converted to CNF in polynomial time. CNF forms the basis for the CYK (Cocke–Younger–Kasami) algorithm: parsing in O(n³|G|) using dynamic programming. Table dp[i][j][A] = 1 if nonterminal A generates the substring s[i..j].
Greibach normal form: Each rule A → aα, where a is a terminal, α is a string of nonterminals. Convenient for recursive descent (LL parsers). In programming, normal forms of grammars are fundamental for compilers (Yacc, Bison use LALR(1) grammars).
Chomsky Hierarchy
Regular languages (type 3) ⊂ Context-free (type 2) ⊂ Context-sensitive (type 1) ⊂ Recursively enumerable (type 0). The inclusions are strict:
- Type 3: DFA / regular expressions.
- Type 2: PD automata / CFG (grammars of programming languages).
- Type 1: Linear-bounded TM (LBA).
- Type 0: Arbitrary TM.
Type 1 languages are always decidable (LBA solves in exponential time), type 0 — recursively enumerable (not necessarily decidable).
Deterministic CFL and Parsing
DCFL (deterministic CFL) — a subclass of CFL recognized by deterministic PDA. Strictly smaller than CFL: {aⁿbⁿcⁿ: n≥0} — not a CFL (proved by the pumping lemma for CFL), so this is not a CFL example, but an example outside CFL. Example of DCFL: {aⁿbⁿ: n≥0}. Example of CFL, but not DCFL: {ww^R: w ∈ {a,b}*} (even-length palindromes) — DPDA cannot handle due to non-determinism in the center.
Programming languages: LL(k) and LR(k) grammars are deterministic subclasses for which efficient (linear-in-input-length) parsers exist. LR(1) grammars cover most practical languages.
Pumping Lemma for CFL
Pumping lemma (context-free): For any CFL L there exists p such that any string w ∈ L of length ≥ p can be decomposed w = uvxyz: |vy| ≥ 1, |vxy| ≤ p, uvⁿxyⁿz ∈ L for all n ≥ 0. Proof via parse tree: long string → tree of height > |G| → repeated nonterminal → "pump" between two occurrences. Example of application: {aⁿbⁿcⁿ} is not a CFL.
Attribute Grammars
An attribute grammar extends CFG with attributes for nonterminals. Synthesized attributes: calculated from children (bottom-up). Inherited attributes: passed from parent or siblings (top-down). Used for semantic analysis in compilers: type computations, checking variable declarations. Notation: S-attributed grammars (only synthesized) — particularly convenient for LR parsers.
Transducers and Translating Automata
Finite transducer: like a DFA, but on a transition outputs an output symbol (Moore machine or Mealy machine). Mealy: output on transition; Moore: output in state. Application: lexical transformations, transliteration, speech synthesis. Automata morphisms: for regular languages — closure under homomorphisms, inverse homomorphisms, and intersection with regulars → another method to prove nonregularity.
Comparative Complexity of Grammar Classes
CFLs are closed under: union, concatenation, Kleene star, homomorphism, inverse homomorphism, intersection with regular. Not closed under: intersection, complement. CFL ∩ CFL can be non-CFL: classic example — {aⁿbⁿcᵐ: n,m≥0} ∩ {aᵐbⁿcⁿ: n,m≥0} = {aⁿbⁿcⁿ: n≥0}, but {aⁿbⁿcⁿ} is not a CFL by the pumping lemma. Context-sensitive: closed under intersection and complement, but undecidable for emptiness. Recursively enumerable: only intersection and union.
CYK Algorithm and Probabilistic CFG
CYK (Cocke–Younger–Kasami): check whether a string w belongs to CFL in O(n³|G|). Requires Chomsky normal form (CNF): A → BC or A → a. PCFG (probabilistic CFG): each rule A → α has probability P(α|A). The sum of probabilities for all rules for A is 1. Most probable parse — Viterbi algorithm on PCFG. Parameter training: Inside–Outside algorithm (analogous to Baum–Welch for HMM). Used in statistical syntactic text analysis.
Numerical Example: Derivation in CFG and CYK Algorithm
Task: CFG G: S→AB, A→a, B→b. String w="ab". Does w belong to L(G)?
Step 1: CYK table (Chomsky normal form is satisfied). Steps of length 1: w₁=a: dp[1,1]={A}; w₂=b: dp[2,2]={B}.
Step 2: Step of length 2: dp[1,2]. Check: is there a rule X→YZ, where Y∈dp[1,1]={A} and Z∈dp[2,2]={B}? S→AB: A∈{A} ✓, B∈{B} ✓. Therefore, dp[1,2]={S}.
Step 3: S∈dp[1,2] → w="ab"∈L(G). Parse tree: S⟹AB⟹aB⟹ab. Check w="ba": dp[1,1]={B}, dp[2,2]={A}. dp[1,2]: S→AB — A∈{B}? No. dp[1,2]={}. S∉dp[1,2] → "ba"∉L(G).
Step 4: CYK complexity: O(n³·|G|). For a sentence with n=20 words and |G|=50 rules: 20³·50=400,000 operations. PCFG adds weights: P(S→AB)=0.7, P(A→a)=1.0, P(B→b)=1.0. Parse probability for "ab": 0.7·1·1=0.70. Viterbi algorithm on PCFG selects the most probable syntactic tree for the sentence.
§ Act · what next