Search papers, labs, and topics across Lattice.
This paper conducts the first comprehensive benchmark of six generalized parsing algorithms鈥擟YK, Valiant, Earley, GLL, RNGLR, and BRNGLR鈥攁gainst deterministic LL(1) and LR(1) baselines across 22 diverse grammars. The findings reveal that the performance cost of using general context-free parsers is significantly lower than previously thought, with the GLR family achieving only a 3x median slowdown compared to LR(1) on deterministic grammars. This positions GLR as the optimal choice for enhancing software engineering tools without sacrificing performance.
General context-free parsers can achieve comparable performance to deterministic parsers, debunking the myth that expressiveness comes at a prohibitive cost.
Parsing underpins a vast range of software engineering tasks, from compilers and static analyzers to language servers and fuzz testing tools. Yet most parsers deployed in practice are deterministic (LL or LR), forcing developers not only to contort their grammars to fit the parser, but to simplify the very languages they design sacrificing expressiveness for the sake of parseability. General context-free parsers eliminate this constraint. Yet, despite decades of algorithmic development, no rigorous head-to-head comparison exists across the major families of parsing algorithms. We present the first unified, controlled benchmark of six generalized parsing algorithms: CYK, Valiant, Earley, GLL, RNGLR, and BRNGLR, plus deterministic LL(1) and LR(1) baselines, all implemented in Rust with shared data structures and parse-tree extraction, and evaluated across 22 grammars ranging from simple expressions to full C++ and Java. Our results show that the cost of generality is lower than widely assumed. On deterministic grammars, the GLR family incurs only a 3x median slowdown over LR(1), with a narrow and predictable variance. GLR is the clear performance winner among generalized parsers and a practical default choice for software engineering tools.