This paper addresses hallucination by Large Language Models (LLMs) in precision-critical domains such as technical diagramming and mechanical design. The authors introduce PyGeoX, a programmable geometric domain-specific language (DSL) that compiles declarative geometric constraints into a differentiable loss, so that models can learn from solver residuals. Their proposed Saturating Additive Rewards (SAR) improve the hard-tier solving rate by 2.3x over MSE-based rewards, and the resulting 8B model is competitive with much larger frontier systems on the accompanying benchmark.
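Under rewards that aggregate all residuals through a single norm, one outlier constraint can nullify the learning signal for every other constraint; decomposing the reward into bounded per-constraint terms preserves that signal and raises the hard-tier solving rate by 2.3x.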
Large Language Models frequently hallucinate in precision-critical domains such as technical diagramming and mechanical design, where outputs must satisfy strict geometric constraints. We study open-ended geometric synthesis from natural language: translating free-form descriptions into precise constructions whose entities must simultaneously satisfy dozens of interacting constraints. To make this tractable, we release PyGeoX, a programmable geometric DSL that compiles declarative constraints into a differentiable loss, and PyGeoX-Bench, a stratified suite of 300 problems with per-constraint verifiable rewards. Using PyGeoX as a verifier, we identify a failure mode we call Outlier Gradient Masking: under global-norm rewards (any scheme that aggregates residuals through a single norm, for example, $\exp(-\mathrm{MSE})$), a single outlier constraint can nullify the learning signal across all others. To address this, we propose Saturating Additive Rewards (SAR), which decompose the reward into bounded per-constraint terms, preserving partial progress and ensuring consistent gradients even under severe violations. Against MSE-based rewards, the natural baseline for geometry solvers, SAR improves the hard-tier solving rate by $2.3\times$, and the resulting 8B model is competitive with much larger frontier systems on this benchmark. We release the engine, benchmark, and data at https://github.com/Huawei-AI4Math/PyGeoX.
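The masking effect described above can be illustrated with a small numerical sketch. The abstract does not give the exact form of SAR, so the bounded per-constraint term $1/(1+r_i^2)$ used below is an assumption chosen only for illustration, as are the function names and the example residuals; none of this is the PyGeoX API. For the global-norm reward $R = \exp(-\mathrm{MSE})$, the gradient with respect to residual $r_j$ is $-R \cdot 2r_j/n$, so every constraint's gradient is scaled by the same factor $R$, which collapses once a single residual is large.

```python
import math

def global_norm_reward(residuals):
    """exp(-MSE): every residual is aggregated through one norm."""
    mse = sum(r * r for r in residuals) / len(residuals)
    return math.exp(-mse)

def global_norm_grad(residuals):
    """Analytic gradient of exp(-MSE) with respect to each residual."""
    n = len(residuals)
    reward = global_norm_reward(residuals)
    # Every component shares the common factor `reward`.
    return [-reward * 2.0 * r / n for r in residuals]

def additive_reward(residuals):
    """Mean of bounded per-constraint terms 1 / (1 + r^2).

    Hypothetical stand-in for a saturating additive reward;
    the paper's actual SAR definition may differ.
    """
    return sum(1.0 / (1.0 + r * r) for r in residuals) / len(residuals)

def additive_grad(residuals):
    """Analytic gradient of the additive reward; each term depends only on its own residual."""
    n = len(residuals)
    return [-2.0 * r / ((1.0 + r * r) ** 2) / n for r in residuals]

# Two nearly satisfied constraints and one severe outlier (made-up values).
residuals = [0.1, 0.2, 50.0]

print("global-norm reward:", global_norm_reward(residuals))
print("global-norm grad:  ", global_norm_grad(residuals))
print("additive reward:   ", additive_reward(residuals))
print("additive grad:     ", additive_grad(residuals))
```

In this sketch the outlier drives the global-norm reward to zero in floating point, so the two nearly satisfied constraints receive no gradient at all, while the additive form still credits them and keeps their gradients independent of the outlier.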