Jan 12, 2026 | arXiv:2601.07525

Thinking Before Constraining: A Unified Decoding Framework for Large Language Models

Ngoc-Hieu Nguyen, Alonso Silva, Laith Zumot, L. Tupikina, A. Aghasaryan, Mehwish Alam

AI Summary

The paper introduces "In-Writing," a decoding framework for LLMs that decouples free-form reasoning from structured generation by using a trigger token to signal the switch to constrained decoding. This avoids applying constraints prematurely, which can hinder the model's reasoning. Experiments across classification and reasoning tasks show that In-Writing outperforms prior methods, with accuracy gains of up to 27% over natural generation.

Key Contribution

LLMs can reason better when you let them think freely *before* forcing them to format.

Abstract

Natural generation allows Large Language Models (LLMs) to produce free-form responses with rich reasoning, yet the lack of structure makes outputs difficult to verify. Conversely, constrained decoding ensures standardized formats but can inadvertently restrict reasoning capabilities by imposing constraints too early in the generation process. We propose a hybrid approach, namely In-Writing, that combines free-form reasoning and structured generation in a single call. The model first performs unconstrained reasoning and only applies structured decoding after a trigger token is generated, explicitly decoupling reasoning from formatting. We establish that our trigger-token strategies are able to virtually eradicate premature triggering, a failure mode in which constrained decoding interrupts ongoing reasoning. Evaluations across diverse datasets covering classification and reasoning tasks demonstrate that our approach outperforms the state-of-the-art by achieving accuracy gains of up to 27% over natural generation. Our code is available at: https://github.com/Nokia-Bell-Labs/InWriting.
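
The trigger-token idea described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the trigger string `<answer>`, the `decode` and `toy_scores` functions, and the scripted scores are all hypothetical stand-ins, and a real system would mask logits from an actual LLM rather than a hand-written score table. The sketch only shows the control flow: greedy decoding runs unconstrained until the trigger token appears, after which the candidate set is restricted to the allowed labels.

```python
from typing import Callable, Dict, List, Set

# Hypothetical trigger token; the token used in the paper is not stated on this page.
TRIGGER = "<answer>"


def decode(
    score_fn: Callable[[List[str]], Dict[str, float]],
    allowed_after_trigger: Set[str],
    max_steps: int = 20,
) -> List[str]:
    """Greedy decoding that is unconstrained until TRIGGER is emitted,
    then restricted to `allowed_after_trigger` for a single answer token."""
    tokens: List[str] = []
    constrained = False
    for _ in range(max_steps):
        scores = score_fn(tokens)
        if constrained:
            # Constrained phase: drop every token outside the allowed set.
            scores = {t: s for t, s in scores.items() if t in allowed_after_trigger}
            if not scores:
                break
        token = max(scores, key=scores.get)
        tokens.append(token)
        if constrained:
            break  # this toy emits exactly one structured token
        if token == TRIGGER:
            constrained = True  # switch from free-form to constrained decoding
    return tokens


def toy_scores(prefix: List[str]) -> Dict[str, float]:
    """Scripted stand-in for a language model (an assumption for illustration)."""
    script = ["the", "review", "sounds", "happy", TRIGGER]
    vocab = script + ["maybe", "positive", "negative"]
    scores = {t: 0.0 for t in vocab}
    if len(prefix) < len(script):
        # Free-form phase: follow the scripted reasoning tokens.
        scores[script[len(prefix)]] = 1.0
    else:
        # After the trigger, the raw model prefers an invalid label.
        scores.update({"maybe": 1.0, "positive": 0.8, "negative": 0.1})
    return scores


output = decode(toy_scores, {"positive", "negative"})
print(output)
```

In this toy, the unconstrained scores after the trigger favor the invalid label `maybe`, and the mask forces the choice to the best-scoring allowed label instead. The reasoning tokens before the trigger are never masked, which is the decoupling the abstract describes.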

Code Generation & Program Synthesis | Natural Language Processing | Reasoning & Chain-of-Thought

Citation Metrics

Citations: 3

Influential citations: 0

References: 37

Year: 2026

Venue: arXiv.org
