The authors investigate character description generation from long-form narratives and find that reasoning-enabled LLMs perform worse with their built-in reasoning turned on than with it disabled. To address this, they propose a training framework that decouples reasoning from generation: a reasoning model produces a structured QA trace, and a generation model conditions on that trace. Experiments on the BookWorm and CroSS datasets show that this QA-guided reasoning improves faithfulness, informativeness, and grounding compared to strong long-context baselines.
Disabling LLMs' built-in reasoning can paradoxically improve character description generation, but the right kind of external reasoning (QA-guided) boosts performance even further.
Character description generation is an important capability for narrative-focused applications such as summarization, story analysis, and character-driven simulations. However, generating accurate character descriptions from long-form narratives (e.g., novels) is challenging: models must track evolving attributes (e.g., relationships and events), integrate evidence scattered across the text, and infer implicit details. Despite the success of reasoning-enabled LLMs on many benchmarks, we find that for character description generation their performance improves when built-in reasoning is disabled (i.e., an empty reasoning trace). Motivated by this, we propose a training framework that decouples reasoning from generation. Our approach, which can be applied on top of long-context LLMs or chunk-based methods, consists of a reasoning model that produces a structured QA reasoning trace and a generation model that conditions on this trace to produce the final character description. Experiments on two datasets (BookWorm and CroSS) show that QA-guided reasoning improves faithfulness, informativeness, and grounding over strong long-context baselines.
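The decoupled design described above can be sketched as a two-stage pipeline. This is a minimal illustration and not the authors' implementation: the `QAPair` schema, the prompt layout, and the function names are all assumptions, and the two toy models are stand-ins for the trained reasoning and generation models. The narrative argument stands for whatever text reaches the models, whether a full long-context input or retrieved chunks.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class QAPair:
    """One step of the structured reasoning trace (hypothetical schema)."""
    question: str
    answer: str


# Hypothetical interfaces: any callables with these shapes would do.
Reasoner = Callable[[str, str], List[QAPair]]  # (character, narrative) -> QA trace
Generator = Callable[[str], str]               # prompt -> character description


def format_trace(pairs: List[QAPair]) -> str:
    """Serialize the QA trace as plain text for the generation prompt."""
    return "\n".join(f"Q: {p.question}\nA: {p.answer}" for p in pairs)


def build_prompt(character: str, narrative: str, trace: str) -> str:
    """Assemble the generator input; this layout is an assumption."""
    return (
        f"Narrative:\n{narrative}\n\n"
        f"Reasoning trace:\n{trace}\n\n"
        f"Describe the character {character}."
    )


def describe_character(
    character: str, narrative: str, reasoner: Reasoner, generator: Generator
) -> str:
    """Stage 1 produces the QA trace; stage 2 conditions on it."""
    pairs = reasoner(character, narrative)
    prompt = build_prompt(character, narrative, format_trace(pairs))
    return generator(prompt)


# Toy stand-ins with made-up content, used only to exercise the pipeline.
def toy_reasoner(character: str, narrative: str) -> List[QAPair]:
    return [
        QAPair(f"Who is {character}?", "A sea captain."),
        QAPair(f"What does {character} want?", "To find the white whale."),
    ]


def toy_generator(prompt: str) -> str:
    # Echo the trace answers to make the conditioning on the trace visible.
    answers = [line[3:] for line in prompt.splitlines() if line.startswith("A: ")]
    return " ".join(answers)
```

Calling `describe_character("Ahab", narrative_text, toy_reasoner, toy_generator)` runs both stages in order. Keeping the trace as an explicit intermediate value means the reasoning model and the generation model can be trained or swapped independently, which is the point of the decoupling.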