The paper investigates how well reasoning language models access and use world knowledge stored in their parameters, and finds that they are not optimized for this task by default. The authors show that a simple "think step-by-step" prompt improves knowledge recall, and they propose a reinforcement learning approach that uses world-knowledge question answering as a reward signal to explicitly train models to reason over their parametric knowledge. RL training on TriviaQA improves TriviaQA performance by 9.9% and also transfers to Natural Questions, HotpotQA, SimpleQA, and StrategyQA, with gains between 0.6% and 4.2%, indicating that reasoning models can be trained to improve parametric knowledge access.
A simple "think step-by-step" prompt unlocks surprisingly better world knowledge recall in reasoning LMs, suggesting they're under-optimized for accessing their own parametric knowledge.
We study reasoning for accessing world knowledge stored in a language model's parameters. For example, recalling that Canberra is Australia's capital may benefit from thinking through major cities and the concept of purpose-built capitals. While reasoning language models are trained via reinforcement learning to produce reasoning traces on tasks such as mathematics, they may not reason well for accessing their own world knowledge. We first find that models do not generate their best world knowledge reasoning by default: adding a simple "think step-by-step" cue yields a statistically significant improvement in knowledge recall but not in math. Motivated by this, we propose training models to reason over their parametric knowledge using world-knowledge question answering as a verifiable reward. After reinforcement learning on TriviaQA (+9.9%), performance also improves on Natural Questions, HotpotQA, SimpleQA, and StrategyQA by 4.2%, 2.1%, 0.6%, and 3.0%, respectively. Reasoning models are under-optimized for parametric knowledge access, but can be easily trained to reason better.
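To make the idea of a "verifiable reward" from question answering concrete, here is a minimal Python sketch of a binary exact-match reward. It is an illustration under stated assumptions, not the paper's implementation: the abstract does not specify the scoring rule, the output format, or the prompt wording, so the `Answer:` marker, the normalization steps, and the names `THINK_CUE`, `normalize_answer`, `extract_final_answer`, and `knowledge_reward` are all hypothetical.

```python
import re
import string

# Hypothetical cue text; the abstract only says a "think step-by-step" cue is added.
THINK_CUE = "Think step-by-step."


def normalize_answer(text: str) -> str:
    """Lowercase, drop ASCII punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def extract_final_answer(completion: str, marker: str = "Answer:") -> str:
    """Return the first line after the last marker, or "" if the marker is absent.

    The "Answer:" output format is an assumption made for this sketch.
    """
    _, found, tail = completion.rpartition(marker)
    tail = tail.strip()
    if not found or not tail:
        return ""
    return tail.splitlines()[0]


def knowledge_reward(completion: str, gold_aliases: list[str]) -> float:
    """Binary reward: 1.0 if the extracted answer matches any gold alias, else 0.0."""
    prediction = normalize_answer(extract_final_answer(completion))
    if not prediction:
        return 0.0
    gold = {normalize_answer(alias) for alias in gold_aliases}
    return 1.0 if prediction in gold else 0.0


if __name__ == "__main__":
    trace = (
        "Sydney and Melbourne are the largest cities, "
        "but the capital was purpose-built.\n"
        "Answer: Canberra."
    )
    print(knowledge_reward(trace, ["Canberra"]))
```

Because the reward depends only on the final extracted answer, the reasoning trace before the marker is left unconstrained, which is the property that lets reinforcement learning shape how the model reasons toward a recalled fact. A real setup would likely need a more forgiving matcher for aliases and paraphrases.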