Chain-of-Thought explanations can be made significantly more faithful by training models to produce reasoning steps that allow a simulator to accurately predict outputs on counterfactual inputs.
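To make the counterfactual-simulatability idea concrete, here is a minimal Python sketch of the evaluation loop it implies: a separate simulator reads only the model's explanation and tries to predict the model's answer on perturbed inputs. All names here (`model_answer`, `simulator_predict`, `make_counterfactuals`) are hypothetical placeholders, not APIs from the work being summarized.

```python
def counterfactual_simulatability(model_answer, simulator_predict,
                                  question, explanation,
                                  make_counterfactuals, n=20):
    """Score how well an explanation lets a simulator predict the model's
    behavior on perturbed (counterfactual) versions of the question."""
    counterfactuals = make_counterfactuals(question, n)  # e.g., entity swaps
    hits = 0
    for cf in counterfactuals:
        predicted = simulator_predict(explanation, cf)  # simulator sees only the CoT
        actual = model_answer(cf)                       # the model's actual output
        hits += int(predicted == actual)
    return hits / len(counterfactuals)  # higher = more faithful explanation
```

Under this framing, training for faithfulness means optimizing the explainer so that this score rises: reasoning steps that genuinely drive the answer generalize to counterfactuals, while post-hoc rationalizations do not.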
LLMs encode truthfulness along a spectrum from general principles to narrow domain expertise, and exploiting domain-specific truth representations is key to steering model behavior.
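One common way to operationalize a "domain-specific truth representation" is a difference-of-means probe over hidden activations, which can then be used as a steering vector. The sketch below assumes access to a model's hidden states via a hypothetical `get_hidden_state` function; the statement sets and scaling factor are illustrative, not from the summarized work.

```python
import numpy as np

def truth_direction(get_hidden_state, true_stmts, false_stmts):
    """Estimate a per-domain 'truth' direction as the difference of mean
    hidden-state activations over true vs. false statements in that domain."""
    h_true = np.mean([get_hidden_state(s) for s in true_stmts], axis=0)
    h_false = np.mean([get_hidden_state(s) for s in false_stmts], axis=0)
    d = h_true - h_false
    return d / np.linalg.norm(d)  # normalize for controlled steering strength

def steer(hidden, direction, alpha=5.0):
    """Shift an activation toward 'true' along the domain's direction."""
    return hidden + alpha * direction
```

The claim that truthfulness spans general principles to narrow expertise suggests these directions are computed per domain (e.g., separately for medical facts vs. geography), rather than assuming a single global truth axis transfers everywhere.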