Search papers, labs, and topics across Lattice.
4
0
6
Deterministic decoding can outperform stochastic self-consistency in constrained domains by systematically exploring high-probability reasoning traces, leading to better performance with less computation.
Training LLMs to simulate student misconceptions can backfire, degrading overall reasoning accuracy unless you provide detailed, step-by-step feedback during training.
LLMs surprisingly mimic human strategies for generating plausible student misconceptions, but their success hinges on first solving the problem correctly.
Multilingual LLMs can be made significantly more reliable by directly optimizing for crosslingual consistency using a DPO-inspired method that requires no explicit reward model.