The Pennsylvania State University, State College
LLMs can learn to abstain from questions they are unsure about, with state-of-the-art accuracy, by dynamically re-weighting abstention rewards during training based on trajectory consistency.
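
The summary does not spell out the reward formula, so the following is only a minimal sketch of what consistency-based re-weighting of an abstention reward could look like, not the authors' method. Everything here is an assumption made for illustration: the `ABSTAIN` marker, the majority-agreement definition of consistency, the linear `1 - consistency` weighting, and the `abstain_base` parameter.

```python
from collections import Counter

ABSTAIN = "<abstain>"  # hypothetical marker for an abstaining trajectory


def trajectory_consistency(answers):
    """Fraction of committed (non-abstaining) answers that match the most common one.

    Assumed definition; returns 0.0 when every trajectory abstained.
    """
    committed = [a for a in answers if a != ABSTAIN]
    if not committed:
        return 0.0
    top_count = Counter(committed).most_common(1)[0][1]
    return top_count / len(committed)


def reweighted_rewards(answers, gold, abstain_base=0.5):
    """Per-trajectory rewards for one question.

    Assumed scheme: a correct answer earns 1.0, a wrong answer 0.0, and an
    abstention earns abstain_base scaled by (1 - consistency), so abstaining
    pays more when the sampled trajectories disagree with each other.
    """
    abstain_reward = abstain_base * (1.0 - trajectory_consistency(answers))
    rewards = []
    for a in answers:
        if a == ABSTAIN:
            rewards.append(abstain_reward)
        elif a == gold:
            rewards.append(1.0)
        else:
            rewards.append(0.0)
    return rewards


# Consistent trajectories: abstaining earns nothing.
confident = reweighted_rewards(["A", "A", "A", ABSTAIN], gold="A")
# Scattered trajectories: abstaining earns a positive reward.
unsure = reweighted_rewards(["A", "B", "C", ABSTAIN], gold="A")
```

Under these assumptions the abstention reward shrinks to zero as the sampled answers converge, which discourages abstaining on questions the model answers consistently.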