IBM Research, University of Southern California, Information Sciences Institute
Instead of relying on a fixed decoding strategy, reinforcement learning (RL) can train a lightweight policy that adapts LLM sampling *at test time*, boosting summarization quality by up to 88% without retraining the LLM.