Nankai University
TARPO integrates discrete and continuous reasoning approaches to improve policy exploration in LLMs and is reported to outperform traditional reasoning methods.
LLM post-training is more than a choice of training objective: it is a targeted intervention on model behavior through three mechanisms: support expansion, policy reshaping, and behavioral consolidation.