Unleashing System 2 reasoning at System 1 speeds, SGA-MCTS lets frozen LLMs rival fine-tuned behemoths like GPT-4 on complex planning tasks.
RLHF can be significantly improved for complex tasks by explicitly modeling preference relationships both within and between training examples, unlocking better instruction following without relying on expensive human annotation or biased LLM-generated data.