Search papers, labs, and topics across Lattice.
2
0
6
0
LLMs can learn to recognize when they lack sufficient information for reasoning and proactively ask for clarification, leading to more reliable and concise answers.
SFT's instability and reward sparsity can be overcome with a novel Group Fine-Tuning (GFT) framework, leading to better LLM policies.