LLMs can be systematically debugged and improved by treating training data as code, allowing for targeted "patches" that fix concept-level gaps and reasoning errors.
SFT's instability and reward sparsity can be overcome with a novel Group Fine-Tuning (GFT) framework, leading to better LLM policies.