UT Austin
Correcting problematic prefixes mitigates prefix failure in on-policy distillation, improving both reasoning coverage and accuracy.
Forget policy gradients: Value Gradient Flow (VGF) offers a simpler, more scalable way to align LLMs by directly optimizing value functions via optimal transport.