2 papers published across 0 labs.
Stop wasting compute on irrelevant actions: targeted hindsight self-distillation focuses LLM agent training on the critical failure points, boosting performance and slashing training time.
Forget expensive human annotations: this unsupervised method trains reward models that steer LLM reasoning just as well as, or even better than, their supervised counterparts.