Search papers, labs, and topics across Lattice.
2052
LLMs can be made to reason much better by directly optimizing their pre-training output distribution, even before fine-tuning on specific tasks.
Hallucinations in RL-based image editing and generation are tamed with FIRM, a new framework that trains robust reward models on curated datasets to provide more accurate guidance.