Today's best AI agents fail at realistic software engineering tasks, stalling before they reach even 30% completion, which underscores the need for better long-horizon planning and human-AI collaboration.
Pointwise reward models can finally compete with pairwise models in RLHF, thanks to a new intergroup comparison method that scales linearly with the number of candidates.