LLMs can reason better when they're not forced to answer in English, and a new RL method leverages this quirk to boost performance across reasoning tasks.
Human-like evaluation of long-form generative AI is now possible, thanks to a new framework that breaks down reference answers into weighted, context-aware scoring points.
LLMs struggle to grasp nuanced values across languages: the new X-Value benchmark shows accuracy dropping below 77%, with gaps of over 20% between languages.