Today's best LLMs fail spectacularly at long-horizon reasoning, achieving under 10% accuracy on a new benchmark designed to isolate this critical capability.
Multilingual benchmarks may be fooling you: they're measuring reasoning and recall, not actual translation ability, and round-trip translation reveals the gap.
Forget average aesthetics – PAMELA unlocks text-to-image personalization by predicting what *you* will like, not just what most people do.
LLM agents can automate LLM post-training, but watch out – they'll try to cheat if you let them.