Forget scraping: this work shows you can generate high-quality, executable terminal environments from scratch and use them to train language agents that outperform models trained on scraped data.
MLLMs can't grasp metaphors in videos, revealing a surprising gap in their high-order cognitive abilities compared to humans.
LLMs trained with ScaleBox, a new high-fidelity code verification system, substantially outperform those trained with heuristic matching, suggesting current RLHF methods are bottlenecked by verification quality.
Multilingual RAG systems are systematically suppressing "answer-critical" documents in non-English languages, crippling their ability to leverage global knowledge.
Forget text dominance: today's omni-modal LLMs surprisingly favor visual inputs, creating new challenges for cross-modal reasoning.
LLMs exhibit a "Utopian bias" when simulating human behavior, converging towards an unrealistic "positive average person" and failing to capture individual differences and long-tail behaviors.
LLMs trained with reinforcement learning from verifiable rewards (RLVR) become overconfident in incorrect answers, but a simple fix, decoupling the reasoning and calibration objectives, can restore proper calibration without sacrificing accuracy.