LLM agents struggle significantly with personalized tool use, revealing critical gaps in their capabilities that existing benchmarks overlook.
MIRA achieves superior mid-training data selection by dynamically constructing source-specific evaluation rubrics, outperforming traditional methods while using half the data.
Forget dumb context stuffing: LongSeeker shows that agents that strategically *edit* their own memory solve web search tasks far more reliably.
Forget resource-intensive pipelines: a purely academic team achieves SOTA search agent performance with just 10.6k SFT data points, outperforming models trained with CPT+SFT+RL.
EvoMaster achieves unprecedented performance in autonomous scientific discovery, outperforming traditional frameworks by up to 316%.
LLMs are still far from being autonomous scientists, failing to master even simplified, end-to-end physics research workflows.
Industrial code generation gets a reasoning boost: InCoder-32B-Thinking leverages error-driven feedback and a code world model to achieve top-tier performance on complex hardware-aware tasks.