Fine-tuning on the DeNovoSWE dataset boosts long-horizon software engineering performance by over 40 percentage points, revealing the potential of LLMs in complete repository generation.
LLM agents are shockingly vulnerable to multi-stage "trojan" attacks that inject malicious instructions into their workspace, achieving near-perfect success rates where standard prompt injection defenses fail.
Forget hand-crafting mobile benchmarks: PhoneWorld generates them automatically from real-world GUI trajectories, leading to massive performance gains for phone-use agents.
An AI agent autonomously discovered four new superconductors, shrinking the discovery timeline from years to GPU hours.
Agent-World reveals that self-evolving environments can dramatically boost agent performance, outperforming established models by leveraging dynamic task synthesis.
Autonomous ML research agents achieve significantly better long-horizon performance by maintaining durable state through a shared workspace, suggesting that orchestration and memory are more critical than raw reasoning power.
LLMs can now navigate 100-turn multimodal search tasks without context explosion, thanks to a file-based visual representation that slashes token costs.
LLMs' training trajectories in RLVR are more predictable than you think: modeling the non-linear evolution of a rank-1 subspace lets you extrapolate parameters and cut compute by 37.5%.