The hardest AI tasks remain largely unsolved, with current models achieving only a 2.6% success rate on economically valuable workflows.
No single AI model dominates across all professional industries, revealing distinct occupational capability profiles and highlighting the need for specialized AI development.
Training web agents in a simulator can now match real-world performance: Qwen3-14B, fine-tuned with WebWorld-synthesized trajectories, rivals GPT-4o on WebArena.
ToolRMs improve tool-use accuracy in LLMs, outperforming existing models by up to 17.94% while cutting output token usage by over 66% through efficient inference-time scaling.