Search papers, labs, and topics across Lattice.
4
0
5
Continual learning for LLM agents hits a wall: scaling models doesn't reliably improve skill generation, and self-feedback can lead to recursive drift.
A clever two-stage agent using smaller models can produce better, more substantive peer reviews than brute-force application of the largest LLMs.
Today's best AI agents can only complete 33% of common online tasks like booking appointments or filling out job applications, revealing a significant gap between current capabilities and real-world utility.
A Qwen3-8B model, trained with a new SFT+RLAIF recipe on a challenging new benchmark, SWE-QA-Pro, beats GPT-4o in repository-level code understanding.