Search papers, labs, and topics across Lattice.
5
0
9
25
Standard retriever evaluations hide critical weaknesses in agentic search systems, but a new benchmark and training method exposes and addresses these flaws.
LLMs are rapidly transforming peer review, but critical gaps remain in ensuring quality, fairness, and ethical considerations across the entire workflow.
Frontier models are wasted on routine GUI tasks: a step-level cascade that adaptively invokes stronger models only when lightweight monitors detect progress stalls or semantic drift slashes compute costs without sacrificing performance.
Stop generating superficial reviews: RbtAct leverages rebuttals to train LLMs to provide actionable feedback, leading to concrete revisions and improved author uptake.
Even GPT-5 struggles to reliably reproduce novel research findings, highlighting a significant gap between capability and reliability for AI agents tackling end-to-end research tasks.