Current vision-language models are surprisingly bad at interpreting scientific figures, failing to match expert-level reasoning on a new benchmark of experimental images.
LLMs can now generate research roadmaps that are not only better but also far faster than those created by human experts, thanks to a novel multi-agent system.
Retrieval improvements in RAG don't always boost reasoning, but NeocorRAG's evidence chains can fix that, achieving state-of-the-art results with 80% fewer tokens.
GPT-5.1 barely cracks 50% accuracy when distinguishing real from AI-generated academic images, highlighting a stark gap between generative capability and forensic detection.
Reward models optimized for single-step generation can fail spectacularly when integrated into multi-stage LLM pipelines, but pipeline-aware training can fix this.