Rebuttals hold the key to actionable AI-generated peer reviews: RbtAct uses them to train LLMs to give feedback that authors actually use.
Current multimodal math models struggle with visual interpretation, symbol alignment, and consistent reasoning, highlighting the need for a unified "Perception-Alignment-Reasoning" framework.
LLM judges inflate math proof scores by up to 0.36 points, revealing a significant alignment gap with human experts and a reasoning breakdown in discrete domains.
Reference-guided LLM evaluators can boost alignment in non-verifiable domains, enabling self-improvement that rivals reward-model training.
Even GPT-5 struggles to reliably reproduce novel research findings, highlighting a significant gap between capability and reliability for AI agents tackling end-to-end research tasks.