KAIST
A simple resampling strategy closes the "Thinking-Acting Gap" in agentic VLMs, enabling smaller models to outperform larger ones on multimodal reasoning tasks.
Visual token pruning works for simple tasks but breaks down in complex visual reasoning, because the information the model needs shifts during decoding; a simple add-on fixes this.
LLMs can iteratively refine their reasoning and reduce errors by recursively evaluating and improving their own confidence, leading to more stable and faster inference.