Tencent Youtu Lab
Forget bolting vision onto language models – truly powerful multimodal AI demands rethinking architectures from the ground up.
Medical-specific vision-language models underutilize visual information on Japanese medical licensing exams: they often perform well even when the images are removed, which exposes a critical gap in their multimodal reasoning.
ErrorLLM tackles the challenge of refining LLM-generated SQL by explicitly modeling and detecting implicit semantic errors, leading to substantial improvements in text-to-SQL performance.