University of Science and Technology of China & iFLYTEK Co., Ltd.
Mags-RL lets multimodal LLMs see the forest *and* the trees, using reinforcement learning to guide a super-resolution agent that selectively enhances image regions for improved reasoning without extra annotations.
Training on 500K automatically-curated ophthalmology instructions lets a vision-language model leapfrog general medical models in a specialized domain.
Chemical reaction diagram parsing, a notoriously difficult task for vision-language models, sees a significant leap in performance thanks to a new multi-agent framework that enforces chemical consistency.