Search papers, labs, and topics across Lattice.
The Hong Kong University of Science and Technology
2
0
3
LLMs can now translate text in web images with significantly improved accuracy and efficiency thanks to a novel visual-aware adaptation framework that bridges the gap between high-level semantics and fine-grained visual details.
Targeted neuron fine-tuning can unlock superior image translation capabilities in multimodal large language models, outperforming traditional methods by preserving pre-trained knowledge.