Search papers, labs, and topics across Lattice.
3
0
5
LLMs can now translate text in web images with significantly improved accuracy and efficiency thanks to a novel visual-aware adaptation framework that bridges the gap between high-level semantics and fine-grained visual details.
LLMs can learn multilingual translation far more effectively by explicitly separating and routing language modeling and translation knowledge during fine-tuning.
Targeted neuron fine-tuning can unlock superior image translation capabilities in multimodal large language models, outperforming traditional methods by preserving pre-trained knowledge.