Huazhong University of Science and Technology
VideoLLMs can now watch and think *simultaneously*, achieving 15x faster response times and improved accuracy on video understanding tasks.
Unleashing powerful reasoning in OLLMs doesn't require expensive training data or compute; it just takes clever guidance from existing Large Reasoning Models.
Even state-of-the-art text-to-image models like Qwen-Image can be significantly improved in the structural fidelity and semantic alignment of rendered text, using a novel RL strategy that rewards structural anomaly quantification.