Hunan University, University of Electronic Science and Technology
Layer-Selective Attention Caching cuts computation by 25% while improving audio quality retention by up to 6.7 times in audio separation models.
Diffusion models can finally produce temporally stable video fusion by reframing the task as history-conditioned motion generation, sidestepping the limitations of optical flow and frame-by-frame processing.
Forget handcrafted losses: this paper uses human feedback and reinforcement learning to create infrared and visible image fusion that actually looks good to people.
Achieve state-of-the-art universal audio representation by unifying diverse audio tasks into a single next-token prediction framework, outperforming Whisper by a large margin.
Forget ImageNet: Xray-Visual sets a new SOTA for multimodal vision models by scaling to billions of social media data points with a novel three-stage training pipeline.