Haoyuan Xu

Hunan University, University of Electronic Science and Technology

Papers on Lattice

Total citations

Topics

Publication activity (papers/week, last 8 weeks)

Research focus

Computer Vision (3) · Multimodal Models (3) · Speech & Audio (2) · Interpretability & Mechanistic Interp (1)

Frequent co-authors

Xingyuan Li (2) · Yuxuan Chen (1) · Peize He (1) · Shulin Li (1)

Papers (5)

Jun 8, 2026

1w ago · also Hunan, University of Electronic Science and Technology

Inside the Latent Flow: Causal Deciphering of Attention Dynamics in Audio Separation Foundation Models

Layer-Selective Attention Caching reduces computation by 25% while improving audio quality retention by up to 6.7×, making audio separation foundation models more efficient.

Yuxuan Chen, Haoyuan Xu, Peize He

Interpretability & Mechanistic Interp · Speech & Audio

May 25, 2026

Xingyuan Li +5 · 3w ago · also Hunan, NJU, University of Electronic Science and Technology

DRFusion: Drift-Resilient Temporally Consistent Infrared-Visible Video Fusion

By reframing infrared-visible video fusion as history-conditioned motion generation, this work lets diffusion models produce temporally stable fused video without relying on optical flow or frame-by-frame processing.

Xingyuan Li, Haoyuan Xu, Shulin Li +3

Computer Vision · Multimodal Models

Mar 4, 2026

Xingyuan Li +4 · also Hunan, University of Electronic Science and Technology

Bridging Human Evaluation to Infrared and Visible Image Fusion

Instead of handcrafted losses, this paper uses human feedback and reinforcement learning to produce infrared-visible image fusion results that better match human visual preferences.

Xingyuan Li, Qingyun Mei, Haoyuan Xu +2

Computer Vision · Multimodal Models

Feb 25, 2026

also University of Electronic Science and Technology

UniWhisper: Efficient Continual Multi-task Training for Robust Universal Audio Representation

UniWhisper unifies diverse audio tasks into a single next-token prediction framework, reaching state-of-the-art universal audio representation and outperforming Whisper by a large margin.

Haoyuan Xu, Junzi Zhang

Architecture Design (Transformers, SSMs, MoE) · Speech & Audio · Training Efficiency & Optimization

Feb 18, 2026

Facebook · also Meta AI, Hunan, Instagram, University of Electronic Science and Technology

Xray-Visual Models: Scaling Vision models on Industry Scale Data

Moving beyond ImageNet-scale data, Xray-Visual sets a new state of the art for multimodal vision models by scaling training to billions of social media data points with a novel three-stage training pipeline.

Shlok Mishra, Tsung-Yu Lin, Linda Wang +18

Computer Vision · Data Curation & Synthetic Data · Multimodal Models
