Xu Zheng

Routing SQL queries based on complexity allows DecoSearch to achieve unprecedented execution accuracy while using an order of magnitude fewer tokens than traditional methods.

Esteban Schafir, Xu Zheng, Hojat Allah Salehi +3

Code Generation & Program Synthesis · Reasoning & Chain-of-Thought

Feiyu Wu +6 · Apr 30, 2026 · also HKUST, Xidian

RHyVE: Competence-Aware Verification and Phase-Aware Deployment for LLM-Generated Reward Hypotheses

LLM-generated rewards in RL can be misleading early in training, but RHyVE dynamically selects the best reward signal based on policy competence, leading to improved performance.

Feiyu Wu, Xuhui Zheng, Xu Zheng +4

RLHF & Preference Learning · Scalable Oversight & Alignment Theory

Mar 16, 2026

Panoramic Affordance Prediction

Existing affordance prediction models fall flat when confronted with the wide-angle, distorted reality of panoramic vision, but a new training-free pipeline called PAP rises to the challenge.

Zixin Zhang, Chenfei Liao, Hongfei Zhang +10

Computer Vision · Robotics & Embodied AI · World Models & Planning

Mar 16, 2026 · also HKUST, Melbourne

Seeing Beyond: Extrapolative Domain Adaptive Panoramic Segmentation

Achieve state-of-the-art panoramic segmentation by training on local perspective views and generalizing to full 360° images, even with geometric distortions and unseen classes.

Yuanfan Zheng, Yu Zheng, Kunyu Peng +2

Computer Vision · Robotics & Embodied AI

Hongfei Zhang +14 · Mar 12, 2026 · also HKUST

DVD: Deterministic Video Depth Estimation with Generative Priors

Pre-trained video diffusion models can be deterministically adapted into state-of-the-art zero-shot depth estimators, sidestepping the need for massive labeled datasets.

Hongfei Zhang, Harold Haodong Chen, Chenfei Liao +12

Computer Vision · Multimodal Models · World Models & Planning

Ye Pan +8 · Mar 12, 2026 · also HKUST

EgoIntent: An Egocentric Step-level Benchmark for Understanding What, Why, and Next

Current MLLMs are surprisingly bad at understanding human intent in egocentric videos at a step-by-step level, achieving only 33% accuracy on a new benchmark designed to prevent future-frame leakage.

Ye Pan, Chifai Wong, Chi Kit Wong +6

Eval Frameworks & Benchmarks · Multimodal Models · Robotics & Embodied AI

Dingcheng Zhen +6 · Mar 12, 2026 · also HKUST, Soul AI Lab

SoulX-LiveAct: Towards Hour-Scale Real-Time Human Animation with Neighbor Forcing and ConvKV Memory

Forget short looping animations – this new diffusion model generates hour-long, real-time human animations with lip-sync accuracy and emotional expressiveness, all while running on just two GPUs.

Dingcheng Zhen, Xu Zheng, Ruixin Zhang +4

Architecture Design (Transformers, SSMs, MoE) · Computer Vision · Training Efficiency & Optimization

Feb 23, 2026 · also Chemical and Biomolecular Engineering

Unlocking Multimodal Document Intelligence: From Current Triumphs to Future Frontiers of Visual Document Retrieval

The first comprehensive survey of Visual Document Retrieval reveals how MLLMs are reshaping the field, highlighting the shift towards RAG and agentic systems for complex document understanding.

Yibo Yan, Jiahao Huo, Guanbo Feng +14

Computer Vision · Multimodal Models · Recommendation & Information Retrieval