Stop blind drawing: giving MLLMs eyes to see their work-in-progress boosts SVG generation performance.
Encrypting spatial data no longer means sacrificing query privacy: BRASP achieves both access and search pattern hiding for Boolean range queries.
Achieve full-attention accuracy with 10x operator speedup and 4.7x throughput improvement in long-context LLM inference by overlapping KV cache transfers with computation.
RL's success in boosting VLM reasoning hides a critical flaw: it crushes the model's ability to explore diverse solutions, leading to premature convergence and hindering scalability.
Ditch the diffusion vs. autoregressive debate: this VLA framework uses diffusion to *draft* actions and an autoregressive model to *verify* them, boosting real-world success by nearly 20%.
By explicitly modeling and predicting non-stationary factors in both time and frequency domains, TimeAPN significantly boosts the accuracy of long-term time series forecasting, outperforming existing normalization techniques.
Current LMMs can't reliably turn complex images into code, failing to preserve structural integrity even in relatively simple scenarios, as shown by the new Omni-I2C benchmark.
By explicitly modeling visibility, VSDiffusion generates more geometrically plausible and realistic shadows, outperforming prior methods on a challenging image composition task.
Spotting coordinated fake reviewers just got easier: a new graph learning method boosts detection accuracy by adaptively weighing network diversity and similarity.
Forget satellite-specific hacks: FoundPS achieves state-of-the-art pansharpening performance with a single model robust to diverse sensors and scenes.
Forget training separate models for every remote sensing modality pair: Any2Any learns a single latent space for unified translation, even generalizing to unseen modality combinations.
You can cut MLLM hallucinations in remote sensing tasks without any training by cleverly exploiting the model's own attention mechanisms to focus on relevant image regions.
By pruning and quantizing the KV cache, XStreamVGGT achieves a remarkable 4.42x memory reduction and 5.48x speedup in streaming 3D reconstruction without sacrificing performance.
Achieve superior LLM pruning performance by first nudging models toward sparsity-friendliness *before* applying any weight removal.
PIME leverages prototype-guided Monte Carlo Tree Search to extract compact, neuroscientifically validated brain subnetworks predictive of disorders, outperforming standard deep learning approaches in both accuracy and interpretability.
Robots can now adapt to dynamic environments with minimal human involvement by learning from a world model and force-torque feedback, achieving state-of-the-art manipulation performance.
Individuals can now demand a tamper-proof, verifiable record of every action taken by AI agents operating on their own devices, thanks to a new sovereignty kernel.
Forget global coordinates: EgoPush lets mobile robots rearrange multiple objects using only an egocentric camera and learned object relationships, even in cluttered environments.
By ditching node alignment, this random-walk method cracks the code for classifying highly variable brain networks, boosting accuracy in distinguishing Alzheimer's from Lewy Body Dementia.
LLM code copilots are put to the test with SecCodeBench-V2, a new benchmark revealing their security vulnerabilities across 22 CWE categories and five programming languages.
MLLMs struggle to effectively zoom into relevant details in ultra-high-resolution remote sensing imagery, but a new staged training framework can teach them when and where to focus for substantial accuracy gains.