Y. Zhang

Research focus

Tool Use & Agents (10), Recommendation & Information Retrieval (7), Training Efficiency & Optimization (7), Eval Frameworks & Benchmarks (7)

Frequent co-authors

Zhengyang Tang (2), Shangpin Peng (2), Weinong Wang (2), Yiduo Guo (2)

Papers (25)

Jul 16, 2026

HyMobileAgent: Data-Environment Co-Scaling for Efficient GUI Agents

Mobile agents can now navigate complex GUIs with unprecedented efficiency, thanks to a novel data-environment co-scaling framework.

Hy Vision Team, Huawen Shen, Zhengyang Tang +22

Multimodal Models Robotics & Embodied AI Tool Use & Agents

May 28, 2026

VLA-Trace: Diagnosing Vision-Language-Action Models through Representation and Behavior Tracing

VLA models may excel at visually grounded tasks, but VLA-Trace reveals they still struggle with fine-grained semantic understanding and exhibit distinct modality processing strategies.

Haoyuan Shi, Xiancong Ren, Yingji Zhang +9

Interpretability & Mechanistic Interp Multimodal Models Robotics & Embodied AI

Haoran Ding +18·May 28, 2026

Rec-Distill: An Industrial Distillation Pipeline for Large-Scale Recommendation Models

Bridge the gap between offline model scaling and online deployment in recommendation systems: Rec-Distill enables lightweight student models to capture a substantial portion of the performance gains from massive teacher models.

Haoran Ding, Wenlin Zhao, Yuchen Jiang +16

Inference & Quantization Recommendation & Information Retrieval Training Efficiency & Optimization

Hao Wu +3·May 28, 2026

IP-Adapter Is All You Need: Towards Fine-Tuning-Free Diffusion-Based Talking Face Generation

Achieve state-of-the-art talking face generation without any fine-tuning, proving that pre-trained diffusion models like Stable Diffusion already possess strong lip-related semantics.

Hao Wu, Xiangyang Luo, Y. Zhang +1

Computer Vision Multimodal Models Speech & Audio

May 28, 2026

PhoneWorld: Scaling Phone-Use Agent Environments

Forget hand-crafting mobile benchmarks – PhoneWorld lets you automatically generate them from real-world GUI trajectories, leading to massive performance gains for phone-use agents.

Zhengyang Tang, Yuxuan Liu, X. Lai +22

Eval Frameworks & Benchmarks Tool Use & Agents World Models & Planning

Y. Zhang +3·May 28, 2026

AnomalyAgent: Training-Free Agentic Models for Zero-/Few-Shot Anomaly Detection

Forget training wheels: AnomalyAgent uses the reasoning power of multimodal LLMs to spot anomalies in zero- or few-shot settings, outperforming traditional VLM approaches.

Y. Zhang, Jiawen Zhu, Lele Fu +1

Computer Vision Multimodal Models Tool Use & Agents

May 28, 2026·also BJTU

GUITestScape: Towards Open-set Evaluation on Exploratory GUI Testing

Current MLLM agents struggle to find GUI defects. A new benchmark and evaluator reveal that the critical bottleneck is detection, and that simply integrating the evaluator's verifiers significantly boosts performance without retraining.

Xiaoyi Chen, Yifei Gao, Yang Xu +3

Eval Frameworks & Benchmarks Tool Use & Agents

May 26, 2026

NUS·May 26, 2026·also Tsinghua AI, Meituan, TJU, University of Science and Technology +1

Learning to Act under Noise: Enhancing Agent Robustness via Noisy Environments

LLM agents trained with simulated user and tool noise not only become more robust in messy real-world environments, but also surprisingly improve on clean, idealized benchmarks.

Xiaodong Cai, Junfeng Fang, Zhuowen Han +4

Eval Frameworks & Benchmarks Red-Teaming & Adversarial Robustness Tool Use & Agents

NUS·May 26, 2026·also BUPT, Meituan, University of Science and Technology, USTC +1

VitaBench 2.0: Evaluating Personalized and Proactive Agents in Long-Term User Interactions

Current LLM agents still struggle to infer and leverage user preferences from fragmented, real-world interactions, revealing a substantial gap between their capabilities and the demands of personalized decision-making.

Y. Zhang, Zhengzhou Cai, Yaorui Shi +5

Eval Frameworks & Benchmarks Recommendation & Information Retrieval Tool Use & Agents

Apr 29, 2026

Apr 29, 2026·also UQ

ProMax: Exploring the Potential of LLM-derived Profiles with Distribution Shaping for Recommender Systems

LLM-derived user profiles can be powerfully leveraged for recommendation via a surprisingly simple distribution shaping approach, outperforming more complex fusion methods.

Y. Zhang, Tong Chen

Natural Language Processing Recommendation & Information Retrieval

Apr 29, 2026·also BAIR, Mila, Radboud, The Netherlands Cancer Institute +2

When 2D Tasks Meet 1D Serialization: On Serialization Friction in Structured Tasks

LLMs struggle with structured 2D tasks when inputs are serialized into 1D, revealing a surprising performance gap compared to vision-augmented models that directly process the 2D layout.

Chung-Hsiang Lo, Lu Li, Diji Yang +4

Architecture Design (Transformers, SSMs, MoE) Natural Language Processing Reasoning & Chain-of-Thought

Apr 28, 2026

Dewei Bai +5·Apr 28, 2026

Vision SmolMamba: Spike-Guided Token Pruning for Energy-Efficient Spiking State-Space Vision Models

By intelligently pruning tokens based on spike timing and activation, Vision SmolMamba achieves state-of-the-art efficiency in spiking neural networks, outperforming even Spiking Mamba.

Dewei Bai, Hongxiang Peng, Yunyun Zeng +3

Architecture Design (Transformers, SSMs, MoE) Computer Vision Inference & Quantization

Apr 27, 2026

Apr 27, 2026·also UQ

Disagreement as Signals: Dual-view Calibration for Sequential Recommendation Denoising

LLMs can denoise sequential recommendations by disagreeing with the recommendation model itself, leading to more robust performance against noisy user data.

Sijian Li, Min Gao, Zongwei Wang +3

Architecture Design (Transformers, SSMs, MoE) Recommendation & Information Retrieval

Chen Feng +16·Apr 27, 2026

FreeScale: Distributed Training for Sequence Recommendation Models with Minimal Scaling Cost

Sequence recommendation models can achieve near-perfect scaling efficiency in distributed training, slashing wasted GPU cycles by up to 90%.

Chen Feng, Haoli Zhang, Sh. B. Ali-zade +14

Distributed Systems & Hardware Recommendation & Information Retrieval Training Efficiency & Optimization

Apr 22, 2026

Apr 22, 2026·also AWS Agentic AI Labs

Supplement Generation Training for Enhancing Agentic Task Performance

Forget fine-tuning behemoth LLMs for every new task – this paper shows how a tiny, nimble model generating smart supplements can unlock surprisingly strong agentic performance from frozen giants.

Young Min Cho, Daniele Bonadiman, Divya Bhargavi +8

Tool Use & Agents Training Efficiency & Optimization

Apr 21, 2026

Apr 21, 2026·also AWS Agentic AI Labs

Explicit Trait Inference for Multi-Agent Coordination

LLM agents can reliably infer each other's "warmth" and "competence" from interaction histories, leading to significantly better coordination in complex multi-agent settings.

Suhaib Abdurahman, Etsuko Ishii, Katerina Margatina +3

Natural Language Processing Reasoning & Chain-of-Thought Tool Use & Agents

Apr 20, 2026

Shenzhen University·Apr 20, 2026

TLoRA: Task-aware Low Rank Adaptation of Large Language Models

TLoRA achieves superior performance across multiple tasks while cutting down trainable parameters, redefining efficiency in fine-tuning large language models.

Weicheng Lin, Y. Zhang, Jiawei Dang +1

Architecture Design (Transformers, SSMs, MoE) Natural Language Processing Training Efficiency & Optimization

Apr 20, 2026

NIM4-ASR: Towards Efficient, Robust, and Customizable Real-Time LLM-Based ASR

LLM-based ASR can be shrunk to 2.3B parameters and still beat larger models in real-world scenarios by carefully delineating encoder and LLM roles and using a multi-stage training approach.

Jiaqi Song, Guanghui Qiu, Guang Qiu +9

Inference & Quantization Natural Language Processing Scaling Laws & Emergent Abilities +1

Apr 12, 2026

Apr 12, 2026·also unaffiliated

SID-Coord: Coordinating Semantic IDs for ID-based Ranking in Short-Video Search

Improve short-video search for niche content by unifying memorization and generalization with a lightweight semantic ID framework that raises long-play rates by 0.664%.

Guowen Li, Yuepeng Zhang, Shunyu Zhang +3

Architecture Design (Transformers, SSMs, MoE) Recommendation & Information Retrieval Training Efficiency & Optimization

Apr 8, 2026

Apr 8, 2026·also unaffiliated

Dual-Rerank: Fusing Causality and Utility for Industrial Generative Reranking

Kuaishou's new Dual-Rerank system slashes latency and boosts user engagement by fusing the best of autoregressive and non-autoregressive generative reranking, proving you can have your cake and eat it too in billion-scale search.

Shuai Lin, ChengLei Dai, Ye Qian +3

Natural Language Processing Recommendation & Information Retrieval

Apr 6, 2026

Yunkai Zhang +7·Apr 6, 2026

Grid2Matrix: Revealing Digital Agnosia in Vision-Language Models

VLMs suffer from "digital agnosia," exhibiting a surprisingly sharp failure to transcribe even small color grids into matrices, revealing a critical gap between visual feature encoding and language generation.

Yunkai Zhang, Linda Li, Yin Cui +5

Computer Vision Eval Frameworks & Benchmarks Multimodal Models

Apr 3, 2026

Xingtong Ge +6·Apr 3, 2026

Salt: Self-Consistent Distribution Matching with Cache-Aware Training for Fast Video Generation

Real-time video generation gets a boost: Salt achieves sharper, more dynamic videos at extremely low inference budgets by explicitly enforcing consistency across denoising steps.

Xingtong Ge, Y. Zhang, Yushi Huang +4

Computer Vision Inference & Quantization Training Efficiency & Optimization

Mar 16, 2026

MiroMind Team, S. Bai +33·Mar 16, 2026·also CAS

MiroThinker-1.7&H1: Towards Heavy-Duty Research Agents via Verification

By verifying its reasoning steps both locally and globally, MiroThinker-H1 achieves state-of-the-art performance in complex research tasks, demonstrating the power of integrated verification for reliable multi-step problem solving.

MiroMind Team, S. Bai, L. Bing, L. Lei +31

Eval Frameworks & Benchmarks Reasoning & Chain-of-Thought Tool Use & Agents

Mar 16, 2026·also State Key Laboratory of Novel Software

SWE-Skills-Bench: Do Agent Skills Actually Help in Real-World Software Engineering?

Most "agent skills" hyped for boosting LLMs in software engineering provide almost no benefit in real-world tasks, with 80% yielding zero pass-rate improvement.

Tingxu Han, Y. Zhang, Wei Song +4

Code Generation & Program Synthesis Eval Frameworks & Benchmarks Tool Use & Agents

Mar 16, 2026·also Cohere, Moonshot, Xidian

Attention Residuals

Forget fixed residual connections: Attention Residuals let each layer selectively attend to previous layers, boosting performance and gradient flow in deep LLMs.

Kimi Team, Jianlin Su, Weixin Xu +26

Architecture Design (Transformers, SSMs, MoE) Training Efficiency & Optimization