Jiashu Yao

Beijing Institute of Technology

Research focus

RLHF & Preference Learning (3) · Natural Language Processing (2) · Reasoning & Chain-of-Thought (1) · Training Efficiency & Optimization (1)

Frequent co-authors

Zeming Liu (4), Yuhang Guo (4), Yingyu Shan (2), Silin Li (1)

Papers (5)

Jun 17, 2026

Jun 17, 2026 · also Baidu, Beihang

PEC-Home: Interpretation of Progressively Elliptical Commands in Smart Homes

Existing smart home assistants miss user intentions in 60% of cases when interpreting elliptical commands, revealing a critical gap in their design.

Yingyu Shan, Zeming Liu, Silin Li +4

Natural Language Processing

Jun 17, 2026 · also Beihang, Independent Researcher

Learning from Own Solutions: Self-Conditioned Credit Assignment for Reinforcement Learning with Verifiable Rewards

Self-conditioning on verified trajectories boosts reinforcement learning performance by over 8%, revealing the power of internal feedback in credit assignment.

Yingyu Shan, Yuhang Guo, Zihao Cheng +6

Reasoning & Chain-of-Thought · RLHF & Preference Learning

Apr 13, 2026

Apr 13, 2026 · also Beihang, ZJU

Policy Split: Incentivizing Dual-Mode Exploration in LLM Reinforcement with Dual-Mode Entropy Regularization

Forget monolithic policies – splitting your LLM's RL policy into accuracy-focused and exploration-driven modes unlocks better performance and diversity.

Jiashu Yao, Chuwei Luo, Daiqing Wu +3

Natural Language Processing · RLHF & Preference Learning · Training Efficiency & Optimization

Apr 13, 2026 · also Beihang

Utilizing and Calibrating Hindsight Process Rewards via Reinforcement with Mutual Information Self-Evaluation

Open-source 7B LLMs can now rival GPT-4o performance on validation tasks, thanks to a novel reinforcement learning approach that leverages calibrated self-evaluation as a dense reward signal.

Jiashu Yao, Zeming Liu, Yuhang Guo

RLHF & Preference Learning · Tool Use & Agents

Mar 17, 2026

NVIDIA · Mar 17, 2026 · also BIT, ByteDance, Tencent AI, Vipshop

HierarchicalKV: A GPU Hash Table with Cache Semantics for Continuous Online Embedding Storage

Stop wasting precious GPU memory: this new cache-semantic hash table library achieves up to 3.9 billion key-value lookups per second, outperforming standard approaches by up to 9.4x.

Haidong Rong, Jiashu Yao, Matthias Langer +11

Architecture Design (Transformers, SSMs, MoE) · Distributed Systems & Hardware · Inference & Quantization
