Lattice AI Research

Research focus

RLHF & Preference Learning (2) · Natural Language Processing (1) · Tool Use & Agents (1) · Training Efficiency & Optimization (1)

Frequent co-authors

Chenxiao Zhao (2) · Lei Huang (1) · Xiang Cheng (1) · Xiang Cheng (1)

Papers (2)

Mar 4, 2026 · also ZJU

Bootstrapping Exploration with Group-Level Natural Language Feedback in Reinforcement Learning

Unlock 2x faster reinforcement learning by distilling group feedback into actionable language refinements that guide exploration.

Lei Huang, Xiang Cheng, Xiang Cheng +10

Natural Language Processing · RLHF & Preference Learning · Tool Use & Agents

Feb 11, 2026

VESPO: Variational Sequence-Level Soft Policy Optimization for Stable Off-Policy LLM Training

VESPO stabilizes off-policy RL training for LLMs by directly reshaping sequence-level importance weights, tolerating 64x policy staleness and asynchronous execution without collapse.

Guobin Shen, Chenxiao Zhao, Xiang Cheng +2

RLHF & Preference Learning · Training Efficiency & Optimization
