Lattice AI Research

Research focus

RLHF & Preference Learning (2) · Natural Language Processing (1) · Reasoning & Chain-of-Thought (1) · Code Generation & Program Synthesis (1) · Data Curation & Synthetic Data (1)

Frequent co-authors

Zhengxu Hou (1) · Yangshijie Zhang (1) · Bingren Yan (1) · Jialin Liu (1)

Papers (2)

Apr 13, 2026

DAMO · Apr 13, 2026 · also HFUT, PKU

Triviality Corrected Endogenous Reward

Unsupervised RL for text generation doesn't have to collapse into gibberish: rewarding relative information gain between specialist and generalist policies unlocks meaningful content creation.

Xinda Wang, Zhengxu Hou, Yangshijie Zhang +4

Natural Language Processing · Reasoning & Chain-of-Thought · RLHF & Preference Learning

Feb 15, 2026

Feb 15, 2026 · also Microsoft Research

From SFT to RL: Demystifying the Post-Training Pipeline for LLM-based Vulnerability Detection

On-policy RL (GRPO) makes LLMs significantly better at vulnerability detection than SFT or preference optimization, outperforming even strong zero-shot baselines.

Youpeng Li, Fuxun Yu, Xinda Wang

Code Generation & Program Synthesis · Data Curation & Synthetic Data · RLHF & Preference Learning

Xinda Wang