David Simchi-Levi

Institute for Data, Systems, and Society, School of Business, The Division of Physics, Mathematics and Astronomy, Department of Computing and Mathematical Sciences, School of Industrial and Systems Engineering, Department of Civil and Environmental Engineering, Operations Research Center, Massachusetts Institute of Technology, Purdue University, California Institute of Technology, Georgia Institute of Technology

MIT CSAIL

Papers on Lattice

Total citations

Topics

Publication activity (papers/week, last 8 weeks)

Research focus

RLHF & Preference Learning (2), Natural Language Processing (1), Scalable Oversight & Alignment Theory (1), Training Efficiency & Optimization (1)

Frequent co-authors

Shuze Daniel Liu (1), Claire Chen (1), Jiabao Sean Xiao (1), Xin Chen (1)

Papers (3)

Jul 7, 2026

MIT CSAIL · 6d ago · also Caltech, Department of Civil and Environmental, Department of Computing and Mathematical, Georgia Tech +7

Strategic Bargaining in Multi-Buyer Markets: Reinforcement Learning from Verifiable Rewards for LLM Negotiations

LLMs can be trained to negotiate like expert agents, extracting significantly higher surpluses by strategically exploring buyer markets rather than fixating on immediate bids.

Shuze Daniel Liu, Claire Chen, Jiabao Sean Xiao +2

Natural Language Processing, RLHF & Preference Learning

Jun 30, 2026

Laboratory for Information and Decision · 1w ago · also MIT CSAIL, Caltech, Department of Civil and Environmental, Department of Computing and Mathematical +8

Transformers as Bayesian In-Context Experimenters: Smoothness-Adaptive Efficient ATE Estimation

Transformers can effectively mimic Bayesian updating processes to achieve oracle-level efficiency in average treatment effect estimation, outperforming conventional methods.

Jiachun Li, David Simchi-Levi

Scalable Oversight & Alignment Theory, Training Efficiency & Optimization

Mar 31, 2026

MIT CSAIL · Mar 31, 2026 · also Caltech, Department of Civil and Environmental, Department of Computing and Mathematical, Georgia Tech +7

ShapE-GRPO: Shapley-Enhanced Reward Allocation for Multi-Candidate LLM Training

Stop rewarding all LLM-generated candidates equally: ShapE-GRPO uses Shapley values to fairly distribute credit within sets, leading to better training and faster convergence.

Rui Ai, David Simchi-Levi, Chonghuan Wang

Recommendation & Information Retrieval, RLHF & Preference Learning, Tool Use & Agents
