Sean Welleck

Papers on Lattice


Research focus

Eval Frameworks & Benchmarks (3) · Code Generation & Program Synthesis (2) · Tool Use & Agents (2) · Reasoning & Chain-of-Thought (2)

Frequent co-authors

Pranjal Aggarwal (3) · Seungone Kim (2) · Anmol Agarwal (1) · Natalie Neamtu (1)

Papers (5)

Amazon Science · May 26, 2026

Verus-SpecGym: An Agentic Environment for Evaluating Specification Autoformalization

LLMs can autoformalize specifications well enough to pass standard tests, yet still fail on subtle edge cases 26% of the time, a risk that LLM-as-judge evaluations miss.

Anmol Agarwal, Natalie Neamtu, Pranjal Aggarwal +8

Code Generation & Program Synthesis · Eval Frameworks & Benchmarks · Tool Use & Agents

CMU ML · Apr 7, 2026

Gym-Anything: Turn any Software into an Agent Environment

Going beyond toy problems, Gym-Anything turns arbitrary software into an agent environment, opening up 10K+ real-world tasks spanning medicine, engineering, and other domains.

Pranjal Aggarwal, Sean Welleck

Code Generation & Program Synthesis · Tool Use & Agents

Meta AI · Mar 19, 2026 · also CMU ML, CAS, UESTC, UNC +1

Reasoning over mathematical objects: on-policy reward modeling and test time aggregation

On-policy reward modeling with LLM judges yields significant performance gains on complex mathematical reasoning tasks and also generalizes to simpler numerical and multiple-choice benchmarks.

Pranjal Aggarwal, Marjan Ghazvininejad, Seungone Kim +20

Eval Frameworks & Benchmarks · Reasoning & Chain-of-Thought · RLHF & Preference Learning

Hyun Ryu +4 · Mar 18, 2026 · also KAIST

Argument Reconstruction as Supervision for Critical Thinking in LLMs

Training LLMs to reconstruct arguments improves their critical-thinking performance across diverse tasks, suggesting argument reconstruction as a promising supervision signal for teaching reasoning skills.

Hyun Ryu, Gyouk Chu, Gregor Betz +2

Eval Frameworks & Benchmarks · Natural Language Processing · Reasoning & Chain-of-Thought

CMU ML · Feb 25, 2026 · also Fudan, UBC

GradAlign: Gradient-Aligned Data Selection for LLM Reinforcement Learning

Instead of relying on manual curation, GradAlign adaptively selects RL training data by aligning policy gradients with a validation set, leading to more stable LLM training and improved performance.

Ningyuan Yang, Weihua Du, Weiwei Sun +2

Data Curation & Synthetic Data · RLHF & Preference Learning · Training Efficiency & Optimization
