Ruslan Salakhutdinov

Papers on Lattice

Total citations

Topics

h-index

Publication activity: papers/week, last 8 weeks

Research focus

Eval Frameworks & Benchmarks (4), Tool Use & Agents (4), RLHF & Preference Learning (2), Scaling Laws & Emergent Abilities (2)

Frequent co-authors

L. Jang (3), Lawrence Keunho Jang (3), Jing Yu Koh (3), Zhengzhong Liu (2)

Papers (6)

Jun 16, 2026

CMU ML·3w ago·also Institute of Foundation Models, MBZUAI, UMich, Zayed University of Artificial

From Reasoning Traces to Reusable Modules: Understanding Compositional Generalization in Language Model Reasoning

Training on compound reasoning traces yields better generalization than isolated atomic modules, reshaping our understanding of how LLMs can learn to reason.

Lingjing Kong, Guangyi Chen, Martin Q. Ma +8

Reasoning & Chain-of-Thought, RLHF & Preference Learning

Jun 15, 2026

CMU ML·Jun 15, 2026

MyPCBench: A Benchmark for Personally Intelligent Computer-Use Agents

Claude Opus 4.6 outperforms its peers, solving over half of the complex tasks in a personalized desktop environment, revealing critical gaps in current AI capabilities.

L. Jang, Lawrence Keunho Jang, Andrew Keunwoo Jang +2

Eval Frameworks & Benchmarks, Tool Use & Agents

Jun 8, 2026

CMU ML·Jun 8, 2026

iOSWorld: A Benchmark for Personally Intelligent Phone Agents

iOSWorld reveals that even state-of-the-art models falter in multi-app reasoning, achieving only 37% accuracy, underscoring the complexity of personal context in AI interactions.

L. Jang, Lawrence Keunho Jang, Mareks Woodside +4

Eval Frameworks & Benchmarks, Tool Use & Agents, World Models & Planning

Apr 27, 2026

CMU ML·Apr 27, 2026

Odysseys: Benchmarking Web Agents on Realistic Long Horizon Tasks

Today's best web agents are shockingly inefficient, achieving only 1.15% trajectory efficiency on realistic long-horizon tasks, revealing a critical need to move beyond simple success rates.

L. Jang, Lawrence Keunho Jang, Jing Yu Koh +3

Eval Frameworks & Benchmarks, Tool Use & Agents

Apr 16, 2026

Apr 16, 2026·also CMU ML

Scaling Test-Time Compute for Agentic Coding

Agentic coding gets a serious boost: distilling and reusing rollout trajectories lets Claude-4.5-Opus jump from 70.9% to 77.6% on SWE-Bench Verified.

Joongwon Kim, Wannan Yang, Kelvin Niu +13

Code Generation & Program Synthesis, Eval Frameworks & Benchmarks, Scaling Laws & Emergent Abilities, +1

Mar 12, 2026

CMU ML·Mar 12, 2026·also Institute of Foundation Models, Petuum, WashU

IsoCompute Playbook: Optimally Scaling Sampling Compute for LLM RL

Forget simple scaling laws: the compute-optimal number of parallel rollouts in LLM RL plateaus, revealing distinct mechanisms for easy vs. hard problems.

Zhoujun Cheng, Yutao Xie, Yuxiao Qu +16

RLHF & Preference Learning, Scaling Laws & Emergent Abilities, Training Efficiency & Optimization
