Ameet Talwalkar

Papers on Lattice

Total citations

Topics

h-index

Research focus

Eval Frameworks & Benchmarks (2), Multimodal Models (1), Natural Language Processing (1), Code Generation & Program Synthesis (1), Tool Use & Agents (1)

Frequent co-authors

Stephan Xie (1), Ben Cohen (1), Mononito Goswami (1), Junhong Shen (1)

Papers (2)

CMU ML · Apr 23, 2026 · also Datadog

ARFBench: Benchmarking Time Series Question Answering Ability for Software Incident Response

Even GPT-5 only achieves 63% accuracy on time series anomaly questions from real software incidents, but a model-expert combination reaches 87%, highlighting the potential for hybrid intelligence in incident response.

Stephan Xie, Ben Cohen, Mononito Goswami +6

Eval Frameworks & Benchmarks, Multimodal Models, Natural Language Processing

CMU ML · Feb 11, 2026 · also Princeton

GameDevBench: Evaluating Agentic Capabilities Through Game Development

Multimodal agents still struggle with game development, solving only ~50% of tasks in a new benchmark, GameDevBench, highlighting the need for better multimodal reasoning in complex software environments.

Wayne Chi, Yixiong Fang +15

Code Generation & Program Synthesis, Eval Frameworks & Benchmarks, Tool Use & Agents
