Rather than passively analyzing model outputs, this new attack actively *trains* the model to regurgitate specific texts, revealing its training data with surprising accuracy.
By reusing existing data mixture ratios and only recomputing for affected domains, Olmix slashes compute costs by 74% without sacrificing downstream task performance during iterative LM development.
RewardBench 2 delivers a stark reality check for reward models: they struggle significantly on new, human-generated prompts, yet this difficulty is surprisingly predictive of their actual usefulness in downstream tasks.