
Allen Institute for AI (AI2)
Non-profit research institute founded by Paul Allen. Known for Semantic Scholar, OLMo, and AI for science.
allenai.org
Recent Papers
The paper introduces Olmix, a framework designed to address challenges in data mixing for language model training, specifically focusing on understanding the configuration space of mixing methods and efficiently adapting to evolving domain sets. Through an empirical study, the authors identify key design choices for effective mixing methods and propose "mixture reuse," a technique that leverages past mixture ratios to efficiently recompute mixtures after domain set updates. Experiments show that mixture reuse matches the performance of full recomputation while using 74% less compute, and outperforms training without mixing by 11.6% on downstream tasks.
Introduces and validates "mixture reuse," a novel technique for efficiently adapting data mixtures in language model training when the domain set evolves.
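To make the idea concrete, here is a minimal sketch of what reusing a mixture after a domain set update could look like. This is a hypothetical illustration under simple assumptions (retained domains keep their relative ratios, new domains get a fixed prior mass, everything is renormalized); the function name, the `new_domain_ratio` parameter, and the allocation rule are invented for this example and are not the Olmix method.

```python
def reuse_mixture(old_ratios, new_domains, new_domain_ratio=0.1):
    """Illustrative 'mixture reuse' sketch: keep ratios for retained
    domains, give newly added domains a small prior mass, renormalize.
    (Hypothetical example, not the Olmix implementation.)"""
    retained = {d: r for d, r in old_ratios.items() if d in new_domains}
    added = [d for d in new_domains if d not in old_ratios]

    # Reserve a fraction of the total probability mass for new domains.
    reserve = new_domain_ratio * len(added)
    total_old = sum(retained.values())

    # Retained domains share the remaining mass in their old proportions.
    mixture = {d: (r / total_old) * (1.0 - reserve) for d, r in retained.items()}
    for d in added:
        mixture[d] = new_domain_ratio

    # Renormalize so the ratios sum to 1.
    total = sum(mixture.values())
    return {d: r / total for d, r in mixture.items()}


# Example: a domain is dropped ("books") and one is added ("math").
old = {"web": 0.5, "code": 0.3, "books": 0.2}
new_mix = reuse_mixture(old, ["web", "code", "math"])
```

The point of the sketch is the cost model: only the new domains need any fresh ratio estimation, while everything else is carried over, which is where the compute savings come from.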
The authors introduce MolmoSpaces, a large-scale, open-source ecosystem comprising over 230k diverse indoor environments and 130k richly annotated object assets, designed to address the limitations of existing robot benchmarks in capturing the long tail of real-world scenarios. This simulator-agnostic ecosystem supports a wide range of embodied tasks, including navigation, manipulation, and long-horizon planning, and includes MolmoSpaces-Bench, a benchmark suite of 8 tasks. Experiments demonstrate strong sim-to-real correlation and highlight sensitivities to factors like prompt phrasing and camera occlusion, establishing MolmoSpaces as a valuable resource for scalable robot learning research.
Introduces a large-scale, simulator-agnostic, and open-source ecosystem for robot learning, featuring diverse indoor environments and richly annotated objects, to facilitate more robust and generalizable robot policies.
The paper introduces Soft-Verified Efficient Repository Agents (SERA), a supervised finetuning method for efficiently training coding agents specialized to private codebases. SERA leverages Soft Verified Generation (SVG) to create thousands of synthetic trajectories from a single repository, enabling rapid and cost-effective specialization. The resulting SERA models achieve state-of-the-art performance among fully open-source models, matching the performance of models like Devstral-Small-2 at a fraction of the cost compared to reinforcement learning or previous synthetic data methods.
Introduces Soft Verified Generation (SVG), a novel method for generating synthetic code trajectories that enables efficient supervised finetuning of coding agents specialized to private codebases.
The authors introduce Action Reasoning Models (ARMs) for robotics, which integrate perception, planning, and control in a three-stage pipeline to improve adaptability and grounding. They present MolmoAct, a 7B-parameter ARM that encodes observations and instructions into depth-aware perception tokens, generates spatial plans as trajectory traces, and predicts low-level actions. MolmoAct achieves state-of-the-art performance in simulation and real-world settings, demonstrating improved zero-shot accuracy, long-horizon task success, and out-of-distribution generalization compared to existing models like Pi-0 and ThinkAct.
Introduces Action Reasoning Models (ARMs), a novel class of robotic foundation models that explicitly incorporate spatial planning as an intermediate reasoning step between perception and action.
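The three-stage structure described above (observation → perception tokens → trajectory trace → low-level actions) can be sketched as a simple pipeline. Everything here is a stand-in: the stage functions, token format, and action representation are invented stubs meant only to show the decoding order, not MolmoAct's actual architecture.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ARMOutput:
    perception_tokens: List[str]                 # depth-aware scene encoding
    trajectory_trace: List[Tuple[float, float]]  # spatial plan (2D waypoints)
    actions: List[List[float]]                   # low-level action commands


def perceive(image: List[int]) -> List[str]:
    # Stage 1 (stub): encode the observation into perception tokens.
    return [f"tok_{v}" for v in image]


def plan(tokens: List[str], instruction: str) -> List[Tuple[float, float]]:
    # Stage 2 (stub): emit a spatial plan as a trajectory trace.
    return [(float(i), float(i)) for i in range(len(tokens))]


def act(tokens: List[str], trace: List[Tuple[float, float]]) -> List[List[float]]:
    # Stage 3 (stub): decode low-level actions conditioned on the plan.
    return [[x * 0.1, y * 0.1] for x, y in trace]


def arm_pipeline(image: List[int], instruction: str) -> ARMOutput:
    """Hypothetical sketch of ARM decoding: perceive, then plan, then act.
    Each intermediate output conditions the next stage."""
    tokens = perceive(image)
    trace = plan(tokens, instruction)
    actions = act(tokens, trace)
    return ARMOutput(tokens, trace, actions)
```

The design point the sketch illustrates is that the trajectory trace is an explicit, inspectable intermediate between perception and action, rather than the model mapping pixels straight to motor commands.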
The paper introduces RewardBench 2, a new benchmark for evaluating reward models across multiple skills, featuring challenging data derived from novel human prompts. It addresses the gap between reward model evaluation and their effectiveness in downstream tasks by providing a more rigorous assessment of reward model accuracy. Scores on the benchmark correlate strongly with downstream performance in both inference-time scaling and RLHF training, while models score substantially lower on it than on the original RewardBench, indicating a more challenging evaluation.
Introduces a novel multi-skill reward modeling benchmark, RewardBench 2, using new human prompts to improve the rigor and relevance of reward model evaluation.

