Apple's machine learning research division. Focuses on on-device ML, privacy-preserving AI, and multimodal models.
LVLMs can be made significantly less prone to hallucinations, without any training, by explicitly grounding them in visual evidence and iteratively self-refining their answers based on verified information.
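The ground-then-refine idea can be sketched as a small loop: check the model's answer against detector evidence, and re-prompt only when unsupported objects appear. Everything below is a hypothetical stand-in, not the paper's actual pipeline: `detect_objects` and `ask_lvlm` are stubs, and the closed noun vocabulary is an illustrative simplification.

```python
NOUNS = {"dog", "ball", "frisbee", "cat"}  # hypothetical closed noun vocabulary

def detect_objects(image):
    # Stub: a real system would run an object detector on the image.
    return {"dog", "ball"}

def ask_lvlm(image, prompt):
    # Stub: the first pass hallucinates a "frisbee"; a grounded re-prompt does not.
    if "only mention" in prompt:
        return "A dog plays with a ball."
    return "A dog plays with a frisbee and a ball."

def grounded_answer(image, question, max_rounds=2):
    evidence = detect_objects(image)
    answer = ask_lvlm(image, question)
    for _ in range(max_rounds):
        mentioned = {w.strip(".,").lower() for w in answer.split()} & NOUNS
        unsupported = mentioned - evidence
        if not unsupported:  # every mentioned object is visually verified
            break
        constraint = " only mention objects in " + str(sorted(evidence))
        answer = ask_lvlm(image, question + constraint)
    return answer

print(grounded_answer(None, "Describe the scene."))
```

The point of the sketch is that no training is involved: hallucinations are suppressed purely by verifying the answer against visual evidence and re-prompting with the verified object set.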
Forget painstakingly collecting user data – PersonaTrace lets you bootstrap realistic digital footprints with LLMs, and models trained on this synthetic data actually generalize better to real-world tasks.
Finally, a single model that can generate both your face and voice, convincingly controlled by text prompts and reference clips.
Text-to-video generation gets a 1.58x speed boost with CalibAtt, a training-free method that exploits consistent sparsity patterns in attention layers.
See in the dark: Dark3R unlocks structure from motion at signal-to-noise ratios below -4dB, where existing methods completely break down.
LLM agents can learn to solve tasks previously beyond their reach by exploring high-level language strategies instead of low-level actions, leading to more efficient and effective reinforcement learning.
Fine-tuning a specialized LLM to generate textual relevance labels for search ranking not only beats larger pre-trained models, but also drives significant real-world gains in App Store conversion rates, especially for tail queries.
Ditch slow, external segmentation pipelines: TrajTok learns trajectory tokens end-to-end, boosting video understanding while staying lean and adaptable.
Tri-modal masked diffusion models can now be trained from scratch, achieving strong results in text generation, text-to-image, and text-to-speech, thanks to a systematic exploration of the design space and a novel SDE-based batch size reparameterization.
Sticking to a single HTML-to-text extractor in your LLM pretraining pipeline could be leaving 71% of the data on the table.
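A toy illustration of why extractor choice matters: a naive tag-stripping regex and an `HTMLParser` that skips `<script>`/`<style>` bodies recover different "text" from the same page. The extractors here are illustrative stand-ins, not the paper's actual pipeline.

```python
from html.parser import HTMLParser
import re

class TextExtractor(HTMLParser):
    """Collects visible text, skipping script and style contents."""
    def __init__(self):
        super().__init__()
        self.skip = 0
        self.parts = []
    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.skip += 1
    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self.skip = max(0, self.skip - 1)
    def handle_data(self, data):
        if not self.skip and data.strip():
            self.parts.append(data.strip())

page = "<p>Useful text.</p><script>var x = 1;</script><p>More text.</p>"

naive = re.sub(r"<[^>]+>", " ", page)  # keeps the JavaScript body as "text"
parser = TextExtractor()
parser.feed(page)
careful = " ".join(parser.parts)

print(naive)    # includes "var x = 1;"
print(careful)  # only the visible prose
```

Two extractors, two different corpora from identical HTML; diversifying (or routing between) extractors is how a pretraining pipeline recovers data a single choice would discard.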
A low-cost, portable e-waste sorter achieves 90% precision using a YOLOx model, promising to boost material recovery rates in recycling.
Just 20% of a strong model's chain-of-thought can unlock a weaker model's reasoning abilities, revealing the surprising transferability of CoT mechanics.
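Mechanically, "20% of the chain-of-thought" amounts to prepending only a prefix of the strong model's reasoning to the weak model's prompt. The helper below is a hypothetical sketch of that prompt construction; whitespace token counting is a deliberate simplification.

```python
def partial_cot_prompt(question, strong_cot, fraction=0.2):
    """Keep only the leading `fraction` of the strong model's CoT tokens."""
    tokens = strong_cot.split()
    keep = max(1, int(len(tokens) * fraction))
    prefix = " ".join(tokens[:keep])
    return (f"{question}\n"
            f"Hint (partial reasoning): {prefix}\n"
            f"Continue the reasoning and answer.")

cot = ("First compute 12 * 4 = 48. Then subtract 8 to get 40. "
       "So the answer is 40.")
prompt = partial_cot_prompt("What is 12*4 - 8?", cot)
print(prompt)
```

Here the weak model sees only the opening step ("First compute 12"), never the later arithmetic or the final answer, yet per the finding above that prefix alone can be enough to trigger its own reasoning.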
RL fine-tuning can make vision-language models *less* reliable reasoners, as gains in benchmark accuracy come at the cost of faithfulness to the underlying visual grounding and chain-of-thought.
Forget algorithmic flaws: the real reliability bottleneck for open-source LLMs lies in the fragile deployment stack, not the model architecture itself.