Apple ML Research

5 papers from Apple ML Research on Natural Language Processing

Jun 10, 2026

Apple ML · also UPF

On The Effectiveness-Fluency Trade-Off In LLM Conditioning: A Systematic Study

Efficient conditioning methods for LLMs often sacrifice fluency, revealing a critical trade-off that could reshape deployment strategies.

Iuri Macocco, Pau Rodríguez, Arno Blaas +2

Eval Frameworks & Benchmarks · Natural Language Processing

Apr 29, 2026

Apple ML · also CMU ML, UCSB, UW-Madison

Length Value Model: Scalable Value Pretraining for Token-Level Length Modeling

Forget coarse sequence-level approaches: LenVM lets you precisely control generation length at the token level, boosting a 7B model's length accuracy from 30.9 to 64.8 and outperforming closed-source models.

Zhen Zhang, Changyi Yang, Zijie Xia +13

Architecture Design (Transformers, SSMs, MoE) · Inference & Quantization · Natural Language Processing

Mar 12, 2026

Apple ML · also NUS

PersonaTrace: Synthesizing Realistic Digital Footprints with LLM Agents

Forget painstakingly collecting user data – PersonaTrace lets you bootstrap realistic digital footprints with LLMs, and models trained on this synthetic data actually generalize better to real-world tasks.

Yunfeng Wang, Qifan Guo, Benliang Wang +1

Data Curation & Synthetic Data · Natural Language Processing · Tool Use & Agents

Feb 26, 2026

Apple ML

Scaling Search Relevance: Augmenting App Store Ranking with LLM-Generated Judgments

Fine-tuning a specialized LLM to generate textual relevance labels for search ranking not only beats larger pre-trained models, but also drives significant real-world gains in App Store conversion rates, especially for tail queries.

Evangelia Christakopoulou, Vivekkumar Patel +4

Data Curation & Synthetic Data · Natural Language Processing · Recommendation & Information Retrieval

Feb 23, 2026

Stanford HAI · also Apple ML, Google Research, Ant Group, UofT

Beyond a Single Extractor: Re-thinking HTML-to-Text Extraction for LLM Pretraining

Sticking to a single HTML-to-text extractor in your LLM pretraining pipeline could be leaving 71% of the data on the table.

Jeffrey Li, Josh Gardner +18

Data Curation & Synthetic Data · Natural Language Processing