Apple's machine learning research division. Focuses on on-device ML, privacy-preserving AI, and multimodal models.
Outlier tokens in Diffusion Transformers aren't just extreme values; they corrupt local patch semantics, and can be tamed with Dual-Stage Registers to boost image generation quality.
Watermarks meant to identify AI-generated images can be easily removed or forged, even allowing attackers to falsely flag real images as AI-generated.
Training a smaller LLM on a carefully pruned dataset lets it memorize as many facts as a model 10x larger trained on everything.
LLM agents automating productivity tasks achieve only moderate success (39-64%) while exhibiting surprisingly high rates of unsafe actions (7-33%) in realistic, multi-service workflows.
Forget full KV caches: randomly routing attention across layers during training lets you drastically cut memory without hurting performance, and sometimes even helps.
Realistic user simulation is now possible: Pare offers a framework that moves beyond flat tool-calling APIs to model stateful user interactions, enabling better evaluation of proactive agents.
Forget painstakingly collecting user data: PersonaTrace lets you bootstrap realistic digital footprints with LLMs, and models trained on this synthetic data actually generalize better to real-world tasks.
Finally, a single model that can generate both your face and voice, convincingly controlled by text prompts and reference clips.
Text-to-video generation gets a 1.58x speed boost with CalibAtt, a training-free method that exploits consistent sparsity patterns in attention layers.
See in the dark: Dark3R unlocks structure from motion at signal-to-noise ratios below -4 dB, where existing methods completely break down.
LLM agents can learn to solve tasks previously beyond their reach by exploring high-level language strategies instead of low-level actions, leading to more efficient and effective reinforcement learning.
Ditch slow, external segmentation pipelines: TrajTok learns trajectory tokens end-to-end, boosting video understanding while staying lean and adaptable.
Fine-tuning a specialized LLM to generate textual relevance labels for search ranking not only beats larger pre-trained models, but also drives significant real-world gains in App Store conversion rates, especially for tail queries.
Sticking to a single HTML-to-text extractor in your LLM pretraining pipeline could be leaving 71% of the data on the table.
A low-cost, portable e-waste sorter achieves high precision (90%) using a YOLOX model, promising to boost material recovery rates in recycling.
Just 20% of a strong model's chain-of-thought can unlock a weaker model's reasoning abilities, revealing the surprising transferability of CoT mechanics.
RL fine-tuning can make vision-language models *less* reliable reasoners, as gains in benchmark accuracy come at the cost of faithfulness to the underlying visual grounding and chain-of-thought.