Apple's machine learning research division. Focuses on on-device ML, privacy-preserving AI, and multimodal models.
LVLMs can be made significantly less prone to hallucinations, without any training, by explicitly grounding them in visual evidence and iteratively self-refining their answers based on verified information.
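The ground-then-refine idea can be sketched as a small loop: check the model's answer against detector evidence, and re-prompt only when unsupported objects appear. Everything below is a hypothetical stand-in, not the paper's actual pipeline: `detect_objects` and `ask_lvlm` are stubs, and the closed noun vocabulary is an illustrative simplification.

```python
NOUNS = {"dog", "ball", "frisbee", "cat"}  # hypothetical closed noun vocabulary

def detect_objects(image):
    # Stub: a real system would run an object detector on the image.
    return {"dog", "ball"}

def ask_lvlm(image, prompt):
    # Stub: the first pass hallucinates a "frisbee"; a grounded re-prompt does not.
    if "only mention" in prompt:
        return "A dog plays with a ball."
    return "A dog plays with a frisbee and a ball."

def grounded_answer(image, question, max_rounds=2):
    evidence = detect_objects(image)
    answer = ask_lvlm(image, question)
    for _ in range(max_rounds):
        mentioned = {w.strip(".,").lower() for w in answer.split()} & NOUNS
        unsupported = mentioned - evidence
        if not unsupported:  # every mentioned object is visually verified
            break
        constraint = " only mention objects in " + str(sorted(evidence))
        answer = ask_lvlm(image, question + constraint)
    return answer

print(grounded_answer(None, "Describe the scene."))
```

The point of the sketch is that no training is involved: hallucinations are suppressed purely by verifying the answer against visual evidence and re-prompting with the verified object set.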
Forget painstakingly collecting user data – PersonaTrace lets you bootstrap realistic digital footprints with LLMs, and models trained on this synthetic data actually generalize better to real-world tasks.
Finally, a single model that can generate both your face and voice, convincingly controlled by text prompts and reference clips.
Text-to-video generation gets a 1.58x speed boost with CalibAtt, a training-free method that exploits consistent sparsity patterns in attention layers.
See in the dark: Dark3R unlocks structure from motion at signal-to-noise ratios below -4dB, where existing methods completely break down.
LLM agents can learn to solve tasks previously beyond their reach by exploring high-level language strategies instead of low-level actions, leading to more efficient and effective reinforcement learning.
Fine-tuning a specialized LLM to generate textual relevance labels for search ranking not only beats larger pre-trained models, but also drives significant real-world gains in App Store conversion rates, especially for tail queries.
Ditch slow, external segmentation pipelines: TrajTok learns trajectory tokens end-to-end, boosting video understanding while staying lean and adaptable.
Tri-modal masked diffusion models can now be trained from scratch, achieving strong results in text generation, text-to-image, and text-to-speech, thanks to a systematic exploration of the design space and a novel SDE-based batch size reparameterization.
Sticking to a single HTML-to-text extractor in your LLM pretraining pipeline could be leaving 71% of the data on the table.
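A toy illustration of why extractor choice matters: a naive tag-stripping regex and an `HTMLParser` that skips `<script>`/`<style>` bodies recover different "text" from the same page. The extractors here are illustrative stand-ins, not the paper's actual pipeline.

```python
from html.parser import HTMLParser
import re

class TextExtractor(HTMLParser):
    """Collects visible text, skipping script and style contents."""
    def __init__(self):
        super().__init__()
        self.skip = 0
        self.parts = []
    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.skip += 1
    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self.skip = max(0, self.skip - 1)
    def handle_data(self, data):
        if not self.skip and data.strip():
            self.parts.append(data.strip())

page = "<p>Useful text.</p><script>var x = 1;</script><p>More text.</p>"

naive = re.sub(r"<[^>]+>", " ", page)  # keeps the JavaScript body as "text"
parser = TextExtractor()
parser.feed(page)
careful = " ".join(parser.parts)

print(naive)    # includes "var x = 1;"
print(careful)  # only the visible prose
```

Two extractors, two different corpora from identical HTML; diversifying (or routing between) extractors is how a pretraining pipeline recovers data a single choice would discard.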
A low-cost, portable e-waste sorter achieves 90% precision using a YOLOx model, promising to boost material recovery rates in recycling.
Just 20% of a strong model's chain-of-thought can unlock a weaker model's reasoning abilities, revealing the surprising transferability of CoT mechanics.
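Mechanically, "20% of the chain-of-thought" amounts to prepending only a prefix of the strong model's reasoning to the weak model's prompt. The helper below is a hypothetical sketch of that prompt construction; whitespace token counting is a deliberate simplification.

```python
def partial_cot_prompt(question, strong_cot, fraction=0.2):
    """Keep only the leading `fraction` of the strong model's CoT tokens."""
    tokens = strong_cot.split()
    keep = max(1, int(len(tokens) * fraction))
    prefix = " ".join(tokens[:keep])
    return (f"{question}\n"
            f"Hint (partial reasoning): {prefix}\n"
            f"Continue the reasoning and answer.")

cot = ("First compute 12 * 4 = 48. Then subtract 8 to get 40. "
       "So the answer is 40.")
prompt = partial_cot_prompt("What is 12*4 - 8?", cot)
print(prompt)
```

Here the weak model sees only the opening step ("First compute 12"), never the later arithmetic or the final answer, yet per the finding above that prefix alone can be enough to trigger its own reasoning.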
RL fine-tuning can make vision-language models *less* reliable reasoners, as gains in benchmark accuracy come at the cost of faithfulness to the underlying visual grounding and chain-of-thought.
Forget algorithmic flaws: the real reliability bottleneck for open-source LLMs lies in the fragile deployment stack, not the model architecture itself.