Google's broad research division. Key contributions include the Transformer architecture, BERT, T5, and TensorFlow.
Instead of creating new AI companions from scratch, Deco shows how to breathe new life into cherished physical objects by giving them a digital voice and personality powered by LLMs.
LLMs' persistent hallucinations aren't just about lacking knowledge, but about lacking the self-awareness to know what they *don't* know, suggesting uncertainty expression is key to building trustworthy AI.
Forget handcrafted prompts: a hierarchical multi-agent framework turns diffusion models into coherent storytelling engines by globally optimizing for semantic coherence.
LVLMs can self-detect and correct object hallucinations by focusing on specific image regions, offering a simple, training-free fix.
Stop penalizing your ANN search algorithms for failing to retrieve irrelevant neighbors – Semantic Recall offers a more nuanced and effective way to measure retrieval quality.
Current remote sensing change captioning datasets miss fine-grained localized semantic reasoning, but RSRCC fills this gap with 126k change-specific questions.
GAAP offers a deterministic, trust-minimized approach to AI agent security, safeguarding user data even when models are compromised or prompts are injected.
Multilingual LLMs exhibit a surprising "American bias," even when prompted in other languages, and instruction tuning makes it worse.
Debloating tools, intended to shrink code and improve security, can actually *add* code or remove essential functionality, with dynamic methods being overly aggressive and static methods overly conservative.
FUSE achieves verification quality on par with semi-supervised methods, all without needing any labeled data.
ZKP proving, previously bottlenecked by MSM and NTT operations, can now achieve up to 10x higher throughput on TPUs thanks to a novel framework that reformulates ZKP kernels for AI-ASIC execution.
RosettaSearch recovers up to 68% more structural fidelity in protein designs, transforming how we optimize sequences beyond traditional single-pass methods.
Generating consistent visual narratives is now possible: CANVAS outperforms existing methods by explicitly planning character, background, and scene continuity across multiple shots.
Reconstructing dynamic hand-object interactions from monocular video can be 6x faster and significantly more accurate by ditching heavy neural representations for a revived Sum-of-Gaussians approach.
Ethics interventions in AI development often fail because practitioners don't trust them – here's a breakdown of why, and how to fix it.
Google developers are spending less time debugging integration tests thanks to an LLM that diagnoses failures with 90% accuracy.
Unpacking Google's AI literacy partnerships reveals the surprising complexities of aligning research, industry, and public needs.
Ditch imperative robot programming and embrace the elegance of logic: control swarms with declarative code.
Forget KL divergence – this work shows you *can* reliably evaluate generative models with finite samples, but only if you use the right metric (IPMs with bounded test classes).
LLMs can now generate more relevant and factual movie recommendations by dynamically bridging retrieval and generation with a novel reinforcement learning approach.
Fluent language from an agentic IR system can be dangerously deceptive, masking critical errors in planning, retrieval, reasoning, and execution that accumulate over time.
CGRA performance jumps by 2.7x thanks to NEURA, a compilation framework that elegantly transforms control flow into dataflow.
LLM-powered multi-agent architectures are poised to revolutionize video recommendation by enabling precise, explainable, and adaptive recommendations that surpass the limitations of static, single-model systems.
Activating a single, carefully chosen neuron can be enough to make a language model remember facts about an entity, suggesting a surprisingly localized and efficient knowledge representation.
Safety fine-tuning might inadvertently be stripping LLMs of their ability to understand non-human minds and entertain spiritual beliefs, even while preserving Theory of Mind.
Despite the effort required, Android developers overwhelmingly support platform-level changes to combat fingerprinting, suggesting a path to enhanced user privacy through collaborative platform-developer initiatives.
MLLMs are riddled with shared vulnerabilities across modalities, meaning a single weakness can be exploited to jailbreak safety filters, hijack instructions, or even poison training data.
Achieve world-consistent video generation by directly optimizing geometry in the latent space of pre-trained video diffusion models, sidestepping costly RGB-space operations and architectural changes.
Refining generative models with discriminator guidance provably improves generalization, offering a theoretical justification for techniques like score-based diffusion.
MLLMs are surprisingly prone to hallucinating subtle details, especially when asked about the absence of specific attributes or relationships within an image.
Imagine an XR experience where you can selectively isolate and enhance individual sound sources in real-time, making chaotic audio environments crystal clear.
Dataset condensation, previously limited to neural networks, can now democratize access to clinical data by enabling privacy-preserving training of classical models like decision trees and Cox regression.
Forget catastrophic forgetting: this function-preserving expansion method lets you fine-tune without sacrificing pre-trained knowledge, matching full fine-tuning performance at a fraction of the cost.
Forget local semantic alignment: CAST unlocks temporally coherent video retrieval and generation by explicitly modeling visual state transitions.
LLM-powered diagnostic AI is ready for prime time: a real-world clinical trial shows it's safe, patients love it, and doctors find it useful.
Most social media platforms govern AI-generated content by simply applying existing content moderation policies, leaving key issues like ownership and monetization largely unaddressed.
AI-generated videos can now respect physics, thanks to a framework that uses a physical simulator to guide diffusion models, resulting in more realistic and coherent motion.
Reasoning models are surprisingly bad at controlling their own thoughts: Claude Sonnet 4.5 can control its chain-of-thought only 2.7% of the time, raising questions about the reliability of CoT monitoring.
An AI agent cracked an open problem in theoretical physics, deriving exact analytical solutions for gravitational radiation from cosmic strings, proving AI can do more than just pattern recognition.
Datacenter networks are haunted by "ghosts"—topology knowledge failures due to link flaps that occur every 48 seconds at 2025 cluster scale—and existing mitigations are insufficient, but Open Atomic Ethernet offers a potential exorcism.
Forget quadratic scaling: ZipMap zips entire 3D scenes from hundreds of images into a compact state in a single pass, unlocking 20x faster reconstruction.
DARKFormer closes the performance gap with exact softmax attention in finetuning by learning a data-aligned kernel geometry for efficient random feature approximation, sidestepping the need for retraining or large feature budgets.
Multimodal web agents are surprisingly vulnerable to cross-modal attacks, but a novel adversarial training approach can double task completion efficiency while mitigating these risks.
LLMs are becoming "epistemic agents" that shape our knowledge environment, so we need a new framework for evaluating and governing them based on trustworthiness, not just performance.
Despite dedicated efforts from multiple teams, existing speech systems still fall significantly short of deployment readiness for understanding real-world medical conversations in Indian languages, highlighting the need for further research.
Forget hand-engineered reward functions: this method learns complex exploratory behaviors by simply predicting which states lead to unpredictable futures.
Finally, a framework to quantify AI's cultural intelligence, moving beyond ad-hoc cultural benchmarks to a systematic, extensible, and theoretically grounded approach.
Recurrent models can now achieve Transformer-competitive performance on recall-intensive tasks, thanks to a simple memory caching mechanism that grows memory capacity with sequence length.
State-of-the-art emotion recognition in conversations is now possible by decoupling modality-specific context modeling and multimodal fusion with a mixture-of-experts approach that doesn't require speaker identity.
AI safety evaluations get a much-needed dose of Sub-Saharan African perspectives with the release of SAFARI, a stereotype dataset built using community-engaged methods across 15 native languages.
LLMs harbor surprisingly consistent hidden beliefs on sensitive topics like mass surveillance and torture, even when direct questioning suggests otherwise.
Forget fine-tuning: Prompt-Level Distillation lets small models match frontier reasoning performance by distilling explicit reasoning patterns into structured system prompts.
Gemini 3 Deep Think can now autonomously solve a majority of problems in a challenging math competition, signaling a leap in AI's mathematical reasoning capabilities.
Forget painstakingly labeling audio datasets – AuditoryHuM uses LLMs and targeted human input to automatically generate and cluster high-quality auditory scene labels.
Surprisingly, using only a single inner loop update in data mixing can lead to failure, and the optimal number of inner loop steps scales logarithmically with the parameter update budget.
Ditch Stable Diffusion's latents: Unified Latents (UL) achieves state-of-the-art video generation and competitive image generation with fewer training FLOPs.
Existing deforestation monitoring maps misclassify smallholder agroforestry as "forest," risking unfair penalties under regulations like the EUDR.
LLMs still struggle with infrequently occurring knowledge, and this paper provides a structured framework to understand why, how we can fix it, and what the implications are for responsible AI.
Sequence models can learn to cooperate in multi-agent settings simply by training against diverse partners, no explicit meta-learning required.
Natural privacy filters, despite their promise for tighter privacy accounting, aren't universally "free," limiting their applicability to specific families of differentially private mechanisms.
A new model, TAC, uses synthetic training data to achieve state-of-the-art audio and audio-visual reasoning by generating temporally grounded captions that can then be fed into LLMs.
Randomly masking parameter updates in RMSProp delivers state-of-the-art LLM training performance, revealing a surprisingly effective form of geometric regularization.
Humanoid robots can now perform vision-based parkour, chaining together dynamic skills like climbing, vaulting, and rolling, adapting to real-time obstacle changes.
Forget complex architectures: RaCo achieves SOTA keypoint matching and repeatability by cleverly combining ranking and covariance estimation in a lightweight network, trained without covisible image pairs.
LLMs like GPT-5 and Gemini-3 already "know" almost everything (95-98% factual encoding), but struggle to recall it, suggesting that future gains in factuality depend more on better memory retrieval than on simply scaling up.
Forget hand-crafted reward functions: CM2 uses checklists to train tool-using agents, outperforming SFT baselines by up to 12 points on key benchmarks.
Speech recognition models stumble badly on real-world street names, especially for non-English speakers, but a simple synthetic data boost can dramatically improve accuracy.
Finally, a streaming ASR model matches Whisper's offline transcription quality while maintaining sub-second latency.
Forget huge models: parameter-efficient fine-tuning turns tiny language models into code-generating powerhouses that outperform larger, untuned counterparts.
Finally, a fully open-source, reproducible system for long-form song generation is here, complete with licensed data, code, and a Qwen-based model that rivals closed-source systems.
Forget "smart plagiarism" – multi-stage LLM workflows like recursive decomposition and long-context pipelines can actually generate novel research plans, outperforming simpler reflection-based methods.
Claude 2 can match the performance of top medical specialists on pulmonary thromboembolism knowledge assessments, suggesting AI's potential for clinical decision support.
LLM safety guardrails are far less robust than benchmarks suggest, with accuracy dropping by as much as 57% on novel adversarial attacks, and some even generating harmful content in a "helpful mode" jailbreak.
Despite their promise, even the best multimodal LLM (GPT-4o) achieves only 26% accuracy in grading knee osteoarthritis from radiographs, revealing a significant gap in clinical reliability.
Reasoning-based safety guardrails, once thought to be a strong defense against jailbreaks, crumble with just a few strategically placed tokens.
Clinicians using a new medical literature mining LLM, LEADS, achieved 0.81 recall vs. 0.78 without it, while saving 20.8% of their time.
Even the best LLMs fail more than 40% of the time when orchestrating multiple tools in realistic scenarios, revealing critical gaps in real-world agent capabilities.
DPO's success isn't just clever engineering—it's deeply rooted in human choice theory, unlocking a surprisingly flexible framework for preference optimization and justifying many DPO extensions.
Deep learning finally cracks the DFT accuracy-efficiency trade-off, enabling highly accurate quantum chemistry calculations at semi-local DFT cost.
Forget sparse autoencoders: semi-nonnegative matrix factorization directly decomposes MLP activations into human-interpretable features that are more effective at causally steering LLMs.
Forget hand-annotated data: Magnet distills multi-turn tool-use skills into LLMs by automatically generating training trajectories, yielding models that outperform even Gemini 1.5 Pro.
LLMs can generate plain language summaries of scientific research that are as good as human-written ones, but easier to read.
Clinicians using a medical literature-specific foundation model, LEADS, achieved 23-27% time savings and improved accuracy/recall compared to working alone.