Google's broad research division. Key contributions include the Transformer architecture, BERT, T5, and TensorFlow.
PI-Hunter uncovers hidden prompt injection vulnerabilities in LLM agents that traditional defenses miss, revealing a critical gap in current security practices.
Real-time LLM-generated user personas can dramatically enhance viewer engagement by dynamically balancing existing interests with new content recommendations.
APEX reveals that optimizing data alongside prompts can boost LLM performance by over 11% while significantly reducing wasted compute resources.
LLMs reveal surprising strengths and weaknesses in analyzing security logs, with performance heavily influenced by model design choices.
In high-stakes health contexts, stakeholders demand that trustworthiness in AI systems be inspectable, not just asserted, reshaping how we design health information tools.
Redefining data work through a reparative lens reveals the urgent need to prioritize the voices of those harmed by online systems, challenging existing norms of accountability in AI.
MResOpt achieves significantly lower high-priority constraint violations in constrained optimization tasks while remaining computationally efficient, revolutionizing how we approach complex optimization problems.
A strategic messaging shift on Google Search reduced CSAM-related queries by 3.8%, effectively redirecting some users towards therapeutic resources.
Non-private synthetic data can effectively transfer knowledge from original corpora, while state-of-the-art DP methods often fail to do so, even at high privacy levels.
RLNS turns a classic heuristic into a powerful MCMC sampler, enabling efficient combinatorial optimization without the need for exact solutions.
VLMs struggle with procedural 3D modeling, often producing flawed outputs due to API mismatches and geometric disconnections, but performance can be significantly boosted through iterative refinement.
LLM-powered honeypots can trick even frontier models into longer interactions than rule-based systems, all while costing less to run.
Ditch the brittle code synthesis and noisy gradients: LiveSVG unlocks high-quality SVG animations by directly fitting vector graphics to reference videos generated from motion prompts.
PARCEL redefines visual tokenization, achieving superior efficiency and performance by dynamically anchoring feature extraction to spatial pool tokens.
Humans miss 3.9% of opportunities to leverage correct AI suggestions while also over-relying on misleading outputs, highlighting critical gaps in trust and decision-making in human-AI collaboration.
Imagine telepresence where your avatar convincingly blends into any environment, relit in real-time based on the scene's actual lighting, all from a single headset.
Splitting attention and feedforward networks onto separate GPUs can unlock 4x higher MoE LLM throughput, but only if you carefully tune the GPU partitioning strategy based on the workload.
Why pick just one token mixer when you can have them all, dynamically switching between attention and linear recurrences for optimal efficiency and performance?
LLMs alone can't reliably retrieve actionable data from the web: agents that rely on semantic metadata achieve 65% higher precision in finding FAIR-compliant datasets.
Provable, non-asymptotic guarantees for optimizing complex multi-label metrics like F-measure are now possible with a new family of algorithms that decompose exactly, giving $O(l)$ time complexity.
Even the best LLM judges miss cultural faux pas that are obvious to locals, achieving only 52% F1 score on a new benchmark.
Gemini Embedding 2's unified multimodal embeddings beat specialized models across diverse tasks and even generalize zero-shot to niche fields like astronomy and culinary arts.
Bandit feedback doesn't have to cripple learning: a new "bandit DS dimension" reveals how to achieve near-optimal sample complexity in multiclass PAC learning, even when you only know if you're right or wrong.
AI-driven scientific discovery is closer than you think, but current systems still struggle with reproducibility, cross-domain robustness, and accountable scientific closure.
You can slash the compute cost of visual geometry transformers by 85% without sacrificing accuracy by intelligently pruning redundant tokens across frames and within layers.
AI can now autonomously solve open math problems, cracking 9 Erdős problems and 44 OEIS conjectures at a reasonable cost.
Graph transformers can be fundamentally limited by their tokenization strategy, as some tokenizations provably preclude efficient learning of structural representations realizable with other tokenizations.
Training a foundation model on a trillion minutes of wearable sensor data unlocks surprisingly accurate predictions across a wide range of health conditions, even with limited labeled data.
Instead of creating new AI companions from scratch, Deco shows how to breathe new life into cherished physical objects by giving them a digital voice and personality powered by LLMs.
LLMs' persistent hallucinations aren't just about lacking knowledge, but about lacking the self-awareness to know what they *don't* know, suggesting uncertainty expression is key to building trustworthy AI.
Forget handcrafted prompts: a hierarchical multi-agent framework turns diffusion models into coherent storytelling engines by globally optimizing for semantic coherence.
Current remote sensing change captioning datasets miss fine-grained localized semantic reasoning, but RSRCC fills this gap with 126k change-specific questions.
Stop penalizing your ANN search algorithms for failing to retrieve irrelevant neighbors – Semantic Recall offers a more nuanced and effective way to measure retrieval quality.
LVLMs can self-detect and correct object hallucinations by focusing on specific image regions, offering a simple, training-free fix.
GAAP offers a deterministic, trust-minimized approach to AI agent security, safeguarding user data even when models are compromised or prompts are injected.
Multilingual LLMs exhibit a surprising "American bias," even when prompted in other languages, and instruction tuning makes it worse.
Debloating tools, intended to shrink code and improve security, can actually *add* code or remove essential functionality, with dynamic methods being overly aggressive and static methods overly conservative.
ZKP proving, previously bottlenecked by MSM and NTT operations, can now achieve up to 10x higher throughput on TPUs thanks to a novel framework that reformulates ZKP kernels for AI-ASIC execution.
FUSE achieves verification quality on par with semi-supervised methods, all without needing any labeled data.
RosettaSearch recovers up to 68% more structural fidelity in protein designs, transforming how we optimize sequences beyond traditional single-pass methods.
Generating consistent visual narratives is now possible: CANVAS outperforms existing methods by explicitly planning character, background, and scene continuity across multiple shots.
Reconstructing dynamic hand-object interactions from monocular video can be 6x faster and significantly more accurate by ditching heavy neural representations for a revived Sum-of-Gaussians approach.
Google developers are spending less time debugging integration tests thanks to an LLM that diagnoses failures with 90% accuracy.
Ethics interventions in AI development often fail because practitioners don't trust them – here's a breakdown of why, and how to fix it.
Ditch imperative robot programming and embrace the elegance of logic: control swarms with declarative code.
Unpacking Google's AI literacy partnerships reveals the surprising complexities of aligning research, industry, and public needs.
Forget KL divergence – this work shows you *can* reliably evaluate generative models with finite samples, but only if you use the right metric (IPMs with bounded test classes).
LLMs can now generate more relevant and factual movie recommendations by dynamically bridging retrieval and generation with a novel reinforcement learning approach.
CGRA performance jumps by 2.7x thanks to NEURA, a compilation framework that elegantly transforms control flow into dataflow.
Fluent language from an agentic IR system can be dangerously deceptive, masking critical errors in planning, retrieval, reasoning, and execution that accumulate over time.
LLM-powered multi-agent architectures are poised to revolutionize video recommendation by enabling precise, explainable, and adaptive recommendations that surpass the limitations of static, single-model systems.
Activating a single, carefully chosen neuron can be enough to make a language model remember facts about an entity, suggesting a surprisingly localized and efficient knowledge representation.
Safety fine-tuning might inadvertently be stripping LLMs of their ability to understand non-human minds and entertain spiritual beliefs, even while preserving Theory of Mind.
MLLMs are riddled with shared vulnerabilities across modalities, meaning a single weakness can be exploited to jailbreak safety filters, hijack instructions, or even poison training data.
Despite the effort required, Android developers overwhelmingly support platform-level changes to combat fingerprinting, suggesting a path to enhanced user privacy through collaborative platform-developer initiatives.
Achieve world-consistent video generation by directly optimizing geometry in the latent space of pre-trained video diffusion models, sidestepping costly RGB-space operations and architectural changes.
Refining generative models with discriminator guidance provably improves generalization, offering a theoretical justification for techniques like score-based diffusion.
MLLMs are surprisingly prone to hallucinating subtle details, especially when asked about the absence of specific attributes or relationships within an image.
Imagine an XR experience where you can selectively isolate and enhance individual sound sources in real-time, making chaotic audio environments crystal clear.
Dataset condensation, previously limited to neural networks, can now democratize access to clinical data by enabling privacy-preserving training of classical models like decision trees and Cox regression.
Forget catastrophic forgetting: this function-preserving expansion method lets you fine-tune without sacrificing pre-trained knowledge, matching full fine-tuning performance at a fraction of the cost.
LLM-powered diagnostic AI is ready for prime time: a real-world clinical trial shows it's safe, patients love it, and doctors find it useful.
Forget local semantic alignment: CAST unlocks temporally coherent video retrieval and generation by explicitly modeling visual state transitions.
Most social media platforms govern AI-generated content by simply applying existing content moderation policies, leaving key issues like ownership and monetization largely unaddressed.
AI-generated videos can now respect physics, thanks to a framework that uses a physical simulator to guide diffusion models, resulting in more realistic and coherent motion.
Reasoning models are surprisingly bad at controlling their own thoughts: Claude Sonnet 4.5 can control its chain-of-thought only 2.7% of the time, raising questions about the reliability of CoT monitoring.
An AI agent cracked an open problem in theoretical physics, deriving exact analytical solutions for gravitational radiation from cosmic strings, proving AI can do more than just pattern recognition.
Datacenter networks are haunted by "ghosts"—topology knowledge failures due to link flaps that occur every 48 seconds at 2025 cluster scale—and existing mitigations are insufficient, but Open Atomic Ethernet offers a potential exorcism.
Multimodal web agents are surprisingly vulnerable to cross-modal attacks, but a novel adversarial training approach can double task completion efficiency while mitigating these risks.
Forget quadratic scaling: ZipMap zips entire 3D scenes from hundreds of images into a compact state in a single pass, unlocking 20x faster reconstruction.
DARKFormer closes the performance gap with exact softmax attention in finetuning by learning a data-aligned kernel geometry for efficient random feature approximation, sidestepping the need for retraining or large feature budgets.
LLMs are becoming "epistemic agents" that shape our knowledge environment, so we need a new framework for evaluating and governing them based on trustworthiness, not just performance.
Despite dedicated efforts from multiple teams, existing speech systems still fall significantly short of deployment readiness for understanding real-world medical conversations in Indian languages, highlighting the need for further research.
Forget hand-engineered reward functions: this method learns complex exploratory behaviors by simply predicting which states lead to unpredictable futures.
Finally, a framework to quantify AI's cultural intelligence, moving beyond ad-hoc cultural benchmarks to a systematic, extensible, and theoretically grounded approach.
Recurrent models can now achieve Transformer-competitive performance on recall-intensive tasks, thanks to a simple memory caching mechanism that grows memory capacity with sequence length.
State-of-the-art emotion recognition in conversations is now possible by decoupling modality-specific context modeling and multimodal fusion with a mixture-of-experts approach that doesn't require speaker identity.
LLMs harbor surprisingly consistent hidden beliefs on sensitive topics like mass surveillance and torture, even when direct questioning suggests otherwise.
AI safety evaluations get a much-needed dose of Sub-Saharan African perspectives with the release of SAFARI, a stereotype dataset built using community-engaged methods across 15 native languages.
Forget fine-tuning: Prompt-Level Distillation lets small models match frontier reasoning performance by distilling explicit reasoning patterns into structured system prompts.
Gemini 3 Deep Think can now autonomously solve a majority of problems in a challenging math competition, signaling a leap in AI's mathematical reasoning capabilities.
Surprisingly, using only a single inner loop update in data mixing can lead to failure, and the optimal number of inner loop steps scales logarithmically with the parameter update budget.
Forget painstakingly labeling audio datasets – AuditoryHuM uses LLMs and targeted human input to automatically generate and cluster high-quality auditory scene labels.
Existing deforestation monitoring maps misclassify smallholder agroforestry as "forest," risking unfair penalties under regulations like the EUDR.
Ditch Stable Diffusion's latents: Unified Latents (UL) achieves state-of-the-art video generation and competitive image generation with fewer training FLOPs.
Sequence models can learn to cooperate in multi-agent settings simply by training against diverse partners, no explicit meta-learning required.
LLMs still struggle with infrequently occurring knowledge, and this paper provides a structured framework to understand why, how we can fix it, and what the implications are for responsible AI.
Natural privacy filters, despite their promise for tighter privacy accounting, aren't universally "free," limiting their applicability to specific families of differentially private mechanisms.
Randomly masking parameter updates in RMSProp delivers state-of-the-art LLM training performance, revealing a surprisingly effective form of geometric regularization.
Forget complex architectures: RaCo achieves SOTA keypoint matching and repeatability by cleverly combining ranking and covariance estimation in a lightweight network, trained without covisible image pairs.
A new model, TAC, uses synthetic training data to achieve state-of-the-art audio and audio-visual reasoning by generating temporally grounded captions that can then be fed into LLMs.
Humanoid robots can now perform vision-based parkour, chaining together dynamic skills like climbing, vaulting, and rolling, adapting to real-time obstacle changes.
LLMs like GPT-5 and Gemini-3 already "know" almost everything (95-98% factual encoding), but struggle to recall it, suggesting that future gains in factuality depend more on better memory retrieval than on simply scaling up.
Forget hand-crafted reward functions: CM2 uses checklists to train tool-using agents, outperforming SFT baselines by up to 12 points on key benchmarks.
Speech recognition models stumble badly on real-world street names, especially for non-English speakers, but a simple synthetic data boost can dramatically improve accuracy.
Finally, a streaming ASR model matches Whisper's offline transcription quality while maintaining sub-second latency.
Forget huge models: parameter-efficient fine-tuning turns tiny language models into code-generating powerhouses that outperform larger, untuned counterparts.
Finally, a fully open-source, reproducible system for long-form song generation is here, complete with licensed data, code, and a Qwen-based model that rivals closed-source systems.
Forget "smart plagiarism" – multi-stage LLM workflows like recursive decomposition and long-context pipelines can actually generate novel research plans, outperforming simpler reflection-based methods.
Claude 2 can match the performance of top medical specialists on pulmonary thromboembolism knowledge assessments, suggesting AI's potential for clinical decision support.