Despite the effort required, Android developers overwhelmingly support platform-level changes to combat fingerprinting, suggesting a path to enhanced user privacy through collaborative platform-developer initiatives.
MLLMs are riddled with shared vulnerabilities across modalities, meaning a single weakness can be exploited to jailbreak safety filters, hijack instructions, or even poison training data.
Safety fine-tuning might inadvertently be stripping LLMs of their ability to understand non-human minds and entertain spiritual beliefs, even while preserving Theory of Mind.
Achieve world-consistent video generation by directly optimizing geometry in the latent space of pre-trained video diffusion models, sidestepping costly RGB-space operations and architectural changes.
ChatGPT's geographic reasoning can be surprisingly brittle, with minor syntactic changes causing significant output variations and task composition revealing unexpected distributional shifts.
Refining generative models with discriminator guidance provably improves generalization, offering a theoretical justification for discriminator guidance in score-based diffusion models.
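For reference, discriminator guidance corrects the learned score with the gradient of the discriminator's log density ratio during sampling (a standard formulation, not necessarily this paper's notation):

$$
\tilde{s}_\theta(x_t, t) \;=\; s_\theta(x_t, t) \;+\; \nabla_{x_t} \log \frac{d_\phi(x_t, t)}{1 - d_\phi(x_t, t)}
$$

where $s_\theta$ is the learned score and $d_\phi$ is a discriminator trained to distinguish real from generated samples at noise level $t$.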
MLLMs are surprisingly prone to hallucinating subtle details, especially when asked about the absence of specific attributes or relationships within an image.
Imagine an XR experience where you can selectively isolate and enhance individual sound sources in real-time, making chaotic audio environments crystal clear.
Dataset condensation, previously limited to neural networks, can now democratize access to clinical data by enabling privacy-preserving training of classical models like decision trees and Cox regression.
Reasoning unlocks factual knowledge in LLMs, but beware: hallucinated reasoning steps can poison the well.
Forget local semantic alignment: CAST unlocks temporally coherent video retrieval and generation by explicitly modeling visual state transitions.
LLM-powered diagnostic AI is ready for prime time: a real-world clinical trial shows it's safe, patients love it, and doctors find it useful.
Forget catastrophic forgetting: this function-preserving expansion method lets you fine-tune without sacrificing pre-trained knowledge, matching full fine-tuning performance at a fraction of the cost.
Most social media platforms govern AI-generated content by simply applying existing content moderation policies, leaving key issues like ownership and monetization largely unaddressed.
AI-generated videos can now respect physics, thanks to a framework that uses a physical simulator to guide diffusion models, resulting in more realistic and coherent motion.
Reasoning models are surprisingly bad at controlling their own thoughts: Claude Sonnet 4.5 can control its chain-of-thought only 2.7% of the time, raising questions about the reliability of CoT monitoring.
An AI agent cracked an open problem in theoretical physics, deriving exact analytical solutions for gravitational radiation from cosmic strings, proving AI can do more than just pattern recognition.
Forget quadratic scaling: ZipMap zips entire 3D scenes from hundreds of images into a compact state in a single pass, unlocking 20x faster reconstruction.
DARKFormer closes the performance gap with exact softmax attention in finetuning by learning a data-aligned kernel geometry for efficient random feature approximation, sidestepping the need for retraining or large feature budgets.
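For context, here is a generic random-feature (linear-time) attention sketch using Performer-style positive features; the learned, data-aligned kernel geometry that the summary credits to DARKFormer is not implemented here, and the names are illustrative only.

```python
import numpy as np

def positive_random_features(x, W):
    # Performer-style positive features approximating the softmax kernel:
    # phi(x) = exp(W x - ||x||^2 / 2) / sqrt(m)
    m = W.shape[0]
    proj = x @ W.T                                            # (n, m)
    return np.exp(proj - 0.5 * np.sum(x**2, axis=-1, keepdims=True)) / np.sqrt(m)

def random_feature_attention(Q, K, V, num_features=256, seed=0):
    # Linear-time attention: softmax(QK^T / sqrt(d)) V is approximated by phi(Q) (phi(K)^T V).
    d = Q.shape[-1]
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((num_features, d))                # i.i.d. Gaussian here; a data-aligned kernel would adapt this geometry
    Qf = positive_random_features(Q / d**0.25, W)
    Kf = positive_random_features(K / d**0.25, W)
    num = Qf @ (Kf.T @ V)                                     # (n, d_v), never materializes the n x n attention matrix
    den = Qf @ Kf.sum(axis=0, keepdims=True).T                # (n, 1) normalizer
    return num / den
```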
Robots can now remember what they've done and what they need to do next for 15 minutes straight, thanks to a new memory architecture that mixes video and text.
Datacenter networks are haunted by "ghosts"—topology knowledge failures due to link flaps that occur every 48 seconds at 2025 cluster scale—and existing mitigations are insufficient, but Open Atomic Ethernet offers a potential exorcism.
Multimodal web agents are surprisingly vulnerable to cross-modal attacks, but a novel adversarial training approach can double task completion efficiency while mitigating these risks.
Despite dedicated efforts from multiple teams, existing speech systems still fall significantly short of deployment readiness for understanding real-world medical conversations in Indian languages, highlighting the need for further research.
LLMs are becoming "epistemic agents" that shape our knowledge environment, so we need a new framework for evaluating and governing them based on trustworthiness, not just performance.
Forget hand-engineered reward functions: this method learns complex exploratory behaviors by simply predicting which states lead to unpredictable futures.
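A minimal sketch of the general idea (not the paper's method): a forward model defines "unpredictability" as its own prediction error, a second network learns to predict that error from the current state, and its prediction serves as the intrinsic reward.

```python
import torch
import torch.nn as nn

class ForwardModel(nn.Module):
    # Predicts the next state from (state, action); its error measures unpredictability.
    def __init__(self, state_dim, action_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(state_dim + action_dim, hidden),
                                 nn.ReLU(), nn.Linear(hidden, state_dim))
    def forward(self, s, a):
        return self.net(torch.cat([s, a], dim=-1))

class UnpredictabilityPredictor(nn.Module):
    # Estimates, from the current state alone, how unpredictable the next transition will be.
    def __init__(self, state_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(state_dim, hidden),
                                 nn.ReLU(), nn.Linear(hidden, 1))
    def forward(self, s):
        return self.net(s).squeeze(-1)

def intrinsic_reward_step(fwd, pred, opt, s, a, s_next):
    """One update on a batch of transitions; returns intrinsic rewards for the policy."""
    model_error = ((fwd(s, a) - s_next) ** 2).mean(dim=-1)        # realised unpredictability
    pred_error = pred(s)                                          # predicted unpredictability
    loss = model_error.mean() + ((pred_error - model_error.detach()) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()
    return pred_error.detach()                                    # intrinsic reward signal
```

The policy (not shown) would then maximize this reward, steering the agent toward states whose futures are hard to predict.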
Finally, a framework to quantify AI's cultural intelligence, moving beyond ad-hoc cultural benchmarks to a systematic, extensible, and theoretically grounded approach.
Recurrent models can now achieve Transformer-competitive performance on recall-intensive tasks, thanks to a simple memory caching mechanism that grows memory capacity with sequence length.
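A toy sketch of the general idea only (not the paper's architecture): a fixed-size recurrent state augmented with an append-only cache of past states, read each step with a simple attention lookup, so usable memory grows with sequence length.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

class CachedRecurrentCell:
    """Recurrent cell with an append-only memory cache (illustrative sketch)."""
    def __init__(self, dim, seed=0):
        rng = np.random.default_rng(seed)
        self.Wx = rng.standard_normal((dim, dim)) / np.sqrt(dim)
        self.Wh = rng.standard_normal((dim, dim)) / np.sqrt(dim)
        self.Wm = rng.standard_normal((dim, dim)) / np.sqrt(dim)
        self.cache = []                          # grows by one entry per step

    def step(self, x, h):
        read = np.zeros_like(h)
        if self.cache:
            mem = np.stack(self.cache)           # (t, dim)
            attn = softmax(mem @ h)              # query the cache with the current state
            read = attn @ mem
        h_new = np.tanh(self.Wx @ x + self.Wh @ h + self.Wm @ read)
        self.cache.append(h_new.copy())          # cache the new state for future reads
        return h_new

cell, h = CachedRecurrentCell(dim=16), np.zeros(16)
for x in np.random.default_rng(1).standard_normal((100, 16)):
    h = cell.step(x, h)                          # cache now holds 100 past states
```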
State-of-the-art emotion recognition in conversations is now possible by decoupling modality-specific context modeling and multimodal fusion with a mixture-of-experts approach that doesn't require speaker identity.
AI safety evaluations get a much-needed dose of Sub-Saharan African perspectives with the release of SAFARI, a stereotype dataset built using community-engaged methods across 15 native languages.
LLMs harbor surprisingly consistent hidden beliefs on sensitive topics like mass surveillance and torture, even when direct questioning suggests otherwise.
Gemini 3 Deep Think can now autonomously solve a majority of problems in a challenging math competition, signaling a leap in AI's mathematical reasoning capabilities.
Forget fine-tuning: Prompt-Level Distillation lets small models match frontier reasoning performance by distilling explicit reasoning patterns into structured system prompts.
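A rough illustration of what distilling reasoning patterns into a structured system prompt could look like; the pattern list and template below are assumptions for illustration, not the paper's recipe.

```python
# Reasoning patterns harvested from a frontier model's traces (hypothetical examples).
distilled_patterns = [
    "Restate the problem and list the given quantities before solving.",
    "Decompose multi-step problems into numbered sub-goals and solve them in order.",
    "After computing an answer, verify it against the original constraints.",
]

def build_system_prompt(patterns, domain="math word problems"):
    # Compile the distilled patterns into a structured system prompt for a small model.
    rules = "\n".join(f"{i + 1}. {p}" for i, p in enumerate(patterns))
    return (
        f"You are solving {domain}. Follow this reasoning procedure strictly:\n"
        f"{rules}\n"
        "Show each step briefly, then give the final answer on its own line."
    )

if __name__ == "__main__":
    print(build_system_prompt(distilled_patterns))
```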
Surprisingly, using only a single inner loop update in data mixing can lead to failure, and the optimal number of inner loop steps scales logarithmically with the parameter update budget.
Forget painstakingly labeling audio datasets – AuditoryHuM uses LLMs and targeted human input to automatically generate and cluster high-quality auditory scene labels.
Ditch Stable Diffusion's latents: Unified Latents (UL) achieves state-of-the-art video generation and competitive image generation with fewer training FLOPs.
Existing deforestation monitoring maps misclassify smallholder agroforestry as "forest," risking unfair penalties under regulations like the EUDR.
LLMs still struggle with infrequently occurring knowledge, and this paper provides a structured framework to understand why, how we can fix it, and what the implications are for responsible AI.
Sequence models can learn to cooperate in multi-agent settings simply by training against diverse partners, no explicit meta-learning required.
A new model, TAC, uses synthetic training data to achieve state-of-the-art audio and audio-visual reasoning by generating temporally grounded captions that can then be fed into LLMs.
Humanoid robots can now perform vision-based parkour, chaining together dynamic skills like climbing, vaulting, and rolling, adapting to real-time obstacle changes.
Randomly masking parameter updates in RMSProp delivers state-of-the-art LLM training performance, revealing a surprisingly effective form of geometric regularization.
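A minimal sketch of the masking idea as stated (hyperparameters and masking scheme are illustrative, not the paper's): the second-moment estimate is updated as usual, but only a random subset of coordinates actually receives the parameter update.

```python
import numpy as np

def masked_rmsprop_step(param, grad, v, lr=1e-3, beta=0.9, eps=1e-8, keep_prob=0.5, rng=None):
    """One RMSProp step where each coordinate's update is randomly masked out."""
    rng = rng or np.random.default_rng()
    v = beta * v + (1.0 - beta) * grad**2            # standard RMSProp second moment
    mask = rng.random(param.shape) < keep_prob       # Bernoulli mask over coordinates
    param = param - lr * mask * grad / (np.sqrt(v) + eps)
    return param, v
```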
Natural privacy filters, despite their promise for tighter privacy accounting, aren't universally "free," limiting their applicability to specific families of differentially private mechanisms.
LLMs like GPT-5 and Gemini-3 already "know" almost everything (95-98% factual encoding), but struggle to recall it, suggesting that future gains in factuality depend more on better memory retrieval than on simply scaling up.
Coding agents are vulnerable to a new class of stealthy, automated prompt injection attacks via poisoned skills, achieving high success rates even in realistic software engineering tasks.
Forget hand-crafted reward functions: CM2 uses checklists to train tool-using agents, outperforming SFT baselines by up to 12 points on key benchmarks.
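A toy illustration of scoring a trajectory against a checklist instead of a hand-crafted reward; the items and string-matching judge below are made up for illustration, not CM2's design.

```python
def checklist_reward(trajectory_text, checklist):
    """checklist: list of (description, predicate) pairs; reward is the fraction satisfied."""
    passed = sum(1 for _, predicate in checklist if predicate(trajectory_text))
    return passed / len(checklist)

example_checklist = [
    ("called the search tool before answering", lambda t: "search(" in t),
    ("cited at least one retrieved document", lambda t: "[doc" in t),
    ("produced a final answer section", lambda t: "Final answer:" in t),
]

reward = checklist_reward("search(query)... [doc1] ... Final answer: 42", example_checklist)
print(reward)  # 1.0 when every checklist item passes
```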
Speech recognition models stumble badly on real-world street names, especially for non-English speakers, but a simple synthetic data boost can dramatically improve accuracy.
Finally, a streaming ASR model matches Whisper's offline transcription quality while maintaining sub-second latency.
Forget huge models: parameter-efficient fine-tuning turns tiny language models into code-generating powerhouses that outperform larger, untuned counterparts.
Finally, a fully open-source, reproducible system for long-form song generation is here, complete with licensed data, code, and a Qwen-based model that rivals closed-source systems.
Forget "smart plagiarism" – multi-stage LLM workflows like recursive decomposition and long-context pipelines can actually generate novel research plans, outperforming simpler reflection-based methods.
Claude 2 can match the performance of top medical specialists on pulmonary thromboembolism knowledge assessments, suggesting AI's potential for clinical decision support.
LLM safety guardrails are far less robust than benchmarks suggest, with accuracy dropping by as much as 57% on novel adversarial attacks, and some even generating harmful content in a "helpful mode" jailbreak.
Despite their promise, even the best multimodal LLM (GPT-4o) achieves only 26% accuracy in grading knee osteoarthritis from radiographs, revealing a significant gap in clinical reliability.
Reasoning-based safety guardrails, once thought to be a strong defense against jailbreaks, crumble with just a few strategically placed tokens.
Clinicians using a new medical literature mining LLM, LEADS, achieved 0.81 recall vs. 0.78 without it, while saving 20.8% of their time.
Even the best LLMs fail more than 40% of the time when orchestrating multiple tools in realistic scenarios, revealing critical gaps in real-world agent capabilities.
DPO's success isn't just clever engineering—it's deeply rooted in human choice theory, unlocking a surprisingly flexible framework for preference optimization and justifying many DPO extensions.
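For reference, the standard DPO objective, whose sigmoid arises from the Bradley-Terry choice model that a choice-theoretic framing would generalize:

$$
\mathcal{L}_{\mathrm{DPO}}(\theta) = -\,\mathbb{E}_{(x,\,y_w,\,y_l)}\!\left[\log \sigma\!\left(\beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)} - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}\right)\right]
$$

where $y_w$ and $y_l$ are the preferred and dispreferred responses and $\pi_{\mathrm{ref}}$ is the reference policy.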
Forget sparse autoencoders: semi-nonnegative matrix factorization directly dissects MLP activations into human-interpretable features that causally steer LLMs better.
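A minimal numpy sketch of semi-NMF (one factor unconstrained, the other nonnegative) using the standard multiplicative-update scheme; applying it to a matrix of MLP activations as below is only illustrative, not the paper's exact pipeline.

```python
import numpy as np

def semi_nmf(X, k, iters=200, seed=0):
    """Semi-nonnegative factorization X ≈ F @ G.T with G >= 0 (F unconstrained)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    G = np.abs(rng.standard_normal((d, k)))              # nonnegative feature loadings
    pos = lambda A: (np.abs(A) + A) / 2
    neg = lambda A: (np.abs(A) - A) / 2
    for _ in range(iters):
        F = X @ G @ np.linalg.pinv(G.T @ G)               # least-squares update of the free factor
        XtF, FtF = X.T @ F, F.T @ F
        G *= np.sqrt((pos(XtF) + G @ neg(FtF)) / (neg(XtF) + G @ pos(FtF) + 1e-12))
    return F, G

# Example: factor a random "activation" matrix (rows = tokens, columns = neurons) into 8 components.
acts = np.random.default_rng(1).standard_normal((512, 64))
F, G = semi_nmf(acts, k=8)
print(np.linalg.norm(acts - F @ G.T) / np.linalg.norm(acts))  # relative reconstruction error
```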
Forget hand-annotated data: Magnet distills multi-turn tool-use skills into LLMs by automatically generating training trajectories that outperform even Gemini 1.5 Pro.
LLMs can generate plain language summaries of scientific research that are as good as human-written ones, but easier to read.
Clinicians using a medical literature-specific foundation model, LEADS, achieved 23-27% time savings and improved accuracy/recall compared to working alone.