UC Berkeley's AI research lab. Pioneering work in robotics, RL, NLP, and computer vision.
Adversarial clothing with non-overlapping visible-thermal patterns can reliably evade RGB-T detectors, even transferring across different fusion architectures.
Forget scaling laws: the real bottleneck in associative memory is retrieval, not storage. Forcing a single "winner" costs a logarithmic factor in capacity compared to allowing a ranked list.
Retrieval-augmented LLMs are surprisingly vulnerable to memory poisoning via synonym substitution, a loophole that gradient-based defenses can't close.
LLM-powered query reformulation, a hot topic in IR, often fails to translate gains from lexical to neural retrieval, and bigger models don't always help.
YouTube's recommendation algorithm pushes Kyrgyz children towards Russian-language content, even when they signal a preference for their native tongue, effectively amplifying colonial influence.
LLMs struggle with structured 2D tasks when inputs are serialized into 1D, revealing a surprising performance gap compared to vision-augmented models that directly process the 2D layout.
Multi-agent LLM systems are leaving performance on the table by treating structured agent interactions as generic traffic; Pythia shows how to unlock substantial gains by exploiting workflow semantics at the serving layer.
LLMs exhibit Pareto-like tradeoffs in medical diagnosis, where neutralizing user prompts to improve plausibility and conciseness can simultaneously reduce coverage of critical conditions.
Forget hand-crafted examples: this system automatically generates worked examples tailored to student errors by mining common code patterns.
Kernel launch overhead is a bigger bottleneck than you think: GPUOS achieves up to 15.3x speedup by fusing operations at runtime.
The dream of universal representations across modalities may be just that: scaling up datasets and relaxing constraints reveals that models trained on different modalities learn rich, but fundamentally different, representations of the world.
Claim verification in peer reviews just got a major upgrade with Peerispect, a tool that highlights evidence directly in manuscripts for rapid assessment.
Current LLM detection methods in peer review are fooled by hybrid human-AI workflows, mistaking AI-written text for AI-originated ideas.
LLMs may learn shared syntactic dependencies even with limited data, but they're still data-hungry toddlers compared to humans.
Agentic data science pipelines often reach falsely optimistic conclusions, but two simple sanity checks can expose these unsupported claims by testing if the agent can reliably distinguish signal from noise.
AI audit standards can fail to ensure responsible AI practices due to vague requirements and undefined terms, even while appearing compliant.
Generate diverse, physically plausible, and language-annotated whole-body motion data for humanoid robots at scale with this new interactive web-based pipeline.
Unlock zero-shot generalization in robot manipulation by generating diverse, affordance-aware training data with 3D generative models and Vision Foundation Models.
Verifier-free evolution can now match or exceed the performance of verifier-based methods, while slashing API costs by 3x and boosting throughput by 10x, thanks to a clever model orchestration strategy.
LLM-powered simulations of societal behavior risk encoding and amplifying existing biases unless strict ethical preconditions are enforced.
Cut LLM cold starts from minutes to seconds by pre-materializing CUDA graph execution contexts, sidestepping brittle kernel patching and heavyweight checkpointing.
MoEs can be pruned more effectively by considering cross-layer redundancy, leading to significant performance gains compared to uniform pruning strategies.
Core excitons in NaF decohere in under 8 fs, and polarization-controlled attosecond spectroscopy reveals that bright excitons have s-like symmetry while dark excitons have p-like symmetry.
Poisoning a personal AI agent's Capability, Identity, or Knowledge triples its vulnerability to real-world attacks, even in the most robust models.
Stop prompting LLMs to blindly rewrite queries: ReFormeR distills query transformations into reusable patterns that actually improve retrieval.
Overcome simulation imperfections and limited experimental data by aligning generative models with real-world observations, even with partial and correlated measurements.
Forget hyperparameter tuning: autonomous research reveals that bug fixes and architectural tweaks unlock far greater gains in multimodal agent memory.
Professional translators fear that LLMs are compromising the essential human elements of translation, potentially leading to harmful downstream consequences.
Get 3x the imitation learning performance from your robot with just a few extra cameras.
Helium rain in gas giants may be less frequent than we thought, thanks to new simulations that significantly lower the estimated hydrogen-helium demixing temperatures.
Training domain-specific coding LLMs with realistic environments and large-scale RL can yield substantial gains in practical software engineering tasks.
Running robotic manipulation workloads entirely onboard kills robot batteries, but offloading to the cloud tanks accuracy due to network latency, revealing a critical compute placement trade-off.
LVLMs can be made significantly less prone to hallucinations, without any training, by explicitly grounding them in visual evidence and iteratively self-refining their answers based on verified information.
Current AI's hunger for curated data may be solved by a new architecture inspired by human cognition that flexibly switches between observation, active behavior, and meta-control.
Teaching robots to manipulate objects just got easier: OCRA learns directly from human demonstration videos by focusing on object interactions and incorporating tactile feedback.
Reading Activity Traces (RATs) reveal the hidden creative work lost when algorithms automate interpretation, offering a path to design AI that preserves human insight.
Securing AI agents demands a new security paradigm, as their integration of LLMs with traditional systems introduces vulnerabilities beyond those of standard software.
Ditch the clunky controllers: this hand-shadowing pipeline lets you teleoperate a robot arm with just an RGB-D camera and some clever inverse kinematics.
Path entropy, not just thermodynamics, dictates the stability of patterns in reaction-diffusion systems, offering a new lens for understanding nonequilibrium dynamics.
Current ML benchmarks may be fundamentally gameable: they can lack a stable equilibrium in which developers are incentivized to improve true model quality rather than just leaderboard scores.
Most social media platforms govern AI-generated content by simply applying existing content moderation policies, leaving key issues like ownership and monetization largely unaddressed.
Forget tactile simulation: PTLD distills real-world tactile sensor data into a robust state estimator that supercharges sim-trained manipulation policies.
Existing QA benchmarks are too easy for LLMs, so iAgentBench offers a more realistic challenge by requiring agents to synthesize information from multiple sources on high-traffic topics.
Models are substantially better at pairwise self-verification than independent scoring, unlocking a more efficient and accurate approach to test-time scaling for complex reasoning.
Forget simulated manipulation: ManipulationNet offers a global infrastructure for benchmarking robots in the real world, complete with standardized hardware and software, to finally measure progress toward general manipulation.
Multimodal web agents are surprisingly vulnerable to cross-modal attacks, but a novel adversarial training approach can double task completion efficiency while mitigating these risks.
Autonomous AI agents that can independently sustain and extend their operation are closer than we think, but raise thorny security and governance questions we need to address now.
Advisor performance paradoxically suffers most when personal AI is used moderately, highlighting the complex strategic interactions introduced by personal AI assistants.
Stop struggling with ad-hoc codebases: dLLM offers a unified, open-source framework to reproduce, fine-tune, and build diffusion language models, even from BERT-style encoders.
Now you can audit black-box LLM APIs for cheating (model substitution, overbilling) with <1% overhead, using verifiable computation.
Human-written solutions can actually *hurt* model performance on math problems, highlighting a critical gap between strategy usage and executability that Selective Strategy Retrieval (SSR) effectively bridges.
Unlock autonomous driving with YouTube: a new label-free pretraining method learns driving representations directly from unposed in-the-wild videos, outperforming LiDAR baselines with only a single monocular camera.
Aggregating responses from multiple copies of the same model expands the range of achievable outputs in compound AI systems through three key mechanisms, offering a path to overcome individual model limitations.
Ditching explicit 3D geometry, RAYNOVA achieves SOTA multi-view video generation by modeling spatio-temporal relationships directly with a dual-causal autoregressive framework and Plücker-ray positional encoding.
Forget temperature scaling: JUCAL calibrates aleatoric and epistemic uncertainty in classifier ensembles, achieving SOTA results with significantly smaller ensembles and lower inference costs.
LLM-driven program evolution gets a smart upgrade: AdaEvolve dynamically allocates resources to promising solution candidates, leaving static schedules in the dust.
Robots can now navigate complex outdoor environments and find objects using natural language queries, even without prior maps or precise depth sensing.
Achieve 13-15% more efficient LLM watermark detection by using e-values for anytime-valid inference, enabling early stopping without sacrificing statistical guarantees.
LLMs can now autonomously design and build better-performing agents using OpenSage, an agent development kit that lets them self-generate agent topology, toolsets, and memory structures.
Forget clunky skeletons: this new model lets you prompt your way to accurate 3D human meshes from single images, even in the wildest poses.
Humanoid robots can now perform vision-based parkour, chaining together dynamic skills like climbing, vaulting, and rolling, adapting to real-time obstacle changes.
A functional-first CS curriculum, BJC Sparks, makes programming accessible to middle schoolers by emphasizing data flow and engaging projects over traditional iteration-based approaches.
An educational RAG system achieves 84% accuracy in answering student questions with minimal human editing, suggesting a practical path towards scalable AI-assisted teaching.
Autonomous driving benchmarks get a reality check: ScenicRules exposes failures by combining prioritized, multi-objective rules with formally modeled, stochastic scenarios.
Language models organize concepts like months and years into surprisingly clean geometric structures because of hidden symmetries in language statistics, even when those statistics are heavily perturbed.
LLMs can't reliably generate the very skills that boost their performance, and smaller models equipped with expert-crafted skills can rival larger, skill-less models.
LLM alignment can be destabilized by iterative training loops using model-generated preferences, leading to oscillations or entropy collapse under certain conditions.
Achieve >97.5% of full-data VIT performance with only 16% of the data using ScalSelect, a surprisingly effective and scalable training-free data selection method.
Denoising diffusion models can significantly outperform discriminative methods in learning-to-rank, suggesting a new path for improving information retrieval.
Prediction-powered inference can beat direct error correction when using LLMs as judges, offering a more statistically efficient way to debias evaluation scores.
GPT-5's real-time router learns to route queries to specialized models, making it faster and more useful than its predecessors.
Despite progress in AI safety, it's still largely unknown how effective current safeguards are at preventing AI harms, and their effectiveness varies wildly.
Overcome the safety limitations of stochastic robotic systems with EigenSafe, a spectral method that learns a safety filter from the dominant eigenpair of a dynamic programming operator.
LLMs evaluating job candidates exhibit significant bias against hedging language, docking candidates by 25.6% on average, even when the content is equivalent.
An LLM can analyze patient records like a clinician, predicting HIV care disengagement with clinically relevant justifications, potentially revolutionizing resource allocation and patient outcomes in sub-Saharan Africa.
An end-to-end learned robotic system can now clean your kitchen in a completely new house, thanks to a novel co-training approach on diverse data.