Search papers, labs, and topics across Lattice.
UC Berkeley's AI research lab. Pioneering work in robotics, RL, NLP, and computer vision.
Helium rain in gas giants may be less frequent than we thought, thanks to new simulations that significantly lower the estimated hydrogen-helium demixing temperatures.
Training domain-specific coding LLMs with realistic environments and large-scale RL can yield substantial gains in practical software engineering tasks.
Running robotic manipulation workloads entirely onboard kills robot batteries, but offloading to the cloud tanks accuracy due to network latency, revealing a critical compute placement trade-off.
Current AI's hunger for curated data may be addressed by a new architecture inspired by human cognition that flexibly switches between observation, active behavior, and meta-control.
Teaching robots to manipulate objects just got easier: OCRA learns directly from human demonstration videos by focusing on object interactions and incorporating tactile feedback.
Ditch the clunky controllers: this hand-shadowing pipeline lets you teleoperate a robot arm with just an RGB-D camera and some clever inverse kinematics.
Securing AI agents demands a new security paradigm, as their integration of LLMs with traditional systems introduces vulnerabilities beyond those of standard software.
Reading Activity Traces (RATs) reveal the hidden creative work lost when algorithms automate interpretation, offering a path to design AI that preserves human insight.
Path entropy, not just thermodynamics, dictates the stability of patterns in reaction-diffusion systems, offering a new lens for understanding nonequilibrium dynamics.
Current ML benchmarks may be gameable even in theory, as they can lack a stable equilibrium in which developers are incentivized to improve true model quality rather than just leaderboard scores.
Most social media platforms govern AI-generated content by simply applying existing content moderation policies, leaving key issues like ownership and monetization largely unaddressed.
Forget tactile simulation: PTLD distills real-world tactile sensor data into a robust state estimator that supercharges sim-trained manipulation policies.
Forget simulated manipulation—ManipulationNet offers a global infrastructure for benchmarking robots in the real world, complete with standardized hardware and software, to finally measure progress toward general manipulation.
Models are substantially better at pairwise self-verification than independent scoring, unlocking a more efficient and accurate approach to test-time scaling for complex reasoning.
Robots can now remember what they've done and what they need to do next for 15 minutes straight, thanks to a new memory architecture that mixes video and text.
Multimodal web agents are surprisingly vulnerable to cross-modal attacks, but a novel adversarial training approach can double task completion efficiency while mitigating these risks.
Existing QA benchmarks are too easy for LLMs, so iAgentBench offers a more realistic challenge by requiring agents to synthesize information from multiple sources on high-traffic topics.
Advisor performance paradoxically suffers most when personal AI is used moderately, highlighting the complex strategic interactions introduced by personal AI assistants.
Human-written solutions can actually *hurt* model performance on math problems, highlighting a critical gap between strategy usage and executability that Selective Strategy Retrieval (SSR) effectively bridges.
Now you can audit black-box LLM APIs for cheating (model substitution, overbilling) with <1% overhead, using verifiable computation.
Stop struggling with ad-hoc codebases: dLLM offers a unified, open-source framework to reproduce, fine-tune, and build diffusion language models, even from BERT-style encoders.
Unlock autonomous driving with YouTube: a new label-free pretraining method learns driving representations directly from unposed in-the-wild videos, outperforming LiDAR baselines with only a single monocular camera.
Aggregating responses from multiple copies of the same model expands the range of achievable outputs in compound AI systems through three key mechanisms, offering a path to overcome individual model limitations.
Ditching explicit 3D geometry, RAYNOVA achieves SOTA multi-view video generation by modeling spatio-temporal relationships directly with a dual-causal autoregressive framework and Plücker-ray positional encoding.
Forget temperature scaling: JUCAL calibrates aleatoric and epistemic uncertainty in classifier ensembles, achieving SOTA results with significantly smaller ensembles and lower inference costs.
LLM-driven program evolution gets a smart upgrade: AdaEvolve dynamically allocates resources to promising solution candidates, leaving static schedules in the dust.
Robots can now navigate complex outdoor environments and find objects using natural language queries, even without prior maps or precise depth sensing.
Achieve 13-15% more efficient LLM watermark detection by using e-values for anytime-valid inference, enabling early stopping without sacrificing statistical guarantees.
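The abstract doesn't spell out the construction, but the general e-value idea is standard: multiply per-token e-values into a running "wealth" process and stop as soon as it crosses 1/α, which Ville's inequality guarantees keeps the false-positive rate below α at any stopping time. A minimal sketch, with a hypothetical score distribution (uniform without a watermark, Beta(2,1) with one) standing in for whatever statistic the actual detector uses:

```python
import numpy as np

rng = np.random.default_rng(1)

def e_value(x):
    # Likelihood ratio of Beta(2,1) (density 2x) vs Uniform(0,1):
    # has expectation 1 under the null, so it is a valid e-value.
    return 2.0 * x

def detect(scores, alpha=0.01):
    """Sequential test: multiply e-values and stop early once the
    running product (wealth) reaches 1/alpha. By Ville's inequality,
    P(wealth ever >= 1/alpha) <= alpha when no watermark is present."""
    wealth = 1.0
    for t, x in enumerate(scores, start=1):
        wealth *= e_value(x)
        if wealth >= 1.0 / alpha:
            return True, t  # detected after t tokens
    return False, len(scores)

# Watermarked text: scores skewed toward 1 (Beta(2,1) here, for illustration).
watermarked_scores = rng.beta(2, 1, size=2000)
detected, steps = detect(watermarked_scores, alpha=0.01)
print(detected, steps)
```

Because the wealth process drifts upward under the watermark, detection typically fires after a few dozen tokens here, which is exactly the early-stopping benefit the summary describes.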
LLMs can now autonomously design and build better-performing agents using OpenSage, an agent development kit that lets them self-generate agent topology, toolsets, and memory structures.
A functional-first CS curriculum, BJC Sparks, makes programming accessible to middle schoolers by emphasizing data flow and engaging projects over traditional iteration-based approaches.
An educational RAG system achieves 84% accuracy in answering student questions with minimal human editing, suggesting a practical path towards scalable AI-assisted teaching.
Humanoid robots can now perform vision-based parkour, chaining together dynamic skills like climbing, vaulting, and rolling, adapting to real-time obstacle changes.
Autonomous driving benchmarks get a reality check: ScenicRules exposes failures by combining prioritized, multi-objective rules with formally modeled, stochastic scenarios.
Forget clunky skeletons: this new model lets you prompt your way to accurate 3D human meshes from single images, even in the wildest poses.
Language models organize concepts like months and years into surprisingly clean geometric structures because of hidden symmetries in language statistics, even when those statistics are heavily perturbed.
LLMs can't reliably generate the very skills that boost their performance, and smaller models equipped with expert-crafted skills can rival larger, skill-less models.
Achieve >97.5% of full-data ViT performance with only 16% of the data using ScalSelect, a surprisingly effective and scalable training-free data selection method.
Denoising diffusion models can significantly outperform discriminative methods in learning-to-rank, suggesting a new path for improving information retrieval.
LLM alignment can be destabilized by iterative training loops using model-generated preferences, leading to oscillations or entropy collapse under certain conditions.
Prediction-powered inference can beat direct error correction when using LLMs as judges, offering a more statistically efficient way to debias evaluation scores.
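The core of prediction-powered inference is simple enough to sketch: use cheap LLM-judge scores on a large unlabeled set, then debias them with a "rectifier" estimated on a small human-labeled set. The data below is synthetic and the variable names are illustrative, not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: a small set with both human and judge scores,
# plus a large set scored only by the LLM judge.
n, N = 200, 10_000
y = rng.normal(0.70, 0.10, n)           # human scores (small labeled set)
yhat = y + rng.normal(0.05, 0.10, n)    # judge scores on the same items (biased upward)
yhat_unlab = rng.normal(0.75, 0.10, N)  # judge scores on the unlabeled set

# Prediction-powered estimate of the mean human score:
# the large-sample judge mean, corrected by the labeled-set rectifier.
rectifier = np.mean(y - yhat)
ppi_mean = np.mean(yhat_unlab) + rectifier

# The standard error combines variance from both sets; since N >> n,
# the rectifier term dominates, but it is far cheaper than labeling N items.
se = np.sqrt(np.var(yhat_unlab, ddof=1) / N + np.var(y - yhat, ddof=1) / n)
print(f"PPI mean: {ppi_mean:.3f} +/- {1.96 * se:.3f}")
```

The statistical-efficiency claim in the summary corresponds to this confidence interval being tighter than one built from the n human labels alone, while remaining valid despite the judge's bias.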
GPT-5's real-time router learns to route queries to specialized models, making it faster and more useful than its predecessors.
Despite progress in AI safety, it's still largely unknown how effective current safeguards are at preventing AI harms, and their effectiveness varies wildly.
Overcome the safety limitations of stochastic robotic systems with EigenSafe, a spectral method that learns a safety filter from the dominant eigenpair of a dynamic programming operator.
LLMs evaluating job candidates exhibit significant bias against hedging language, docking candidates by 25.6% on average, even when the content is equivalent.
An LLM can analyze patient records like a clinician, predicting HIV care disengagement with clinically relevant justifications, potentially revolutionizing resource allocation and patient outcomes in sub-Saharan Africa.
An end-to-end learned robotic system can now clean your kitchen in a completely new house, thanks to a novel co-training approach on diverse data.