MIT's Computer Science and Artificial Intelligence Laboratory. One of the largest and oldest AI labs in academia.
Action-chunking policies can lead to premature robot assistance, but a novel steering method effectively mitigates this issue, enhancing collaboration efficiency.
The "curse of precision" reveals how reliance on AI-generated content can degrade model performance by homogenizing training data.
Even state-of-the-art multimodal models struggle with reliability in clinical tool use, revealing critical gaps in AI agent performance.
Optimizing input configurations can boost LLM performance in pathology tasks, closing the gap with specialized models and challenging assumptions about domain-specific training.
Standard stereo methods can produce 3D models of Martian terrain, but achieving reliable reconstruction demands careful consideration of domain-specific challenges.
State inertia in full-duplex spoken language models can lead to missed user input, but activation steering effectively mitigates this issue, boosting comprehension rates significantly.
Despite promising engagement benefits, foundation model-based care robots struggle with reliability and lack robust evidence for clinical impact.
Systematic gaps in AI evaluation reporting are exposed, revealing inconsistencies that hinder reliable comparisons across thousands of models and benchmarks.
AI systems currently miss critical temporal and interpretive elements of clinical reasoning, limiting their effectiveness in real-world healthcare settings.
Current clinical AI systems often neglect the temporal dimension of patient care, limiting their effectiveness in longitudinal reasoning.
Meridian achieves accurate global localization in unstructured environments without the need for area-specific training, outperforming traditional methods.
Design choices in agent memory systems can significantly shift operational costs, revealing critical trade-offs that impact long-horizon task performance.
Active exploration can dramatically enhance adults' ability to reason about complex causal relationships, but even with this advantage, they still perform worse than they do on simpler tasks.
USAD 2.0 achieves state-of-the-art audio understanding by seamlessly integrating self-supervised and supervised learning techniques, scaling to one billion parameters.
Monte Carlo methods can now compute Steklov spectra orders of magnitude faster while handling complex, disconnected geometries in real-world datasets.
Success in long-horizon tasks hinges more on an agent's iterative persistence than on the quality of its initial solution.
Text-to-image models may only need basic word meanings and order, not complex contextual embeddings, to produce high-quality images.
SeClaw reveals that existing benchmarks fall short in capturing the complexities of agent behavior, enabling a more nuanced evaluation of security risks in autonomous systems.
Voice recordings can reveal the oscillating states of Recurrent Respiratory Papillomatosis, providing a unique longitudinal perspective on a rare laryngeal disease.
Multimodal pretraining doesn't guarantee better alignment with human reading patterns, suggesting that language-internal representations are still king when modeling how humans process text.
Generating synthetic data for humanoid robots can boost loco-manipulation performance by 20% compared to relying solely on real-world data.
Unconditional image diffusion models can now perform continuous super-resolution without task-specific architectures or retraining, simply by varying the starting timestep.
Finally, a tactile pose estimation method that nails yaw tracking, unlocking more precise and robust robotic manipulation.
Matching the full posterior covariance in Gaussian DDPMs slashes path KL error and unlocks faster, higher-quality sampling with a surprisingly simple Lanczos-based method.
Subword tokenization just got a whole lot more efficient: ToaST slashes token counts by 11% and boosts language model performance by up to 7.6% compared to standard methods.
You can now audit Rényi differential privacy with near-optimal sample complexity, thanks to a new framework that directly estimates Rényi divergence using Donsker-Varadhan estimators.
Achieve 9.97% higher accuracy in cross-domain human activity recognition while simultaneously reducing computation by 6.4x with a new sensor data tokenization and attention mechanism.
LLMs trained with Vector Policy Optimization (VPO) learn to produce diverse solutions that unlock previously unsolvable problems in evolutionary search, outperforming models optimized for single scalar rewards.
Coordinating AI agents across scientific disciplines only boosts performance when each discipline captures a unique piece of the puzzle; otherwise, simpler combined summaries often suffice.
Current alignment benchmarks are misleading: even if a model aces them, its real-world alignment could be totally different depending on the specific deployment context.
LMs encode grammaticality as a distinct feature in their hidden representations, separable from raw string probability and generalizable across languages.
Imagine a workspace that subtly shifts lighting and sound to match your mood, all powered by an LLM that understands your needs. This paper explores the potential and pitfalls of that reality.
Quantum kernels unlock signal in medical image embeddings where classical methods fail, suggesting a new path for extracting value from medical foundation models.
Imagine slashing the human effort needed to go from hypothesis to submission-ready ML theory paper by orders of magnitude.
Automated identification of individual animals can only be effective if it aligns with ecological questions and data practices, not just algorithmic accuracy.
Forget complex fixed-point machinery: this work offers a dramatically simpler and more efficient route from external regret to $\Phi$-regret minimization.
Cyclic equalizability, a concept relevant to card-based cryptography, boils down to having identical Parikh vectors.
Multi-event video generation gets a 33% quality boost with TS-Attn, a training-free attention mechanism that dynamically aligns video content with complex temporal prompts.
Even the best LLMs still stumble on Olympiad-level math, and retrieval quality is the bottleneck for retrieval-augmented problem solving, according to the new MathNet benchmark.
LLMs may *look* collaborative, but the reality is often a fragile dance of misunderstandings and repairs because the interaction lacks sufficient "grounding."
ZKP proving, previously bottlenecked by MSM and NTT operations, can now achieve up to 10x higher throughput on TPUs thanks to a novel framework that reformulates ZKP kernels for AI-ASIC execution.
Uncover the hidden assumptions baked into LLM responses with a new interactive system that lets you explore alternative conceptual framings and values.
Users feel more creative and in control when building images step-by-step from sketches, rather than wrestling with a one-shot text-to-image generator's fully-formed (and often unwanted) details.
Multi-agent systems can find 5x more real-world events in satellite imagery than traditional methods, unlocking a wealth of training data for multi-temporal change detection.
You can boost medical image super-resolution fidelity by over 3dB just by swapping in a domain-specific VAE, no fancy diffusion architecture needed.
LLMs play favorites: GPT-5-nano is significantly more likely to agree with incorrect statements depending on the perceived race, age, gender, and confidence of the user.
Organizational AI's biggest bottleneck isn't finding the right information, but knowing what's actually true, agreed upon, or even known at all.
Exact robust regression at scale is now possible: a new algorithm solves the NP-hard Least Trimmed Squares problem orders of magnitude faster than existing methods.
Stop wasting time wrestling with incompatible transportation datasets: Ozone slashes experiment setup by 85% and boosts cross-city transfer of safety models by 91%.
Track unseen objects through total occlusion without CAD models, using just a handful of 2D points.
EquiformerV3 achieves state-of-the-art performance in atomistic modeling by combining architectural improvements with optimized software, enabling accurate energy-conserving simulations.
Gaze-tracking unlocks a new level of personalized AI assistance, enabling LLMs to infer user cognitive states and boost recall performance.
Forget simulating backward dynamics: solve stochastic optimal control problems by just watching the system relax forward.
Reachability maps don't have to trade off precision, speed, and flexibility: RichMap achieves all three.
Soft-gating with an "advisor" model can steer LLMs to be safer and more useful, reducing over-refusal without sacrificing detection accuracy.
Finally, a rigorous mathematical framework lets you treat deep learning architectures as composable algebraic objects, opening the door to formal verification and automated design.
Finally, a large, diverse, and experimentally anchored dataset of transition metal complex DFT properties is available to fuel ML model development and DFT benchmark studies.
Unpacking Google's AI literacy partnerships reveals the surprising complexities of aligning research, industry, and public needs.
Scaling robot learning with human data isn't a simple "more is better" equation; alignment with robot learning objectives is key.
Stabilizing test-time training with an elastic prior lets you reconstruct 4D scenes from long video sequences without catastrophic forgetting, even with smaller memory chunks.
Guaranteeing LMI constraints in neural networks is now possible with LMI-Net, a differentiable projection layer that ensures feasibility by construction.
Training superword tokenizers just got 600x faster, unlocking practical use of subword tokenization across pre-tokenization boundaries.
Forget brute-force scaling: crafting the *right* context from past experiences unlocks surprisingly large gains in LLM agent performance.
Just 10 minutes of AI assistance can measurably degrade your ability to solve problems on your own.
LLM agent skills, despite their promise, often fail in realistic settings, with performance plummeting to no-skill baselines when agents must retrieve skills from a large, uncurated collection.
LLM agents can autonomously outperform fixed evolutionary search by 3-10x on open-ended discovery tasks when given persistent memory, asynchronous collaboration, and heartbeat-based interventions.
Stop rewarding all LLM-generated candidates equally: ShapE-GRPO uses Shapley values to fairly distribute credit within sets, leading to better training and faster convergence.
Robots can now "see" hidden objects and understand articulation by learning from human egocentric video, even if they can't physically explore those areas themselves.
Freeing robots from pre-assigned tasks slashes completion times in multi-agent settings, with a new algorithm improving performance on almost 90% of tested scenarios.
Demystifying LLMs for the masses might be as simple as turning their mechanics into a game.
Hyperpolarizing the nuclear spin bath surrounding a molecular qubit can significantly extend its coherence time, offering a new knob for quantum control.
Rényi divergence may be the missing key to understanding thermal equilibrium in quantum systems, revealing a novel constraint on wavefunction ensembles.
Neural networks can accurately predict polymer free energies, even when traditional methods like Bennett Acceptance Ratio fail due to poor phase-space overlap.
Heuristic maritime routes lead to extreme fuel waste in nearly 5% of voyages, but this RL approach cuts that risk by almost 10x.
Video generative models already contain powerful image restoration priors, and can be coaxed into state-of-the-art performance with just 1,000 training examples.
MLLMs can now handle 4K videos up to 100x faster thanks to AutoGaze, which selectively attends to only the most informative patches.
Fine-tuning unlocks LLMs' surprising ability to predict how memorable a sentence is and how long it takes to read, exceeding traditional methods.
Particle filter models of sentence processing inherently predict "digging-in" effects, in which disambiguation difficulty increases with the length of the ambiguous region, a phenomenon not captured by surprisal-based models.
Hyper-redundant robots get a 75% accuracy boost thanks to a neural network that adaptively blends learned behavior with kinematic priors.
Uncover hidden network structure and simplify management by automatically classifying hosts into meaningful roles based on their connection patterns.
Zero-shot robotic manipulation is now within reach: TiPToP matches a 350-hour fine-tuned model without *any* robot data.
Beat the state of the art in radio signal separation by 122x using a transformer trained on cross-entropy loss, and the same architecture could work for gravitational waves.
By dynamically adjusting contrastive learning temperatures based on data density, MM-TS achieves state-of-the-art results on multimodal long-tail datasets.
Scale qualitative analysis of educational discourse data without sacrificing rigor using a mixed-initiative system that orchestrates LLMs and human expertise.
Forget hand-engineered features: this approach learns symbolic representations for robotic planning directly from pixels using VLMs, enabling impressive zero-shot generalization to new environments and goals.
Building a complete web application from scratch remains a surprisingly hard task for even the best AI models, with top performance at only 58% accuracy on a new end-to-end benchmark.
Forget simulated manipulation: ManipulationNet offers a global infrastructure for benchmarking robots in the real world, complete with standardized hardware and software, to finally measure progress toward general manipulation.
Most repeat phishing clicks reflect stable employee characteristics, not the lingering effect of prior failures, challenging common assumptions about habit formation in cybersecurity training.
Lattice QCD calculations just got a whole lot faster: normalizing flows slash variance by up to 60x in key observables.
NeuroSkill(tm) offers real-time, edge-based human-AI interaction by directly modeling human state of mind from BCI data, enabling more nuanced and empathetic agentic responses.
LLMs struggle to reliably predict numerical materials properties, even after fine-tuning, and their performance fluctuates wildly over time, casting doubt on their use in high-stakes scientific applications.
Standard win-rate metrics in LLM evaluation can backfire, incentivizing model creators to produce homogeneous models that actually *decrease* overall consumer welfare.
Forget computationally expensive fluid dynamics: this work shows that a simple, stateless model, carefully calibrated to real-world data, can create surprisingly effective digital twins for soft underwater robots.
Nightly hospital planning is now possible on a laptop: this work distills slow, complex agent-based epidemic models into fast, trustworthy surrogate models using neural ODEs, achieving a 10,000x speedup.
E(3)-equivariant networks just got a whole lot faster: a new algorithm cuts the complexity of Clebsch-Gordan Tensor Products from $O(L^6)$ to $O(L^4\log^2 L)$ without sacrificing completeness.
Feminist participatory annotation workshops reveal the nuanced tensions between contextual richness, pluralism, and the practical need for bounded consensus in AI data work.
LLMs can be made significantly safer by steering their latent space trajectories with Control Barrier Functions, preventing unsafe outputs without retraining.
BabyLM 2026 seeks to push the boundaries of data-efficient and cognitively plausible language models, now with a multilingual twist.
By aligning a generative flow network with physics-based stability proxies via reinforcement learning, PackFlow drastically improves the efficiency of molecular crystal structure prediction, offering a practical route to circumvent the costly relax-and-rank bottleneck.
Achieve robust safety-critical control with a single hyperparameter by using a novel Taylor-Lagrange formulation that directly incorporates control actions into the current time step.