MIT's Computer Science and Artificial Intelligence Laboratory, one of the largest and oldest AI labs in academia.
Current alignment benchmarks are misleading: even if a model aces them, its real-world alignment could be totally different depending on the specific deployment context.
LMs encode grammaticality as a distinct feature in their hidden representations, separable from raw string probability and generalizable across languages.
Imagine a workspace that subtly shifts lighting and sound to match your mood, all powered by an LLM that understands your needs – this paper explores the potential and pitfalls of that reality.
Quantum kernels unlock signal in medical image embeddings where classical methods fail, suggesting a new path for extracting value from medical foundation models.
Automated identification of individual animals can only be effective if it aligns with ecological questions and data practices, not just algorithmic accuracy.
Imagine slashing the human effort needed to go from hypothesis to submission-ready ML theory paper by orders of magnitude.
Cyclic equalizability, a concept relevant to card-based cryptography, boils down to having identical Parikh vectors.
Forget complex fixed-point machinery: this work offers a dramatically simpler and more efficient route from external regret to $\Phi$-regret minimization.
Multi-event video generation gets a 33% quality boost with TS-Attn, a training-free attention mechanism that dynamically aligns video content with complex temporal prompts.
Even the best LLMs still stumble on Olympiad-level math, and retrieval quality is the bottleneck for retrieval-augmented problem solving, according to the new MathNet benchmark.
ZKP proving, previously bottlenecked by MSM and NTT operations, can now achieve up to 10x higher throughput on TPUs thanks to a novel framework that reformulates ZKP kernels for AI-ASIC execution.
LLMs may *look* collaborative, but the reality is often a fragile dance of misunderstandings and repairs because the interaction lacks sufficient "grounding."
Uncover the hidden assumptions baked into LLM responses with a new interactive system that lets you explore alternative conceptual framings and values.
Users feel more creative and in control when building images step-by-step from sketches, rather than wrestling with a one-shot text-to-image generator's fully-formed (and often unwanted) details.
Multi-agent systems can find 5x more real-world events in satellite imagery than traditional methods, unlocking a wealth of training data for multi-temporal change detection.
You can boost medical image super-resolution fidelity by over 3dB just by swapping in a domain-specific VAE, no fancy diffusion architecture needed.
Organizational AI's biggest bottleneck isn't finding the right information, but knowing what's actually true, agreed upon, or even known at all.
Exact robust regression at scale is now possible: a new algorithm solves the NP-hard Least Trimmed Squares problem orders of magnitude faster than existing methods.
Stop wasting time wrestling incompatible transportation datasets: Ozone slashes experiment setup by 85% and boosts cross-city transfer of safety models by 91%.
LLMs play favorites: GPT-5-nano is significantly more likely to agree with incorrect statements depending on the perceived race, age, gender, and confidence of the user.
Track unseen objects through total occlusion without CAD models, using just a handful of 2D points.
EquiformerV3 achieves state-of-the-art performance in atomistic modeling by combining architectural improvements with optimized software, enabling accurate energy-conserving simulations.
Gaze-tracking unlocks a new level of personalized AI assistance, enabling LLMs to infer user cognitive states and boost recall performance.
Forget simulating backward dynamics: solve stochastic optimal control problems by just watching the system relax forward.
Unpacking Google's AI literacy partnerships reveals the surprising complexities of aligning research, industry, and public needs.
Soft-gating with an "advisor" model can steer LLMs to be safer and more useful, reducing over-refusal without sacrificing detection accuracy.
Finally, a large, diverse, and experimentally-anchored dataset of transition metal complex DFT properties is available to fuel ML model development and DFT benchmark studies.
Finally, a rigorous mathematical framework lets you treat deep learning architectures as composable algebraic objects, opening the door to formal verification and automated design.
Reachability maps don't have to trade off precision, speed, and flexibility: RichMap achieves all three.
Stabilizing test-time training with an elastic prior lets you reconstruct 4D scenes from long video sequences without catastrophic forgetting, even with smaller memory chunks.
Scaling robot learning with human data isn't a simple "more is better" equation; alignment with robot learning objectives is key.
Guaranteeing LMI constraints in neural networks is now possible with LMI-Net, a differentiable projection layer that ensures feasibility by construction.
LLM agent skills, despite their promise, often fail in realistic settings, with performance plummeting to no-skill baselines when agents must retrieve skills from a large, uncurated collection.
Training superword tokenizers just got 600x faster, unlocking practical use of subword tokenization across pre-tokenization boundaries.
Forget brute-force scaling: crafting the *right* context from past experiences unlocks surprisingly large gains in LLM agent performance.
Just 10 minutes of AI assistance can measurably degrade your ability to solve problems on your own.
LLM agents can autonomously outperform fixed evolutionary search by 3-10x on open-ended discovery tasks when given persistent memory, asynchronous collaboration, and heartbeat-based interventions.
Stop rewarding all LLM-generated candidates equally: ShapE-GRPO uses Shapley values to fairly distribute credit within sets, leading to better training and faster convergence.
Freeing robots from pre-assigned tasks slashes completion times in multi-agent settings, with a new algorithm improving performance on almost 90% of tested scenarios.
Demystifying LLMs for the masses might be as simple as turning their mechanics into a game.
Robots can now "see" hidden objects and understand articulation by learning from human egocentric video, even if they can't physically explore those areas themselves.
Hyperpolarizing the nuclear spin bath surrounding a molecular qubit can significantly extend its coherence time, offering a new knob for quantum control.
Rényi divergence may be the missing key to understanding thermal equilibrium in quantum systems, revealing a novel constraint on wavefunction ensembles.
Neural networks can accurately predict polymer free energies, even when traditional methods like Bennett Acceptance Ratio fail due to poor phase-space overlap.
Heuristic maritime routes lead to extreme fuel waste in nearly 5% of voyages, but this RL approach cuts that risk by almost 10x.
Video generative models already contain powerful image restoration priors, and can be coaxed into state-of-the-art performance with just 1,000 training examples.
Fine-tuning unlocks LLMs' surprising ability to predict how memorable a sentence is and how long it takes to read, exceeding traditional methods.
MLLMs can now handle 4K videos up to 100x faster thanks to AutoGaze, which selectively attends to only the most informative patches.
Particle filter models of sentence processing inherently predict "digging-in" effects—where disambiguation difficulty increases with the length of the ambiguous region—a phenomenon not captured by surprisal-based models.
Hyper-redundant robots get a 75% accuracy boost thanks to a neural network that adaptively blends learned behavior with kinematic priors.
Beat the state-of-the-art in radio signal separation by 122x using a transformer trained on cross-entropy loss, and the same architecture could work for gravitational waves.
Uncover hidden network structure and simplify management by automatically classifying hosts into meaningful roles based on their connection patterns.
Zero-shot robotic manipulation is now within reach: TiPToP matches a 350-hour fine-tuned model without *any* robot data.
Scale qualitative analysis of educational discourse data without sacrificing rigor using a mixed-initiative system that orchestrates LLMs and human expertise.
By dynamically adjusting contrastive learning temperatures based on data density, MM-TS achieves state-of-the-art results on multimodal long-tail datasets.
Forget hand-engineered features: this approach learns symbolic representations for robotic planning directly from pixels using VLMs, enabling impressive zero-shot generalization to new environments and goals.
Building a complete web application from scratch remains a surprisingly hard task for even the best AI models, with top performance at only 58% accuracy on a new end-to-end benchmark.
Forget simulated manipulation—ManipulationNet offers a global infrastructure for benchmarking robots in the real world, complete with standardized hardware and software, to finally measure progress toward general manipulation.
Most repeat phishing clicks reflect stable employee characteristics, not the lingering effect of prior failures, challenging common assumptions about habit formation in cybersecurity training.
NeuroSkill™ offers real-time, edge-based human-AI interaction by directly modeling human state of mind from BCI data, enabling more nuanced and empathetic agentic responses.
Lattice QCD calculations just got a whole lot faster: normalizing flows slash variance by up to 60x in key observables.
LLMs struggle to reliably predict numerical materials properties, even after fine-tuning, and their performance fluctuates wildly over time, casting doubt on their use in high-stakes scientific applications.
Standard winrate metrics in LLM evaluation can backfire, incentivizing model creators to produce homogeneous models that actually *decrease* overall consumer welfare.
Forget computationally expensive fluid dynamics: this work shows that a simple, stateless model, carefully calibrated to real-world data, can create surprisingly effective digital twins for soft underwater robots.
Nightly hospital planning is now possible on a laptop: this work distills slow, complex agent-based epidemic models into fast, trustworthy surrogate models using neural ODEs, achieving a 10,000x speedup.
Feminist participatory annotation workshops reveal the nuanced tensions between contextual richness, pluralism, and the practical need for bounded consensus in AI data work.
E(3)-equivariant networks just got a whole lot faster: a new algorithm cuts the complexity of Clebsch-Gordan Tensor Products from $O(L^6)$ to $O(L^4\log^2 L)$ without sacrificing completeness.
Agentic AI can automate complex optical systems control with near-perfect success rates, leaving code-generation approaches in the dust.
Achieve robust safety-critical control with a single hyperparameter by using a novel Taylor-Lagrange formulation that directly incorporates control actions into the current time step.
By aligning a generative flow network with physics-based stability proxies via reinforcement learning, PackFlow drastically improves the efficiency of molecular crystal structure prediction, offering a practical route to circumvent the costly relax-and-rank bottleneck.
BabyLM 2026 seeks to push the boundaries of data-efficient and cognitively plausible language models, now with a multilingual twist.
LLMs can be made significantly safer by steering their latent space trajectories with Control Barrier Functions, preventing unsafe outputs without retraining.
Decomposing Bellman values into a graph of simpler objectives lets agents master complex, high-dimensional tasks with less tuning and better safety.
Even perfectly rational users can fall prey to "AI psychosis" due to chatbots' sycophantic tendencies, and simply warning users or preventing hallucinations isn't enough to stop it.
Stop repeating avoidable mistakes in public robot deployments: here's a community-vetted checklist to guide your next study.
Randomly initialized encoders can match state-of-the-art pre-trained models on many ECG representation learning tasks, suggesting current benchmarks are misleading.
Boltzmann Draw offers a statistically-grounded coin selection algorithm that reduces dust and wallet size compared to existing methods, making it a promising alternative for token-based payment systems.
VLMs are nowhere near human-level general intelligence: they score less than 10% of human performance across a diverse set of human-designed games, especially struggling with world-model learning, memory, and planning.
Independently trained multimodal models like CLIP aren't so independent after all: a single orthogonal transformation can align their embedding spaces across both image and text modalities.
Control hybrid rigid-soft robots with the ease of AR teleoperation, thanks to a new pipeline that accurately models the soft robot's real-world behavior in simulation.
Forget hand-engineering initial conditions for robust RL: this method *learns* which conditions are feasible while simultaneously training a safe policy.
Ditch the geometry-to-property map: this work uses the external potential as the primary input for machine learning models, unlocking a scalable and equivariant approach to predicting electronic structure.
VLMs can be easily swayed by subtle, optimized visual prompts, revealing vulnerabilities in their decision-making processes that could be exploited in real-world applications.
LLMs can now generate complex, physically plausible 3D scenes for robotics simulation by iteratively proposing assets and refining arrangements based on physics engine feedback.
Forget brute-force search: a new mapper finds provably optimal accelerator mappings with fusion for Transformers over 1000x faster.
Find optimal DNN accelerator mappings in under a minute, something previously impossible, and expose the suboptimality of prior mapping heuristics.
Ditch the equivariant constraints: canonicalization lets you train simpler, faster diffusion models that actually *outperform* equivariant architectures for symmetric generative tasks like 3D molecule design.
Injecting spatial transcriptomics data into existing pathology foundation models unlocks significant performance gains across a range of downstream tasks, including molecular status prediction and gene-to-image retrieval.
HybridRAG-Bench reveals that existing benchmarks overestimate the reasoning abilities of retrieval-augmented LLMs due to contamination, offering a more realistic evaluation using up-to-date scientific knowledge.
Ditching reward magnitudes for rankings unlocks faster and better RLHF, especially when judging quality is subjective.
Hematogenous infection, elevated CRP and PMN%, and resistant organisms are independently associated with DAIR failure in acute PJI, allowing for risk stratification using a new nomogram.
Waterfilling-inspired quantization ("WaterSIC") slashes the quantization error in LLMs by intelligently allocating bits based on weight covariance, outperforming standard techniques like GPTQ.
This study establishes SSL as a promising paradigm for ECG analysis, particularly in settings with limited annotated data, enhancing accessibility, generalizability, and fairness in AI-driven cardiac diagnostics across diverse clinical environments and questions.
Quadrupedal robots can now nimbly navigate stairs and rough terrain thanks to a new multimodal RL approach that doesn't require feeling around with their front feet.
GPT-5's real-time router learns to route queries to specialized models, making it faster and more useful than its predecessors.
Despite progress in AI safety, it's still largely unknown how effective current safeguards are at preventing AI harms, and their effectiveness varies wildly.
Forget expensive human annotation: this dual-loop method automatically cleans remote sensing image-text datasets, boosting T2I model performance by over 35%.
Achieve state-of-the-art video face enhancement with VividFace, a one-step diffusion model that drastically cuts inference time while boosting perceptual quality and temporal consistency.
Open-weight reasoning models now rival proprietary systems in agentic capabilities and benchmark performance, thanks to gpt-oss-120b and gpt-oss-20b.
Self-supervised learning beats supervised learning for ECG interpretation when labeled data is scarce, unlocking more robust and generalizable AI-driven cardiac diagnostics.