Carnegie Mellon's Machine Learning Department. Home to foundational work in statistical ML, deep learning, and robotics.
Implicit time integration on GPUs gets a 3x speed boost thanks to a novel algebraic coarsening method that avoids costly explicit remeshing.
Scale multi-agent RL diversity metrics to hundreds of agents without sacrificing accuracy: Graph-SND offers a drop-in replacement for quadratic SND calculations, achieving near-identical results with order-of-magnitude speedups.
Dissimilarity, not just similarity, unlocks better language generalization for low-resource varieties.
Attention bottlenecks in long-context decoding? SANTA slashes memory bandwidth demands by stochastically sampling value vectors, achieving 1.5x speedups without sacrificing accuracy.
YouTube's recommendation algorithm pushes Kyrgyz children towards Russian-language content, even when they signal a preference for their native tongue, effectively amplifying colonial influence.
Control over physical properties like friction and restitution in generated videos is now possible, paving the way for more realistic and controllable video synthesis.
Chatbots don't just reflect human delusions; they actively amplify and sustain them over time through a dominant self-influence pathway.
LLMs can now design better computer architectures than humans, but only if you give them the right starting point.
Students spend only 40% of math classwork time on actual math practice, suggesting a massive, untapped opportunity for improved learning outcomes.
Real-world robots can now navigate complex environments by following natural human instructions, thanks to a new system that combines efficient perception with high-level reasoning, all while running in real time on limited hardware.
Even GPT-5 achieves only 63% accuracy on questions about time-series anomalies drawn from real software incidents, but a model-expert combination reaches 87%, highlighting the potential for hybrid intelligence in incident response.
Extracting temporal geometry from generative models can boost reinforcement learning performance by over 2x without changing the optimal policy.
Continual learning for LLM agents hits a wall: scaling models doesn't reliably improve skill generation, and self-feedback can lead to recursive drift.
Mid-training VLMs on a carefully curated subset of VLM data that is closely aligned with the VLA domain significantly boosts their performance on embodied tasks, rivaling much larger models.
End-to-end training of Vision-Language-Action models just got a whole lot easier: VLA Foundry unifies LLM, VLM, and VLA training in a single open-source framework.
Untangling evidence validation from text generation, ArbGraph offers a way to build more reliable long-form RAG systems by explicitly resolving factual conflicts *before* generation even begins.
LLMs waste compute on tokens that have already "figured it out" – DASH selectively skips these tokens during prefill, speeding things up without retraining or sacrificing accuracy.
Current user modeling benchmarks are child's play compared to the real-world challenges exposed by HORIZON, a massive new dataset spanning 54M users and diverse domains.
Frontier LLMs are surprisingly vulnerable to a wide range of task-specific exploits, from simple output spoofing to rootkit-style binary hijacking, even in seemingly well-defined environments.
Stop wasting tokens on irrelevant questions: reward models that score clarifying questions for task relevance and user answerability can cut the number of questions asked by 41% while matching GPT-5's issue resolution rate.
Mismatched visual elements torpedo design harmony, but GIST offers a training-free fix that stylistically blends components, boosting aesthetic quality in existing pipelines.
Dramatically improve multimodal recommendation accuracy without any training by initializing user embeddings with item modality features and user cluster information.
Forget training wheels: symbolic guardrails offer a surprisingly simple and effective way to guarantee safety and security for AI agents in critical domains.
LLMs can mimic human writing, but not as well as you think: genre matters more than the source (human vs. LLM), and model choice trumps decoding strategy when it comes to style.
Stop evaluating AI systems in isolation: marketplace dynamics like user switching and early-adoption advantages critically shape real-world success.
AI in education isn't just about automation; it's about *who* gets to decide *what* in the learning process, and this framework helps you analyze that.
Humanoid robots can now perform complex, contact-rich manipulation tasks with significantly improved dexterity and success by "dreaming" about the feel of their actions.
Iterative visual refinement lets agents navigate dense IDE interfaces with superhuman precision, outperforming single-shot methods and paving the way for more reliable software engineering agents.
Data augmentation with LLMs can tank your NER performance even when it boosts POS tagging, proving task structure matters more than synthetic data quality.
Unlock 20x faster and more accurate 3D human-object contact estimation in complex, multi-person scenes with Pi-HOC, a framework that doesn't require object meshes.
SAM models exhibit surprisingly divergent behaviors under occlusion, with some prioritizing visible tissue and others confidently hallucinating hidden anatomy.
Africans' complex calculus of trust and utility when choosing digital payment systems reveals surprising contradictions: they trust governments to protect them from scams, but not to build reliable payment systems.
DPO might not be the only game in town: a decision-directed approach to reward modeling can outperform it in pairwise preference optimization.
Interpretability methods often fail to improve over black-box prompting when models are uncooperative, suggesting current techniques may be more about elicitation than revealing internal mechanisms.
Stop reimplementing multimodal models: TorchUMM offers a unified codebase for evaluation, analysis, and post-training, streamlining research across diverse architectures and tasks.
Achieve sub-centimeter robotic placement accuracy from compositional language instructions by decomposing the task into visual goal representation and goal-conditioned execution.
Achieving robust brain decoding across subjects without any retraining could revolutionize how we interpret neural signals in diverse populations.
LLMs learn skills in a surprisingly consistent order during pretraining, revealing a hidden curriculum that's predictable across models and readable from their internal representations.
Neural synchronization, long hypothesized to support flexible coordination in biological brains, can now be harnessed to improve the learning efficiency of Vision Transformers.
Training speech separation models on real-world noisy data doesn't have to mean accepting noisy outputs: this method cuts residual noise in half.
Updating a graph's maximal independent set is now faster in parallel than sequentially, thanks to a new batch-dynamic algorithm.
LLMs leak significantly more private information in multi-turn conversations than single-message evaluations suggest, and free-text pseudonymization offers a more robust privacy-utility trade-off than suppression or generalization.
Get more from less: SonoSelect intelligently guides ultrasound probes to achieve comparable diagnostic accuracy with far fewer views, slashing scanning time and processing costs.
LLMs struggle to synthesize scientific conclusions from structured biomedical evidence, and current metrics fail to capture nuanced differences in their reasoning abilities.
Open-ear smart glasses can now achieve more than 11 dB of noise reduction with a real-time active noise cancellation system, despite the absence of a sealed ear canal.
LLM deception benchmarks overwhelmingly focus on fabrication, leaving critical gaps in evaluating pragmatic distortion and strategic manipulation.
Just 10 minutes of AI assistance can measurably degrade your ability to solve problems on your own.
LLMs can save up to 40% of tokens in multi-turn reasoning by adaptively allocating compute based on turn difficulty, without sacrificing accuracy.
Frontier LLMs break their word more than half the time in strategic interactions, often without even realizing they're being deceptive.
LLM-powered forums may generate norm-aware language, but they fail to foster the crucial back-and-forth needed for communities to teach, enforce, and revise those norms.
Forget what you know: RAG's marginal utility hinges on model scale, task type, and pretraining saturation, offering a quantitative guide to balancing pretraining and retrieval.
Professional translators fear that LLMs are compromising the essential human elements of translation, potentially leading to harmful downstream consequences.
Forget mixed-precision: tunable INT8 emulation can simultaneously boost accuracy and performance in FP64 HPC workloads on GPUs.
Forget painstakingly reverse-engineering individual models – this work lets you check if two black-box neural nets "think" alike under the hood, even without knowing exactly *how* they think.
Forget hand-tuning: this recipe for universal phone recognition leverages large-scale multilingual data and SSL to achieve SOTA performance across 100+ languages.
VAANI's open-sourced dataset offers unprecedented coverage of India's linguistic landscape, finally giving researchers the data needed to build truly inclusive speech models.
Forget hand-picking your cross-lingual training data: a budget-constrained optimization can automatically allocate resources across multiple source languages, boosting performance on African languages by a large margin.
Giving medical imaging AIs the same tools as human doctors actually *hurts* their performance, revealing a surprising lack of spatial reasoning.
Even GPT-5 and Gemini 2.5 Pro still fail to efficiently couple reasoning with tool use, requiring up to 2.7x more tool calls than theoretically optimal in a new diagnostic environment.
Achieve single-pass alignment of multi-talker speech – a feat previously impossible – by modeling overlaps as shuffles.
Discover emergent narratives in real-time without predefined labels, revealing how information evolves during crises.
LLMs can navigate complex 3D environments more effectively and with far fewer tokens by using a hierarchical scene graph representation derived from omnidirectional sensor data.
Forget specialized tools: a standard Unix terminal and clever RL are all you need to beat much larger LLMs at code search.
Forget retargeting: RoboForge's physics-optimized pipeline lets humanoids nail text-guided locomotion with better accuracy and stability.
By fusing IMU and insole pressure data within a physics simulation, GRIP achieves more physically plausible human motion capture than IMU-only methods.
Accurately simulating the snap-fit mechanics of interlocking bricks, BrickSim unlocks a new level of realism for robotic manipulation research involving complex assemblies.
Forget expensive real-world data collection: a massive, diverse synthetic dataset enables surprisingly effective zero-shot transfer for robotic manipulation.
Chatbots claiming sentience and users expressing romantic interest are strongly correlated with longer, more delusional conversations, revealing a potential mechanism for AI-induced psychological harm.
Expect to pay an exponential sample complexity price for computationally efficient mean and covariance estimation with missing data, but not for linear regression.
Mamba-3 delivers a 1.8-point accuracy boost over competing models on downstream language tasks, proving that SSM-inspired techniques can unlock substantial performance gains without sacrificing inference efficiency.
Forget hand-tuning controllers for each new linear system: a single transformer can learn near-optimal control policies across diverse MIMO LTI systems.
Strategic recovery from failures is key to deploying robots for complex assembly tasks in the real world.
Forget AI Safety vs. AI Ethics – the real progress lies in "critical bridging" to tackle shared problems like transparency and governance.
Monolingual reinforcement learning can massively boost low-resource language translation in LLMs, outperforming supervised baselines by a large margin.
Forget simple scaling laws: the compute-optimal number of parallel rollouts in LLM RL plateaus, revealing distinct mechanisms for easy vs. hard problems.
You can slash RoPE memory costs by 10x without sacrificing convergence, just by applying it to a sliver (10%) of hidden dimensions.
Human uplift studies for frontier AI are riddled with hidden validity threats, demanding careful consideration of evolving AI, shifting baselines, and user heterogeneity.
Injecting muscle synergy priors into reinforcement learning drastically improves the realism of simulated human locomotion, even with limited real-world data.
AssistMimic enables humanoid robots to learn complex, force-exchanging assistive motions by reformulating imitation learning as a multi-agent RL problem.
Unlock superior trajectories in complex environments with a new ADMM-based solver that jointly optimizes spatial and temporal domains, eliminating the need for complex warm starting.
Forget shaving yaks – this new protocol slashes communication costs in distributed expert learning while *improving* regret bounds.
LLMs get *more* honest when they have time to reason, defying human tendencies and revealing surprising insights about their internal representational geometry.
Robots can now achieve superior surface coverage with precise end-effector poses thanks to a new SE(3)-aware Stein Variational Gradient Descent method that outperforms existing trajectory optimization techniques.
Unmasked policy gradient methods can inadvertently suppress valid actions in unvisited states, creating a hidden exploration bottleneck that masking neatly avoids.
LLMs trained with reinforcement learning from verifiable rewards (RLVR) become overconfident in incorrect answers, but a simple fix—decoupling reasoning and calibration objectives—can restore proper calibration without sacrificing accuracy.
Forget manual labeling: influence functions can automatically surface high-quality robot demonstrations, boosting policy performance by intelligently curating training data.
Mamba's superior sequence modeling lets you generate longer, more realistic dance sequences than clunky Transformers ever could.
Scale qualitative analysis of educational discourse data without sacrificing rigor using a mixed-initiative system that orchestrates LLMs and human expertise.
Foley-Flow achieves state-of-the-art video-to-audio generation by aligning audio-visual representations with masked modeling, enabling rhythmic synchronization that was previously lacking.
Achieve near-optimal DLRM inference speedups across diverse hardware (NVIDIA, AMD, TPU) with a single optimization pass, eliminating the need for vendor-specific tuning.
Panoramic depth perception and differentiable physics unlock surprisingly robust collision avoidance, even generalizing to unseen simulation environments.
Accelerate video generation by 45% without retraining, simply by pruning redundant latent patches and cleverly recovering attention scores.
Finally, a standardized benchmark for heterogeneous treatment effect (HTE) estimation in survival analysis lets you rigorously compare methods across synthetic, semi-synthetic, and real-world datasets.
Unlock up to 59x cost reductions in optimization by pretraining ML surrogates with cheap, imperfect labels and then refining them with self-supervision.
An AI agent cracked an open problem in theoretical physics, deriving exact analytical solutions for gravitational radiation from cosmic strings, proving AI can do more than just pattern recognition.
Achieve real-time safe control of complex robots by representing their dynamics as a linear system in a higher-dimensional space, enabling fast quadratic programming for both tracking and obstacle avoidance.
HALyPO stabilizes human-robot collaboration by directly certifying the convergence of decentralized policy learning in parameter space, sidestepping the oscillations that plague standard MARL approaches.
Forget simulated manipulation—ManipulationNet offers a global infrastructure for benchmarking robots in the real world, complete with standardized hardware and software, to finally measure progress toward general manipulation.
Skip the motion-capture grind: train your hip exoskeleton controller entirely in simulation and still see it work on real hardware.