Carnegie Mellon's Machine Learning Department. Home to foundational work in statistical ML, deep learning, and robotics.
Achieve HPC acceleration by emulating FP64 arithmetic with INT8 operations on GPUs, proving that you can boost performance *and* accuracy.
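The core slicing idea behind this family of emulation schemes is easy to sketch: split each FP64 value into a few small integer slices, do all the multiply-accumulates in integer arithmetic (the part a GPU would offload to INT8 tensor cores), and recombine with power-of-two scales. A minimal sketch, not the paper's kernel — `BITS`, the slice count, and the NumPy stand-in for tensor-core MACs are all illustrative assumptions:

```python
import numpy as np

BITS = 6  # per-slice magnitude <= 2**BITS = 64, safely inside int8 range

def to_slices(x, n_slices=5):
    """Decompose a float64 vector so that
    x ~= scale * sum_k slices[k] * 2**(-BITS * (k + 1))."""
    scale = float(np.max(np.abs(x))) or 1.0
    r = x / scale
    slices = []
    for _ in range(n_slices):
        s = np.rint(r * 2**BITS)
        slices.append(s.astype(np.int8))
        r = r * 2**BITS - s  # remainder is carried into the next slice
    return slices, scale

def int8_dot(a, b, n_slices=5):
    """Emulate a float64 dot product using only integer multiply-accumulates
    between int8 slices, then recombine with exact power-of-two scales."""
    sa, ka = to_slices(a, n_slices)
    sb, kb = to_slices(b, n_slices)
    total = 0.0
    for i, si in enumerate(sa):
        for j, sj in enumerate(sb):
            acc = int(np.dot(si.astype(np.int32), sj.astype(np.int32)))
            total += acc * 2.0 ** (-BITS * (i + j + 2))
    return ka * kb * total

rng = np.random.default_rng(0)
a, b = rng.standard_normal(256), rng.standard_normal(256)
err = abs(int8_dot(a, b) - a @ b)  # residual vs. native float64
```

Each extra slice pair buys another `BITS` bits of precision, which is why a handful of cheap integer products can approach FP64 accuracy.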
Forget hand-tuning for each language: this recipe achieves state-of-the-art phone recognition across 100+ languages, revealing the surprising power of scaling data and SSL representations.
VAANI's open-sourced dataset offers unprecedented coverage of India's linguistic landscape, finally giving researchers the data needed to build truly inclusive speech models.
Forget hand-picking your cross-lingual training data: a budget-constrained optimization can automatically allocate resources across multiple source languages, boosting performance on African languages by a large margin.
Giving medical imaging AIs the same tools as human doctors actually *hurts* their performance, revealing a surprising lack of spatial reasoning.
Even GPT-5 and Gemini 2.5 Pro still fail to efficiently couple reasoning with tool use, requiring up to 2.7x more tool calls than theoretically optimal in a new diagnostic environment.
Achieve single-pass alignment of multi-talker speech – a feat previously impossible – by modeling overlaps as shuffles.
LLMs can navigate complex 3D environments more effectively and with far fewer tokens by using a hierarchical scene graph representation derived from omnidirectional sensor data.
Forget specialized tools: a standard Unix terminal and clever RL are all you need to beat much larger LLMs at code search.
Forget retargeting: RoboForge's physics-optimized pipeline lets humanoids nail text-guided locomotion with better accuracy and stability.
Discover emergent narratives in real-time without predefined labels, revealing how information evolves during crises.
LLMs in embodied environments get a massive boost from structured rules, with rule retrieval alone contributing +14.9 pp to single-trial success.
Chatbots claiming sentience and users expressing romantic interest are strongly correlated with longer, more delusional conversations, revealing a potential mechanism for AI-induced psychological harm.
Forget expensive real-world data collection: a massive, diverse synthetic dataset enables surprisingly effective zero-shot transfer for robotic manipulation.
By fusing IMU and insole pressure data within a physics simulation, GRIP achieves more physically plausible human motion capture than IMU-only methods.
Accurately simulating the snap-fit mechanics of interlocking bricks, BrickSim unlocks a new level of realism for robotic manipulation research involving complex assemblies.
Expect to pay an exponential sample complexity price for computationally efficient mean and covariance estimation with missing data, but not for linear regression.
Strategic recovery from failures is key to deploying robots for complex assembly tasks in the real world.
Mamba-3 delivers a 1.8 point accuracy boost over competing models in downstream language tasks, proving that SSM-inspired techniques can unlock substantial performance gains without sacrificing inference efficiency.
Forget hand-tuning controllers for each new linear system: a single transformer can learn near-optimal control policies across diverse MIMO LTI systems.
Forget AI Safety vs. AI Ethics – the real progress lies in "critical bridging" to tackle shared problems like transparency and governance.
Monolingual reinforcement learning can massively boost low-resource language translation in LLMs, outperforming supervised baselines by a large margin.
You can slash RoPE memory costs by 10x without sacrificing convergence, just by applying it to a sliver (10%) of hidden dimensions.
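The mechanism is simple to sketch: rotate only the first fraction of each head's dimensions and pass the rest through unchanged, so the cos/sin caches shrink proportionally. A hypothetical sketch of partial rotary embedding under that idea — `partial_rope` and `rotary_frac` are illustrative names, not the paper's code:

```python
import numpy as np

def partial_rope(x, positions, rotary_frac=0.10, base=10000.0):
    """Apply rotary position embeddings to only the first rotary_frac of
    the head dimension; remaining dims pass through untouched, so only
    r/2 frequencies need cos/sin caches instead of d/2."""
    d = x.shape[-1]
    r = int(d * rotary_frac) // 2 * 2              # rotary dims, rounded to even
    x_rot, x_pass = x[..., :r], x[..., r:]
    inv_freq = base ** (-np.arange(0, r, 2) / r)   # one frequency per pair
    ang = np.einsum("s,f->sf", positions.astype(float), inv_freq)
    cos, sin = np.cos(ang), np.sin(ang)
    x1, x2 = x_rot[..., 0::2], x_rot[..., 1::2]
    rotated = np.empty_like(x_rot)
    rotated[..., 0::2] = x1 * cos - x2 * sin       # 2D rotation per dim pair
    rotated[..., 1::2] = x1 * sin + x2 * cos
    return np.concatenate([rotated, x_pass], axis=-1)

x = np.random.default_rng(1).standard_normal((8, 64))  # (seq, head_dim)
out = partial_rope(x, np.arange(8))
```

With `rotary_frac=0.10` the position caches cover 6 of 64 dims here, which is where the ~10x memory saving comes from.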
Forget simple scaling laws: the compute-optimal number of parallel rollouts in LLM RL plateaus, revealing distinct mechanisms for easy vs. hard problems.
AssistMimic enables humanoid robots to learn complex, force-exchanging assistive motions by reformulating imitation learning as a multi-agent RL problem.
Injecting muscle synergy priors into reinforcement learning drastically improves the realism of simulated human locomotion, even with limited real-world data.
Unlock superior trajectories in complex environments with a new ADMM-based solver that jointly optimizes spatial and temporal domains, eliminating the need for complex warm starting.
Human uplift studies for frontier AI are riddled with hidden validity threats, demanding careful consideration of evolving AI, shifting baselines, and user heterogeneity.
LLMs trained with reinforcement learning from verifiable rewards (RLVR) become overconfident in incorrect answers, but a simple fix—decoupling reasoning and calibration objectives—can restore proper calibration without sacrificing accuracy.
Forget yak-shaving – this new protocol slashes communication costs in distributed expert learning while *improving* regret bounds.
Robots can now achieve superior surface coverage with precise end-effector poses thanks to a new SE(3)-aware Stein Variational Gradient Descent method that outperforms existing trajectory optimization techniques.
Unmasked policy gradient methods can inadvertently suppress valid actions in unvisited states, creating a hidden exploration bottleneck that masking neatly avoids.
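The masking fix being contrasted here is the standard invalid-action-masking trick: push invalid logits to negative infinity before normalizing, so masked actions get exactly zero probability and zero gradient rather than being merely discouraged. A minimal sketch of that general technique, not the paper's implementation:

```python
import numpy as np

def masked_log_softmax(logits, valid):
    """Invalid-action masking: set masked logits to -inf, then apply a
    numerically stable log-softmax. Masked actions end up with exactly
    zero probability, so no gradient ever flows through them."""
    z = np.where(valid, logits, -np.inf)
    z = z - z.max(axis=-1, keepdims=True)       # stabilize before exp
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))

logits = np.array([2.0, 0.5, -1.0, 1.0])
valid = np.array([True, False, True, True])
probs = np.exp(masked_log_softmax(logits, valid))
# probs renormalizes over valid actions only; the masked entry is exactly 0
```

Without the mask, the policy must *learn* to suppress invalid actions from visited states, which is exactly the exploration bottleneck the blurb describes in unvisited ones.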
Forget manual labeling: influence functions can automatically surface high-quality robot demonstrations, boosting policy performance by intelligently curating training data.
LLMs get *more* honest when they have time to reason, defying human tendencies and revealing surprising insights about their internal representational geometry.
Foley-Flow achieves state-of-the-art video-to-audio generation by aligning audio-visual representations with masked modeling, enabling rhythmic synchronization that was previously lacking.
Mamba's superior sequence modeling lets you generate longer, more realistic dance sequences than clunky Transformers ever could.
Get near-peak performance for your recommender system across GPUs and TPUs without tedious platform-specific tuning, thanks to a new cross-accelerator graph optimization framework.
Scale qualitative analysis of educational discourse data without sacrificing rigor using a mixed-initiative system that orchestrates LLMs and human expertise.
Panoramic depth perception and differentiable physics unlock surprisingly robust collision avoidance, even generalizing to unseen simulation environments.
Accelerate video generation by 45% without retraining, simply by pruning redundant latent patches and cleverly recovering attention scores.
An AI agent cracked an open problem in theoretical physics, deriving exact analytical solutions for gravitational radiation from cosmic strings, proving AI can do more than just pattern recognition.
Unlock up to 59x cost reductions in optimization by pretraining ML surrogates with cheap, imperfect labels and then refining them with self-supervision.
Finally, a standardized benchmark for survival analysis HTE estimation lets you rigorously compare methods across synthetic, semi-synthetic, and real-world datasets.
Flow matching's advantage in RL isn't distributional modeling, but rather its ability to correct value estimates iteratively and learn more adaptable features, leading to significant performance gains in challenging online settings.
HALyPO stabilizes human-robot collaboration by directly certifying the convergence of decentralized policy learning in parameter space, sidestepping the oscillations that plague standard MARL approaches.
Skip the motion-capture grind: train your hip exoskeleton controller entirely in simulation and still see it work on real hardware.
Unsupervised discovery of object keypoints and dynamics directly from video unlocks state-of-the-art world models applicable to decision-making.
Forget simulated manipulation—ManipulationNet offers a global infrastructure for benchmarking robots in the real world, complete with standardized hardware and software, to finally measure progress toward general manipulation.
Visual artists are overwhelmingly resisting generative AI in the workplace, deploying active "refusal" strategies against pressure from clients and bosses.
Achieve real-time safe control of complex robots by representing their dynamics as a linear system in a higher-dimensional space, enabling fast quadratic programming for both tracking and obstacle avoidance.
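The lifting step can be sketched with an EDMD-style least-squares fit: map states through nonlinear observables, then fit a single linear transition matrix in the lifted space, over which tracking and avoidance become a quadratic program. A toy sketch under assumptions — the polynomial observables, the `lift`/`fit_lifted_linear` names, and the linear toy system are all illustrative, and the QP layer is not reproduced:

```python
import numpy as np

def lift(x):
    # Hypothetical polynomial observables; a real system would pick
    # liftings that make its dynamics (near-)linear.
    x1, x2 = x
    return np.array([x1, x2, x1 * x2, x1**2, x2**2, 1.0])

def fit_lifted_linear(X, X_next):
    """EDMD-style least squares: find K with lift(x_next) ~= K @ lift(x),
    i.e. a linear system in the higher-dimensional observable space."""
    Z = np.stack([lift(x) for x in X])         # (N, d_lift)
    Zn = np.stack([lift(x) for x in X_next])   # (N, d_lift)
    W, *_ = np.linalg.lstsq(Z, Zn, rcond=None)
    return W.T                                 # lifted transition matrix K

# Toy data from a stable linear system (its polynomial lift is exactly linear)
A = np.array([[0.9, 0.1], [0.0, 0.8]])
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 2))
K = fit_lifted_linear(X, X @ A.T)
```

Once `K` is in hand, costs and constraints that are linear or quadratic in the lifted coordinates yield a convex QP, which is what makes real-time solves feasible.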
Robots can now achieve stable, compliant object transport in unstructured environments, even with strong and unpredictable interaction forces, thanks to a bio-inspired control framework that separates interaction execution from support control.
By disentangling camera-space estimation from world-space refinement via dual diffusion models, DuoMo achieves state-of-the-art human motion reconstruction from noisy video, bypassing the limitations of parametric models.
Diffusion planners get a boost in robustness and performance thanks to SAGE, a self-supervised method that weeds out dynamically inconsistent plans using a learned latent consistency signal.
Legged robots can now tiptoe around your expensive gadgets, thanks to a new RL framework that combines semantic understanding with low-level control to avoid stepping on designated objects.
Forget hand-engineering world models – this work proves that competent agents *must* internally represent the world in a structured, predictive way to minimize regret under uncertainty.
AI tools are surprisingly bad at classifying the cognitive demand of math problems, with accuracy barely above chance and a systematic bias towards average difficulty, raising concerns about their utility in supporting teachers.
VLA models struggle with physical reasoning, but Pri4R's simple trick of predicting 3D point tracks during training boosts performance by up to 40% on manipulation tasks, without adding any inference overhead.
Stain normalization and decoupled learning can dramatically improve the robustness of white blood cell classification, even in the face of significant staining variations and class imbalances.
Today's frontier LLMs can't autonomously patch critical zero-day vulnerabilities, revealing a significant gap in their cyberdefense capabilities.
Injecting knowledge graphs into LLMs boosts medical question generation by 8%, suggesting a simple way to patch up LLM knowledge gaps.
Achieve 7x accuracy gains in real-world collaborative SLAM by using a robust, distributed optimization algorithm resilient to communication limits and noisy data.
Forget monolithic models: pMoE shows that ensembling diverse expert prompts within a single model framework yields surprisingly large gains in visual adaptation across a wide range of tasks.
Finally, digital humans can have realistic, socially aware conversations: DyaDiT generates dyadic gestures that users strongly prefer over existing methods.
By decomposing long-horizon manipulation into transport and object-centric interaction, LiLo-VLA achieves state-of-the-art zero-shot generalization and robustness, outperforming end-to-end VLA models by a large margin.
Gemini 3 Deep Think can now autonomously solve a majority of problems in a challenging math competition, signaling a leap in AI's mathematical reasoning capabilities.
Injecting LLMs into rule-based dialogue systems for learner reflection can boost the depth of insights, but risks disengagement due to repetitiveness and misalignment.
Language models leak personal data at an alarming rate, with even small models verbatim parroting almost 3% of personal information instances.
A global consensus on AI safety risks and capabilities has emerged from a panel of 100+ independent experts, representing a landmark effort in international collaboration.
Forget language and appearance: CAD models can now directly prompt accurate instance segmentation of industrial objects, even with diverse surface properties.
By pausing to "think" with latent diffusion, STAR-LDM achieves superior language understanding, narrative coherence, and controllable generation compared to standard autoregressive models of similar size.
Forget trial-and-error: this work provides a theoretical recipe for scaling neural Koopman operators, showing how to optimally allocate effort between data collection and model capacity for robotic control.
Unlabeled monocular videos can now be used to train state-of-the-art 3D/4D reconstruction systems, thanks to a factored flow prediction approach that disentangles geometry and pose learning.
Robots can now perform intricate assembly tasks and recover from errors in real-time, without any training, by fusing vision-language models with video-based kinematic priors for action planning.
Modularity in HRI isn't just about interchangeable parts; it's a powerful design medium for fostering long-term, evolving relationships between humans and robots.
Forget cloud GPUs – a new model brings unified multimodal understanding and generation to your iPhone, running 6x faster than alternatives.
General-purpose LLM agents stumble badly when faced with the messy reality of diverse, multi-domain tasks, and simply scaling interactions or parallel sampling doesn't fix it.
AudioChat tackles the complexity of "audio stories" by using LLM-driven tool-calling agents to simulate user interactions, enabling audio foundation models to generate, edit, and understand complex multi-source acoustic scenes.
LLMs can turn sparse rewards into dense training signals for RL agents, achieving comparable performance with significantly higher sample efficiency.
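One well-known way to densify a sparse reward is potential-based shaping (Ng et al., 1999), which provably preserves the optimal policy; the potential here would come from an LLM's progress estimate. A sketch under that assumption — `llm_progress` is a hypothetical stand-in scorer, not the paper's interface:

```python
def shape_rewards(rewards, states, phi, gamma=0.99):
    """Potential-based shaping: r'_t = r_t + gamma * phi(s_{t+1}) - phi(s_t).
    Any potential phi densifies the signal without changing which policy
    is optimal, so an imperfect LLM score cannot mislead the final policy."""
    return [r + gamma * phi(states[t + 1]) - phi(states[t])
            for t, r in enumerate(rewards)]

def llm_progress(state):
    # Hypothetical: imagine an LLM rating task progress in [0, 1].
    return state / 3.0

states = [0, 1, 2, 3]        # one more state than transitions
rewards = [0.0, 0.0, 1.0]    # sparse: signal only at the goal
dense = shape_rewards(rewards, states, llm_progress)
```

Every transition now carries a nonzero learning signal, which is the source of the sample-efficiency gain.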
Imagine giving robots a sense of touch as sensitive as a spiderweb, using nothing more than vibrating strings and microphones.
Image-to-image editors silently weaken or ignore your edit instructions based on the subject's race, gender, and age, revealing surprising demographic biases.
Forget training on narrow GitHub issues – Hybrid-Gym unlocks surprisingly broad coding skills by teaching agents to explore codebases and design architectures in synthetic environments.
Stop guessing about AGV fleet management: LSMART offers a realistic, open-source simulator to benchmark MAPF algorithms in complex, lifelong scenarios, revealing the critical design choices that make or break performance.
An educational RAG system achieves 84% accuracy in answering student questions with minimal human editing, suggesting a practical path towards scalable AI-assisted teaching.
Forget clunky skeletons: this new model lets you prompt your way to accurate 3D human meshes from single images, even in the wildest poses.
Forget slow text-based communication: Vision Wormhole unlocks faster multi-agent reasoning by turning VLMs into telepathic hubs, slashing runtime without sacrificing fidelity.
Stop treating generated images like real ones: GMAIL aligns them as separate modalities in a shared latent space, unlocking significant gains in vision-language tasks.
MLLMs struggle with multi-turn chart editing, forgetting context and accumulating errors, especially when the edits involve data transformations, not just styling.
Want to boost student performance in the age of GenAI? This RCT proves that scalable prompting interventions, grounded in the ICAP framework, can significantly improve student prompting skills and, ultimately, exam scores.
LLMs exhibit surprisingly strong and predictable biases towards specific information sources, even overriding content relevance and explicit instructions.
VLMs that ace RGB images completely fail at thermal imagery, revealing a critical gap in their ability to reason about temperature and physical properties.
Regret matching, the unsung hero of two-player zero-sum games, now dominates first-order optimizers in broader imperfect-recall decision problems, opening new avenues for AI safety and privacy.
RLVR's success in long-horizon reasoning hinges on a smooth difficulty spectrum, where mastering easier sub-problems unlocks the ability to tackle harder ones, avoiding frustrating grokking plateaus.
Robots can now learn long-horizon tasks far more effectively by distilling complex histories into a few key visual moments, outperforming standard imitation learning by 70% on real-world tasks.
Variational learning can tame the inherent chaos of nanoscale devices, paving the way for practical, larger-scale probabilistic computers.
RynnBrain leapfrogs existing embodied foundation models, offering a unified, open-source spatiotemporal model that excels at physically grounded reasoning and planning across a wide range of benchmarks.
Forget static datasets – RL-based co-training unlocks +20% real-world VLA performance by interactively leveraging simulation while preserving real-world capabilities.
By distilling a frozen diffusion model's geometric understanding into a fast, deterministic network, Robot-DIFT unlocks more precise robot control compared to standard vision encoders.
Forget synthetic benchmarks that don't translate: MolmoSpaces offers 230k diverse, simulator-agnostic environments with 130k annotated objects, showing a remarkable 0.96 sim-to-real correlation for robot policies.