15 papers from CMU Machine Learning on Training Efficiency & Optimization
Scale multi-agent RL diversity metrics to hundreds of agents without sacrificing accuracy: Graph-SND offers a drop-in replacement for quadratic SND calculations, achieving near-identical results with order-of-magnitude speedups.
Implicit time integration on GPUs gets a 3x speed boost thanks to a novel algebraic coarsening method that avoids costly explicit remeshing.
Attention bottlenecks in long-context decoding? SANTA slashes memory bandwidth demands by stochastically sampling value vectors, achieving 1.5x speedups without sacrificing accuracy.
Mid-training VLMs on a carefully curated subset of VLM data that is closely aligned with the VLA domain significantly boosts their performance on embodied tasks, letting them rival much larger models.
Dramatically improve multimodal recommendation accuracy without any training by initializing user embeddings with item modality features and user cluster information.
Forget full retraining: intelligently selecting data subsets using gradient-based representations can keep your generative recommender fresh and robust to drift.
Training speech separation models on real-world noisy data doesn't have to mean accepting noisy outputs: this method cuts residual noise in half.
LLMs learn skills in a surprisingly consistent order during pretraining, revealing a hidden curriculum that's predictable across models and readable from their internal representations.
Get more from less: SonoSelect intelligently guides ultrasound probes to achieve comparable diagnostic accuracy with far fewer views, slashing scanning time and processing costs.
Forget hand-picking your cross-lingual training data: a budget-constrained optimization can automatically allocate resources across multiple source languages, boosting performance on African languages by a large margin.
Expect to pay an exponential sample complexity price for computationally efficient mean and covariance estimation with missing data, but not for linear regression.
You can slash RoPE memory costs by 10x without sacrificing convergence, just by applying it to a sliver (10%) of hidden dimensions.
Flow matching's advantage in RL isn't distributional modeling, but rather its ability to correct value estimates iteratively and learn more adaptable features, leading to significant performance gains in challenging online settings.
Robots can now learn long-horizon tasks far more effectively by distilling complex histories into a few key visual moments, outperforming standard imitation learning by 70% on real-world tasks.
Variational learning can tame the inherent chaos of nanoscale devices, paving the way for practical, larger-scale probabilistic computers.
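
For readers curious what "applying RoPE to a sliver of hidden dimensions" in the RoPE entry above might look like, here is a minimal sketch in plain Python. It is not the paper's implementation: the function name `partial_rope`, the `fraction` and `base` parameters, and the choice to rotate the leading dimensions are all assumptions made for illustration.

```python
import math

def partial_rope(x, pos, fraction=0.1, base=10000.0):
    """Rotate only the first `fraction` of the dimensions of x.

    Hypothetical helper: the remaining dimensions pass through unchanged,
    so sin/cos values are only needed for the rotated slice.
    """
    d = len(x)
    rot = int(d * fraction)
    rot -= rot % 2  # rotary pairs need an even number of dimensions
    out = list(x)
    for i in range(0, rot, 2):
        # Standard RoPE frequency schedule, computed over the rotated slice only.
        theta = pos * base ** (-i / rot)
        c, s = math.cos(theta), math.sin(theta)
        out[i] = x[i] * c - x[i + 1] * s
        out[i + 1] = x[i] * s + x[i + 1] * c
    return out

# Usage: a 20-dimensional vector with 10% rotary dimensions rotates one pair.
vec = [1.0] * 20
rotated = partial_rope(vec, pos=3)
```

With `fraction=0.1` and 20 dimensions, only the first pair is rotated. Whatever memory saving results would come from storing positional sin/cos values for that slice rather than for the full hidden size.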