19 papers from MIT CSAIL on Training Efficiency & Optimization
Demystifying LLMs for the masses might be as simple as turning their mechanics into a game.
Neural networks can accurately predict polymer free energies, even when traditional methods like the Bennett Acceptance Ratio fail due to poor phase-space overlap.
Forget hand-engineered features: LLMs can automatically generate rubrics that transform raw text into powerful representations, outperforming even pre-trained clinical models on EHR tasks.
Forget fine-tuning: task-specific experts are already hiding in the neighborhood of pretrained weights, and you can find them with random sampling.
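A minimal sketch of the idea as stated, assuming a pretrained `torch.nn.Module` and a hypothetical `eval_task_score` callable (not from the paper): perturb the pretrained weights with small random noise and keep the sample that scores best on the target task.

```python
import copy
import torch

def sample_expert(model, eval_task_score, sigma=0.01, n_samples=32):
    """Search the neighborhood of pretrained weights by random sampling.

    `eval_task_score` is a hypothetical callable returning a task metric
    (higher is better); sigma and n_samples are illustrative choices.
    """
    best_model, best_score = model, eval_task_score(model)
    for _ in range(n_samples):
        candidate = copy.deepcopy(model)
        with torch.no_grad():
            for p in candidate.parameters():
                p.add_(sigma * torch.randn_like(p))  # small isotropic perturbation
        score = eval_task_score(candidate)
        if score > best_score:
            best_model, best_score = candidate, score
    return best_model, best_score
```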
Beat the state-of-the-art in radio signal separation by 122x using a transformer trained with a simple cross-entropy loss; the same architecture could extend to gravitational waves.
Forget more data: pre-training on just 164M tokens of synthetic data from Neural Cellular Automata can outperform pre-training on 1.6B tokens of natural language for downstream LLM tasks.
By dynamically adjusting contrastive learning temperatures based on data density, MM-TS achieves state-of-the-art results on multimodal long-tail datasets.
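The paper's exact density estimator and temperature schedule aren't given here; this is a minimal sketch of the general mechanism, where the per-sample `density` scores and the linear density-to-temperature mapping are assumptions: scale the InfoNCE temperature per anchor so head (dense) and tail (sparse) samples are weighted differently.

```python
import torch
import torch.nn.functional as F

def density_adaptive_info_nce(z_a, z_b, density, tau_min=0.05, tau_max=0.2):
    """InfoNCE with a per-sample temperature driven by data density.

    z_a, z_b: (N, D) paired embeddings from two modalities/views.
    density:  (N,) hypothetical density scores in [0, 1]; the mapping
              below (denser -> softer temperature) is an assumption.
    """
    z_a, z_b = F.normalize(z_a, dim=1), F.normalize(z_b, dim=1)
    tau = tau_min + (tau_max - tau_min) * density          # (N,)
    logits = (z_a @ z_b.t()) / tau.unsqueeze(1)            # row-wise temperature
    labels = torch.arange(z_a.size(0), device=z_a.device)
    return F.cross_entropy(logits, labels)
```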
GPTQ's quantization of LLMs is leaving performance on the table: WaterSIC closes the gap with an information-theoretically near-optimal approach that beats the state-of-the-art on Llama and Qwen.
Achieve state-of-the-art results in high-resolution video geometry estimation by disentangling global coherence and fine detail using a dual-stream transformer architecture.
Lattice QCD calculations just got a whole lot faster: normalizing flows slash variance by up to 60x in key observables.
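As a rough illustration only (the paper's flow architecture, masking scheme, and target observables are not reproduced here), this is a minimal affine coupling layer of the kind used in flow-based samplers for lattice field configurations; the flattened-field layout and checkerboard mask are assumptions.

```python
import torch
import torch.nn as nn

class AffineCoupling(nn.Module):
    """One affine coupling layer: updates half of a flattened lattice field
    conditioned on the frozen half, with a tractable log-Jacobian."""

    def __init__(self, dim, hidden=128):
        super().__init__()
        self.register_buffer("mask", torch.arange(dim) % 2 == 0)  # assumed mask
        self.net = nn.Sequential(
            nn.Linear(dim, hidden), nn.Tanh(), nn.Linear(hidden, 2 * dim)
        )

    def forward(self, phi):
        frozen = phi * self.mask
        s, t = self.net(frozen).chunk(2, dim=-1)
        s = torch.tanh(s) * (~self.mask)  # only transform the active half
        t = t * (~self.mask)
        phi_out = frozen + (~self.mask) * (phi * torch.exp(s) + t)
        log_det = s.sum(dim=-1)  # log|det J|, accumulated across layers
        return phi_out, log_det
```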
Nightly hospital planning is now possible on a laptop: this work distills slow, complex agent-based epidemic models into fast, trustworthy surrogate models using neural ODEs, achieving a 10,000x speedup.
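A hedged sketch of what such a surrogate can look like, assuming an aggregate S/I/R-style state (the paper's actual state variables and solver may differ): a small MLP vector field integrated with fixed-step Euler, trained to match trajectories sampled from the slow agent-based model.

```python
import torch
import torch.nn as nn

class EpidemicSurrogate(nn.Module):
    """Neural-ODE surrogate: an MLP vector field over aggregate epidemic
    state (e.g. S/I/R fractions), integrated with fixed-step Euler."""

    def __init__(self, state_dim=3, hidden=64):
        super().__init__()
        self.f = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.Tanh(), nn.Linear(hidden, state_dim)
        )

    def forward(self, x0, n_steps, dt=0.1):
        xs, x = [x0], x0
        for _ in range(n_steps):
            x = x + dt * self.f(x)  # Euler step; swap in RK4 for accuracy
            xs.append(x)
        return torch.stack(xs, dim=1)  # (batch, n_steps + 1, state_dim)

# Training target: trajectories sampled from the agent-based model, fit
# with e.g. nn.MSELoss between surrogate and ABM state sequences.
```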
Forget gradient projections – NESS sidesteps catastrophic forgetting by directly exploiting the null space of previous tasks, identified via small singular values, to constrain weight updates.
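A minimal sketch of the null-space mechanism as described (the SVD threshold and the exact projection point in NESS are assumptions): collect previous-task layer inputs, take the right singular directions with small singular values, and project each gradient onto that subspace before the weight update so old outputs stay nearly unchanged.

```python
import torch

def null_space_projector(feats, tol=1e-2):
    """Projector onto the (approximate) null space of previous-task features.

    feats: (n_samples, d) layer inputs collected on previous tasks.
    Directions with singular values below `tol` (threshold is an assumption)
    span a subspace where updates barely perturb previous-task outputs.
    """
    _, S, Vh = torch.linalg.svd(feats, full_matrices=True)
    small = torch.zeros(Vh.size(0), dtype=torch.bool)
    small[: S.numel()] = S < tol
    small[S.numel():] = True              # directions not spanned at all
    V_null = Vh[small].t()                # (d, k) null-space basis
    return V_null @ V_null.t()            # (d, d) projection matrix

def project_grad(grad, P):
    # Constrain the update: move only within the previous tasks' null space.
    return grad @ P
```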
E(3)-equivariant networks just got a whole lot faster: a new algorithm cuts the complexity of Clebsch-Gordan Tensor Products from $O(L^6)$ to $O(L^4\log^2 L)$ without sacrificing completeness.
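For reference, the operation being accelerated is the standard Clebsch-Gordan tensor product between irrep features:

$$
(u \otimes_{\mathrm{CG}} v)^{(l)}_{m}
  = \sum_{m_1=-l_1}^{l_1} \sum_{m_2=-l_2}^{l_2}
    C^{(l,m)}_{(l_1,m_1),(l_2,m_2)}\, u^{(l_1)}_{m_1} v^{(l_2)}_{m_2},
\qquad |l_1 - l_2| \le l \le l_1 + l_2 .
$$

Evaluating this densely over all degrees $l_1, l_2, l \le L$ and all $m_1, m_2$ is what gives the naive $O(L^6)$ cost.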
Independently trained multimodal models like CLIP aren't so independent after all: a single orthogonal transformation can align their embedding spaces across both image and text modalities.
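One standard way to estimate such a single orthogonal map from paired embeddings is the orthogonal Procrustes solution; the sketch below assumes access to embeddings of shared inputs from both models (the paper's estimation procedure may differ).

```python
import torch

def orthogonal_alignment(X, Y):
    """Orthogonal Procrustes: the R minimizing ||X R - Y||_F with R^T R = I.

    X, Y: (N, D) embeddings of the same N inputs from two independently
    trained models (pairing by shared inputs is an assumption here).
    """
    U, _, Vh = torch.linalg.svd(X.t() @ Y)
    return U @ Vh  # (D, D) orthogonal map from X's space to Y's

# Per the finding above, one R estimated on image embeddings should also
# align the corresponding text embeddings, and vice versa.
```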
Neural routing solvers can now efficiently tackle hard constraints thanks to Construct-and-Refine (CaR), which slashes the refinement steps needed by 500x while boosting solution quality.
Ditch the geometry-to-property map: this work uses the external potential as the primary input for machine learning models, unlocking a scalable and equivariant approach to predicting electronic structure.
Find optimal DNN accelerator mappings in under a minute, a search that was previously intractable, and expose the suboptimality of prior mapping heuristics.
Achieve state-of-the-art video face enhancement with VividFace, a one-step diffusion model that drastically cuts inference time while boosting perceptual quality and temporal consistency.
Self-supervised learning beats supervised learning for ECG interpretation when labeled data is scarce, unlocking more robust and generalizable AI-driven cardiac diagnostics.