15 papers from Microsoft Research on Training Efficiency & Optimization
Ditch the task-specific verifier: energy-based fine-tuning (EBFT) lets you directly optimize sequence-level behavior in LMs, beating SFT and matching RLVR in downstream tasks.
Forget massive datasets: targeted training on a small, carefully curated set of challenging competitive programming problems yields 3x faster gains in code-generation performance.
By rethinking RLHF, MicroCoder-GRPO enables smaller code generation models to rival larger counterparts, achieving significant performance gains and revealing 34 training insights.
A 4B-parameter SLM can now rival frontier agent performance in complex tool-use environments, thanks to a novel reinforcement finetuning framework that teaches it to strategically acquire context and execute actions.
Ditching latent critics in offline RL unlocks state-of-the-art performance by directly backpropagating action-space gradients through a differentiable flow-based policy, enabling robust latent policy steering with minimal tuning.
1.58-bit LLMs are surprisingly more resilient to sparsity than their full-precision counterparts, opening new avenues for extreme compression.
Diffusion models can now efficiently tackle rare event sampling in molecular dynamics, unlocking rapid calculation of folding free energies in minutes to hours on a GPU.
Forget full-cache rollouts: this parameter-efficient fine-tuning method lets large reasoning models maintain accuracy while slashing memory usage during RL training.
Multimodal models no longer have to choose between understanding and generation: R3's "generate-understand-regenerate" framework resolves the optimization dilemma between the two.
Knowing VM lifetimes in advance doesn't always guarantee better placement, challenging common assumptions about clairvoyance in cloud resource optimization.
Language models can now internalize experiential knowledge and system prompts more effectively through on-policy context distillation, leading to better task accuracy and out-of-distribution generalization.
By explicitly detecting and escaping "Forbidden Zones" during training, AMD unlocks significant gains in sample fidelity and training robustness for few-step generative models like SDXL.
Train massive models on unproven hardware with confidence: SIGMA achieves 94% accelerator utilization and trains a 200B MoE with near-perfect stability on early-life AI accelerators.
Open-source biomolecular modeling just got a boost: RF3 closes the gap with AlphaFold3 in structure prediction, thanks to the new AtomWorks data framework.
A 1-bit LLM can match the performance of full-precision models, promising huge gains in efficiency.