Language models can learn directly from real-world user interactions, boosting performance without human annotations or simulated environments.
By rethinking RLHF, MicroCoder-GRPO enables smaller code-generation models to rival larger counterparts, achieving significant performance gains and revealing 34 training insights.
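GRPO-style training (group-relative policy optimization) drops RLHF's learned value critic and instead normalizes rewards across a group of completions sampled for the same prompt. Below is a minimal NumPy sketch of that advantage computation, assuming unit-test pass rate as the reward; the reward values are made up, and MicroCoder-GRPO's full objective (clipping, KL penalty) is not shown.

```python
import numpy as np

def grpo_advantages(rewards: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Group-relative advantages: score each sampled completion against
    the mean/std of its own group, so no value network is needed."""
    return (rewards - rewards.mean()) / (rewards.std() + eps)

# Hypothetical rewards for 4 completions of one coding prompt,
# e.g. the fraction of unit tests each one passes.
rewards = np.array([0.0, 0.25, 1.0, 0.75])
print(grpo_advantages(rewards))  # failing samples get negative advantage
```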
Forget massive datasets: targeted training on a smaller, carefully curated dataset of challenging competitive programming problems yields 3x faster gains in code-generation performance.
Forget unimodal tasks: UniM throws down the gauntlet for truly unified multimodal AI, demanding that models juggle any combination of text, image, audio, video, code, document, and 3D inputs and outputs in a single, interleaved stream.
1.58-bit LLMs are surprisingly more resilient to sparsity than their full-precision counterparts, opening new avenues for extreme compression.
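For context, 1.58 bits per weight corresponds to ternary values in {-1, 0, +1} (log2 3 ≈ 1.58). Here is a minimal sketch of BitNet-b1.58-style "absmean" ternary quantization combined with plain magnitude pruning; the function names are ours, and the intuition in the comments is illustrative rather than the paper's own analysis.

```python
import numpy as np

def ternary_quantize(w: np.ndarray):
    """BitNet-b1.58-style 'absmean' quantization: scale by the mean
    absolute weight, then round and clip into {-1, 0, +1}."""
    scale = np.abs(w).mean() + 1e-8
    return np.clip(np.round(w / scale), -1, 1), scale

def magnitude_prune(w: np.ndarray, sparsity: float) -> np.ndarray:
    """Zero out the given fraction of smallest-magnitude weights."""
    k = int(sparsity * w.size)
    thresh = np.partition(np.abs(w).ravel(), k - 1)[k - 1]
    return np.where(np.abs(w) <= thresh, 0.0, w)

rng = np.random.default_rng(0)
w = rng.normal(size=(256, 256))

q, _ = ternary_quantize(w)
print("zeros after ternary quantization alone:", (q == 0).mean())

# Intuition (ours, not the paper's): many pruned weights would have
# rounded to 0 anyway, so ternary models lose less to added sparsity.
q_pruned, _ = ternary_quantize(magnitude_prune(w, 0.5))
print("zeros after 50% pruning + quantization:", (q_pruned == 0).mean())
```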
Unlock 33% faster LLM inference on commodity GPUs with SlideSparse, which finally brings hardware-accelerated (2N-2):2N sparsity to the masses, bridging the accuracy gap left by NVIDIA's strict 2:4 pruning.
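An N:M pattern keeps at most N nonzero weights in every group of M consecutive weights: NVIDIA's hardware-supported 2:4 forces 50% density, while the (2N-2):2N family is progressively looser (4:6 is 67% dense, 6:8 is 75%), which is where the claimed accuracy recovery comes from. Below is a minimal magnitude-based sketch just to make the pattern concrete; SlideSparse's actual kernels and pruning criterion are not shown, and prune_n_of_m is a hypothetical helper.

```python
import numpy as np

def prune_n_of_m(w: np.ndarray, n: int, m: int) -> np.ndarray:
    """Structured N:M pruning: in every group of m consecutive weights,
    keep the n largest-magnitude entries and zero the rest."""
    assert w.size % m == 0, "tensor size must be a multiple of m"
    groups = w.reshape(-1, m)
    # Indices of the (m - n) smallest-magnitude weights in each group.
    drop = np.argsort(np.abs(groups), axis=1)[:, : m - n]
    pruned = groups.copy()
    np.put_along_axis(pruned, drop, 0.0, axis=1)
    return pruned.reshape(w.shape)

rng = np.random.default_rng(0)
w = rng.normal(size=2400)  # divisible by 4, 6, and 8

for n, m in [(2, 4), (4, 6), (6, 8)]:  # strict 2:4 vs. looser (2N-2):2N
    density = np.count_nonzero(prune_n_of_m(w, n, m)) / w.size
    print(f"{n}:{m} pattern -> density {density:.2f}")
```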
Speculative decoding gets a throughput boost of up to 4.32x by using reinforcement learning to dynamically balance drafting and verification.
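In speculative decoding, a cheap draft model proposes k tokens and the large target model verifies them in a single pass; k is exactly the drafting-versus-verification knob an RL policy could tune per step. A minimal greedy-verification sketch with toy stand-in models follows, all names hypothetical; the paper's acceptance rule and RL controller will differ.

```python
import random

def speculative_step(draft_next, target_next, context, k):
    """One draft-then-verify step of greedy speculative decoding: the
    draft model proposes k tokens, and the target model accepts the
    longest prefix it agrees with, plus one token of its own."""
    proposal, ctx = [], list(context)
    for _ in range(k):
        t = draft_next(ctx)
        proposal.append(t)
        ctx.append(t)

    accepted, ctx = [], list(context)
    for t in proposal:
        if target_next(ctx) != t:
            break
        accepted.append(t)
        ctx.append(t)
    # Whether it rejects or accepts everything, the target emits one more token.
    accepted.append(target_next(ctx))
    return accepted

# Toy stand-ins for real models (hypothetical): the draft agrees with
# the deterministic target about 80% of the time.
random.seed(42)
target_next = lambda ctx: len(ctx) % 50
draft_next = lambda ctx: target_next(ctx) if random.random() < 0.8 else -1

out = speculative_step(draft_next, target_next, context=[1, 2, 3], k=4)
print(f"one target pass emitted {len(out)} tokens: {out}")
# An RL controller, as in the paper, would choose k per step to trade
# draft cost against the expected number of accepted tokens.
```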
A 1-bit LLM can match the performance of full-precision models, promising huge gains in efficiency.
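A minimal sketch of what 1-bit weights mean for a linear layer in the BitNet style: keep only sign(W) plus a single per-tensor scale, so the matmul runs over {-1, +1} values. The error printed on random weights is illustrative only; matching full precision, per the claim above, requires training the model with quantization in the loop.

```python
import numpy as np

def binarize(w: np.ndarray):
    """1-bit weight quantization, BitNet-style: keep only the sign of
    each weight plus one per-tensor scale (the mean absolute value)."""
    return np.sign(w), np.abs(w).mean()

rng = np.random.default_rng(0)
w = rng.normal(size=(512, 512))
x = rng.normal(size=(1, 512))

w_bin, alpha = binarize(w)
full = x @ w                   # full-precision matmul
approx = alpha * (x @ w_bin)   # {-1, +1} matmul plus a single rescale

# Relative error on random weights (illustrative): closing this gap is
# what quantization-aware training of the 1-bit model is for.
print(f"relative error: {np.linalg.norm(full - approx) / np.linalg.norm(full):.2f}")
```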