1.58-bit LLMs are surprisingly more resilient to sparsity than their full-precision counterparts, opening new avenues for extreme compression.
Unlock 33% faster LLM inference on commodity GPUs with SlideSparse, which finally brings hardware-accelerated (2N-2):2N sparsity to the masses, bridging the accuracy gap left by NVIDIA's strict 2:4 pruning.
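To make the sparsity patterns above concrete: N:M structured sparsity keeps only the N largest-magnitude weights in every contiguous group of M, so NVIDIA's 2:4 pattern zeroes half of each 4-weight group, while a (2N-2):2N pattern (e.g. 6:8) prunes more gently. The sketch below is a minimal NumPy illustration of that pruning rule; `nm_prune` is a hypothetical helper for exposition, not part of SlideSparse or any NVIDIA library.

```python
import numpy as np

def nm_prune(w, n, m):
    """Keep the n largest-magnitude weights in every contiguous group of m.

    Expects a 1-D weight vector whose length is a multiple of m.
    (Illustrative only -- real kernels operate on packed 2-D tiles.)
    """
    groups = w.reshape(-1, m)
    # Indices of the (m - n) smallest |w| in each group: these get zeroed.
    drop = np.argsort(np.abs(groups), axis=1)[:, : m - n]
    mask = np.ones_like(groups, dtype=bool)
    np.put_along_axis(mask, drop, False, axis=1)
    return (groups * mask).reshape(w.shape)

w = np.array([0.9, -0.1, 0.4, 0.05, -0.7, 0.2, 0.3, -0.8], dtype=np.float32)
print(nm_prune(w, 2, 4))  # strict 2:4 -> [ 0.9  0.   0.4  0.  -0.7  0.   0.  -0.8]
print(nm_prune(w, 6, 8))  # 6:8, i.e. (2N-2):2N with N=4 -> only 2 of 8 weights dropped
```

The 6:8 call drops only the two smallest-magnitude weights across the whole group, which is why the (2N-2):2N family can retain more accuracy than strict 2:4 at the cost of lower sparsity.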