Binxing Xu

Papers on Lattice

Total citations

Topics

Publication activity (papers/week, last 8 weeks)

Research focus

Inference & Quantization (3) · Architecture Design (Transformers, SSMs, MoE) (2) · Training Efficiency & Optimization (2) · RLHF & Preference Learning (1)

Frequent co-authors

Hao Gu (2) · Hao Wang (2) · Jiacheng Liu (2) · Lujun Li (2)

Papers (3)

May 25, 2026

Xintong Yang +8 · 3w ago · also Microsoft Research, Guangzhou City Polytechnic, HKUST

IndexMem: Learned KV-Cache Eviction with Latent Memory for Long-Context LLM Inference

LLMs can maintain long-context performance even with aggressive KV-cache eviction by learning to predict token importance and compressing evicted tokens into a latent memory.

Xintong Yang, Hao Gu, Binxing Xu +6

Architecture Design (Transformers, SSMs, MoE) · Inference & Quantization

Apr 9, 2026

Hao Gu +11 · Apr 9, 2026 · also Wenxuan Zhang2 Fumin Shen1

QaRL: Rollout-Aligned Quantization-Aware RL for Fast and Stable Training under Training–Inference Mismatch

Quantizing rollouts in LLM RL pipelines introduces a training-inference gap; QaRL closes it, yielding a +5.5 improvement on math problems.

Hao Gu, Hao Wang, Jiacheng Liu +9

Inference & Quantization · RLHF & Preference Learning · Training Efficiency & Optimization

Binxing Xu +10 · Apr 9, 2026

Bit-by-Bit: Progressive QAT Strategy with Outlier Channel Splitting for Stable Low-Bit LLMs

Achieve near-lossless 2-bit LLMs with a novel quantization-aware training scheme that progressively reduces precision and intelligently handles outlier channels.

Binxing Xu, Hao Gu, Lujun Li +8

Architecture Design (Transformers, SSMs, MoE) · Inference & Quantization · Training Efficiency & Optimization
