T. Ruiz

Research focus

Architecture Design (Transformers, SSMs, MoE) (1)
Distributed Systems & Hardware (1)
Inference & Quantization (1)

Frequent co-authors

Xuyang Shen (1)
Yiran Zhong (1)
Mengdi Wang (1)

Papers (1)

Mar 16, 2026

FlashSampling: Fast and Memory-Efficient Exact Sampling

Exact sampling in large-vocabulary decoding can be sped up by 19% simply by fusing it into the LM-head matmul, turning a bandwidth bottleneck into a lightweight epilogue.

T. Ruiz, Xuyang Shen, Yiran Zhong +1

Architecture Design (Transformers, SSMs, MoE)
Distributed Systems & Hardware
Inference & Quantization
