Exact sampling in large-vocabulary decoding can be sped up by 19% simply by fusing it into the LM-head matmul, turning a bandwidth bottleneck into a lightweight epilogue.
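The idea of sampling inside the matmul epilogue can be sketched in NumPy. This is a minimal illustration, not the paper's kernel: it assumes the Gumbel-max trick as the exact sampler (the summary above does not say which method is used), and the names `sample_tiled`, `h`, `W`, `gumbel`, and `tile` are hypothetical. Each vocabulary tile's logits are computed, perturbed, and reduced to a running maximum, so the full logit vector is never stored.

```python
import numpy as np

def sample_tiled(h, W, gumbel, tile=4):
    """Gumbel-max sampling over vocabulary tiles (assumed technique).

    h:      hidden state, shape (d,)
    W:      LM-head weights, shape (d, V)
    gumbel: pre-drawn Gumbel(0, 1) noise, shape (V,); a real fused
            kernel would presumably generate this on the fly.
    """
    best_val, best_idx = -np.inf, -1
    V = W.shape[1]
    for start in range(0, V, tile):
        stop = min(start + tile, V)
        # "Matmul" for this tile only: logits for tokens start..stop-1.
        logits = h @ W[:, start:stop]
        # "Epilogue": perturb and reduce while the tile is still local.
        scores = logits + gumbel[start:stop]
        j = int(np.argmax(scores))
        if scores[j] > best_val:
            best_val, best_idx = scores[j], start + j
    return best_idx

rng = np.random.default_rng(0)
d, V = 8, 10
h = rng.standard_normal(d)
W = rng.standard_normal((d, V))
gumbel = rng.gumbel(size=V)

token = sample_tiled(h, W, gumbel)
# Unfused reference: materialize all logits, then take the perturbed argmax.
reference = int(np.argmax(h @ W + gumbel))
```

Because the argmax of Gumbel-perturbed logits is distributed as a draw from the softmax, the tiled version returns the same token as the unfused reference for the same noise, while keeping only one scalar and one index of running state per row.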