The Hong Kong University of Science and Technology (Guangzhou)
Naively applying standard LLM inference optimizations can *hurt* the performance of smaller reasoning models, underscoring the need for serving strategies designed specifically for RLLMs.