NTU Singapore
Fine-grained management of speculative decoding phases can boost LLM serving throughput by over 50% and cut latency nearly in half.
Serverless functions can achieve a 37% density boost and significantly lower overhead by offloading I/O to a shared backend, without sacrificing ecosystem compatibility.
Video codecs, typically seen as mere compression tools, can unlock 3x faster and 87% more efficient video analytics by guiding vision-language model inference.
In LLM prompt tuning, PromptTuner cuts SLO violations by up to 7.9x and costs by up to 4.5x, outperforming existing resource-management systems.