King Abdullah University of Science and Technology (KAUST), Provable Responsible AI and Data Analytics (PRADA) Lab
LLMs can predict their *own* output length with surprising accuracy by simply analyzing their internal hidden states, enabling significant throughput gains via length-aware scheduling.
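
To make the scheduling benefit concrete, here is a minimal sketch in Python. It assumes a static-batching server where each batch runs until its longest request finishes, and it assumes a predicted length is already available for each request. The hidden-state predictor itself is not shown, and `Request`, `predicted_len`, `make_batches`, and `padded_steps` are hypothetical names chosen for illustration, not the paper's implementation.

```python
from dataclasses import dataclass


@dataclass
class Request:
    request_id: str
    # Assumption: an output-length estimate (in tokens) produced elsewhere,
    # e.g. by a small probe over the model's hidden states.
    predicted_len: int


def make_batches(requests, batch_size):
    """Group requests with similar predicted lengths into the same batch."""
    ordered = sorted(requests, key=lambda r: r.predicted_len)
    return [ordered[i:i + batch_size] for i in range(0, len(ordered), batch_size)]


def padded_steps(batches):
    """Total decode slots used if every batch runs until its longest request ends."""
    return sum(len(b) * max(r.predicted_len for r in b) for b in batches)


requests = [
    Request("a", 10),
    Request("b", 200),
    Request("c", 12),
    Request("d", 190),
]

# Baseline: batch in arrival order, ignoring length.
fifo = [requests[i:i + 2] for i in range(0, len(requests), 2)]
# Length-aware: batch after sorting by predicted length.
aware = make_batches(requests, 2)

print(padded_steps(fifo))   # 780
print(padded_steps(aware))  # 424
```

In this toy workload, grouping short requests together means they no longer wait on a long request in the same batch, so fewer decode slots are spent on padding. Real serving systems with continuous batching behave differently, so the numbers here only illustrate the idea.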