Huazhong University of Science and Technology
Achieve state-of-the-art 3D scene understanding by dynamically adapting network parameters at test time, showing that input-aware adjustments can significantly boost performance with minimal overhead.
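A toy sketch of the general idea of per-input test-time adaptation, not the paper's method: here a hypothetical `adapt_temperature` tunes a single scalar for each input by minimizing prediction entropy, a common self-supervised test-time objective. All names and numbers below are illustrative assumptions.

```python
import numpy as np

def softmax(z):
    # numerically stable softmax
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def entropy(p):
    return -np.sum(p * np.log(p + 1e-12))

def adapt_temperature(logits, steps=20, lr=0.5):
    """Adapt one scalar (a softmax temperature) per input by gradient
    descent on prediction entropy -- a minimal stand-in for adapting
    network parameters at test time."""
    t = 1.0
    for _ in range(steps):
        eps = 1e-4
        # finite-difference gradient of entropy w.r.t. temperature
        g = (entropy(softmax(logits / (t + eps))) -
             entropy(softmax(logits / (t - eps)))) / (2 * eps)
        t -= lr * g
        t = max(t, 0.05)  # keep the temperature positive
    return t

logits = np.array([2.0, 1.0, 0.5])
t = adapt_temperature(logits)
# sharpening the prediction per input lowers its entropy
assert entropy(softmax(logits / t)) < entropy(softmax(logits))
```

Real methods adapt far more parameters (e.g. normalization layers) with autograd, but the loop above captures the input-aware, label-free adjustment the blurb describes.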
Forget static prompts: this method dynamically adjusts persona influence during decoding, boosting role-playing agent realism without costly fine-tuning.
Slash spoken dialogue system latency by up to 51% with a new architecture that lets the system "listen-while-thinking" and "speak-while-thinking."
Forget full attention: a hybrid sparse-linear attention model, MiniCPM-SALA, achieves 3.5x faster inference and supports 1M context length on a single GPU, all while maintaining performance comparable to full attention.
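A minimal sketch of what "hybrid sparse-linear attention" can mean in general, assuming nothing about MiniCPM-SALA's actual design: exact softmax attention over a small local window (the sparse part) combined with a kernelized linear-attention term over the distant prefix (the linear part). The window size, feature map, and 50/50 mix are all illustrative choices.

```python
import numpy as np

def phi(x):
    # ELU(x) + 1 feature map, a common choice in linear attention
    return np.where(x > 0, x + 1.0, np.exp(x))

def hybrid_attention(Q, K, V, window=2):
    """Causal toy hybrid: exact softmax attention inside a local window
    plus a linear-attention summary of everything before the window."""
    n, d = Q.shape
    out = np.zeros_like(V)
    fQ, fK = phi(Q), phi(K)
    for i in range(n):
        lo = max(0, i - window)
        # sparse part: exact attention over positions [lo, i]
        s = Q[i] @ K[lo:i + 1].T / np.sqrt(d)
        w = np.exp(s - s.max())
        w /= w.sum()
        local = w @ V[lo:i + 1]
        if lo > 0:
            # linear part: O(1)-per-token kernelized sum over [0, lo)
            num = fQ[i] @ (fK[:lo].T @ V[:lo])
            den = fQ[i] @ fK[:lo].sum(axis=0)
            out[i] = 0.5 * local + 0.5 * num / (den + 1e-6)
        else:
            out[i] = local
    return out

rng = np.random.default_rng(0)
X = rng.standard_normal((8, 4))
print(hybrid_attention(X, X, X).shape)
```

The point of the hybrid is cost: the linear term's running sums (`fK.T @ V` and `fK.sum`) can be maintained incrementally, so only the small window ever pays the quadratic softmax price.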