University of Central Florida
C$^3$ache reveals that reusing residuals across inference chunks can dramatically speed up World Action Models, achieving a 2.5x reduction in inference time with minimal impact on performance.
Scaling up LLMs boosts combinatorial creativity in code generation, but the gains plateau on exploratory tasks, revealing a "convergence-by-scaling" effect in which larger models become less divergent.