Squeeze your LLM's KV cache by 82% without significant performance loss using VQKV's novel vector quantization approach.
Continuous diffusion LMs can rival discrete models by fixing the token-rounding bottleneck with a contextual autoregressive decoder, unlocking a fluency-diversity knob in the process.
Stop wasting compute: PonderLM-3 learns to spend extra inference FLOPs only on the tokens that actually need them, outperforming fixed-step pondering methods.