Video diffusion models can be aggressively quantized down to 6-bit precision with minimal quality loss by dynamically adapting the bit-width of each layer based on its temporal stability.
Ditch the stochasticity: Deterministic pruning slashes LLM size with minimal performance loss, outperforming stochastic methods and accelerating inference.
SpargeAttention2 achieves 95% attention sparsity in video diffusion models with a 16.2x speedup, showing that trainable sparse attention can significantly outperform training-free methods without sacrificing generation quality.
Achieve an 18.6x speedup in video diffusion models with 97% attention sparsity by learning how to route and combine sparse and linear attention, outperforming heuristic approaches.