Tsinghua University
Autoregressive video diffusion gets a 2x speed boost with minimal quality loss, thanks to a clever speculative decoding approach that uses an image-quality router to verify proposed video blocks.
You can slash LLM inference energy by 35% on edge devices just by intelligently managing eDRAM refresh rates based on activation data type and lifespan.
Video diffusion models can be aggressively quantized down to 6-bit precision with minimal quality loss by dynamically adapting the bit-width of each layer based on its temporal stability.
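The core idea — assign fewer bits to layers whose activations barely change across denoising timesteps — can be sketched in a few lines. Everything below (the stability metric, the bit-width thresholds, the per-tensor symmetric quantizer) is an illustrative assumption, not the paper's actual policy:

```python
import numpy as np

def temporal_stability(acts):
    # acts: [timesteps, features] activations of one layer across denoising steps.
    # Lower variation across timesteps -> more temporally stable.
    return acts.std(axis=0).mean()

def pick_bitwidth(stability, thresholds=(0.1, 0.5)):
    # Hypothetical policy: temporally stable layers tolerate fewer bits.
    if stability < thresholds[0]:
        return 4
    elif stability < thresholds[1]:
        return 6
    return 8

def fake_quantize(x, bits):
    # Per-tensor symmetric uniform quantization at the chosen bit-width,
    # returned as dequantized values so the error is easy to inspect.
    qmax = 2 ** (bits - 1) - 1
    amax = np.abs(x).max()
    scale = amax / qmax if amax > 0 else 1.0
    q = np.clip(np.round(x / scale), -qmax - 1, qmax)
    return q * scale

rng = np.random.default_rng(0)
stable = rng.normal(0, 0.05, size=(10, 64))   # low temporal variation
volatile = rng.normal(0, 1.0, size=(10, 64))  # high temporal variation

for name, acts in [("stable", stable), ("volatile", volatile)]:
    bits = pick_bitwidth(temporal_stability(acts))
    err = np.abs(fake_quantize(acts, bits) - acts).mean()
    print(name, bits, float(err))
```

With these made-up thresholds, the stable layer lands at 4 bits and the volatile one at 8, matching the intuition that a mixed-precision schedule can average out near 6 bits.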
K-means, previously relegated to offline processing, gets a 17.9x speed boost on modern GPUs thanks to Flash-KMeans' clever IO and contention optimizations.
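For reference, the loop Flash-KMeans accelerates is plain Lloyd's k-means; the assignment step's distance matrix is the GPU-hot kernel whose IO and contention get optimized. This is a minimal CPU/numpy baseline, not Flash-KMeans itself:

```python
import numpy as np

def kmeans(X, k, iters=20, seed=0):
    # Plain Lloyd's k-means. Flash-KMeans speeds up exactly this loop on
    # GPUs; here it is shown only to fix the algorithm being discussed.
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)].copy()
    for _ in range(iters):
        # Assignment step: nearest center per point (the expensive kernel).
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = d2.argmin(1)
        # Update step: recompute each center as its cluster mean.
        for j in range(k):
            pts = X[labels == j]
            if len(pts):
                centers[j] = pts.mean(0)
    return centers, labels

rng = np.random.default_rng(1)
X = np.concatenate([rng.normal(-5, 0.3, (100, 2)),
                    rng.normal(5, 0.3, (100, 2))])
centers, labels = kmeans(X, 2)
```

On these two well-separated blobs the centers converge near (-5, -5) and (5, 5); the 17.9x figure comes from running the same assignment/update cycle with fused, contention-aware GPU kernels.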
Get 2x faster video generation from diffusion transformers without sacrificing quality, thanks to a clever parameter-free error compensation technique.
Achieve nearly 2x speedup in Stable Diffusion 3 by intelligently stitching together large and small diffusion models at both the pixel and timestep level.
Trainable INT8 attention can match full-precision attention during pre-training, but only if you normalize QK and reduce tokens per step.
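A minimal sketch of why QK normalization matters for INT8 attention: normalizing Q and K before quantization tames outlier magnitudes so the logits survive the 8-bit range. The L2 row normalization and per-tensor scales below are illustrative assumptions, not the paper's exact recipe:

```python
import numpy as np

def quantize_int8(x):
    # Per-tensor symmetric int8 quantization.
    scale = np.abs(x).max() / 127.0
    return np.round(x / scale).astype(np.int8), scale

def int8_attention(Q, K, V):
    # Normalize Q and K first: without this, outlier rows dominate the
    # shared scale and most entries collapse to a few int8 levels.
    Q = Q / np.linalg.norm(Q, axis=-1, keepdims=True)
    K = K / np.linalg.norm(K, axis=-1, keepdims=True)
    q8, sq = quantize_int8(Q)
    k8, sk = quantize_int8(K)
    # Accumulate the int8 matmul in int32, then dequantize the logits.
    logits = (q8.astype(np.int32) @ k8.astype(np.int32).T) * (sq * sk)
    logits = logits / np.sqrt(Q.shape[-1])
    w = np.exp(logits - logits.max(-1, keepdims=True))
    w = w / w.sum(-1, keepdims=True)
    return w @ V

rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(8, 16)) for _ in range(3))
out = int8_attention(Q, K, V)
```

The teaser's second condition — fewer tokens per step — is a training-schedule choice and does not appear in this forward-pass sketch.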
Achieve an 18.6x speedup in video diffusion models with 97% attention sparsity by learning how to route and combine sparse and linear attention, outperforming heuristic approaches.
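The routing idea behind the two sparse-attention results above can be sketched as a per-query gate blending a top-k sparse softmax branch with a linear-attention branch. The random gate, the top-k mask, and the elu+1 feature map are stand-ins for the learned router and kernels described in the papers:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(-1, keepdims=True))
    return e / e.sum(-1, keepdims=True)

def topk_sparse_attention(Q, K, V, k):
    # Keep only the k largest logits per query; the rest are masked out,
    # giving (1 - k/N) sparsity in the softmax branch.
    logits = Q @ K.T / np.sqrt(Q.shape[-1])
    thresh = np.sort(logits, -1)[:, -k][:, None]
    masked = np.where(logits >= thresh, logits, -np.inf)
    return softmax(masked) @ V

def linear_attention(Q, K, V):
    # Kernelized linear attention (elu+1 feature map), O(N) in length.
    phi = lambda x: np.where(x > 0, x + 1.0, np.exp(x))
    q, kf = phi(Q), phi(K)
    return (q @ (kf.T @ V)) / (q @ kf.sum(0)[:, None])

def routed_attention(Q, K, V, gate, k=8):
    # gate in [0, 1] per query mixes the branches; in the papers this
    # gate is trained rather than supplied externally.
    s = topk_sparse_attention(Q, K, V, k)
    l = linear_attention(Q, K, V)
    return gate[:, None] * s + (1 - gate[:, None]) * l

rng = np.random.default_rng(0)
N, d = 32, 16
Q, K, V = (rng.normal(size=(N, d)) for _ in range(3))
gate = rng.uniform(size=N)
out = routed_attention(Q, K, V, gate)
```

At k=8 and N=32 the sparse branch already skips 75% of logits; the 95-97% sparsity in the results above comes from much longer video sequences with k fixed small, plus a trained rather than random router.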
SpargeAttention2 achieves 95% attention sparsity in video diffusion models with a 16.2x speedup, proving that trainable sparse attention can significantly outperform training-free methods without sacrificing generation quality.