Multimodal models can now handle audio natively with improved efficiency, achieving state-of-the-art results on complex tasks like document understanding and agentic computer use.
Forget live teacher inference servers: Lightning OPD unlocks 4x faster LLM post-training by precomputing teacher log-probabilities, without sacrificing performance on complex reasoning tasks.
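The core idea, caching teacher log-probabilities offline so training never calls a live teacher server, can be sketched in plain Python. The forward-KL distillation loss and all function names below are illustrative assumptions, not Lightning OPD's actual implementation:

```python
import math

def softmax_logprobs(logits):
    """Numerically stable log-softmax over a list of logits."""
    m = max(logits)
    z = sum(math.exp(x - m) for x in logits)
    return [x - m - math.log(z) for x in logits]

def distill_loss(student_logits, teacher_logprobs):
    """Forward KL(teacher || student) against cached teacher log-probs.

    No teacher forward pass happens here: teacher_logprobs were
    precomputed once, offline, and simply read back at training time.
    """
    s = softmax_logprobs(student_logits)
    return sum(math.exp(t) * (t - q) for t, q in zip(teacher_logprobs, s))

# Offline phase (run once): compute and cache teacher log-probs to disk.
cached_teacher = softmax_logprobs([2.0, 0.5, -1.0])

# Training phase: the student is scored against the cache alone.
loss_match = distill_loss([2.0, 0.5, -1.0], cached_teacher)   # ~0.0
loss_uniform = distill_loss([0.0, 0.0, 0.0], cached_teacher)  # > 0
```

When the student's distribution matches the cached teacher, the loss vanishes; any divergence yields a positive penalty, which is what lets the cache fully substitute for a live inference server.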
Scaling diffusion model alignment just got a whole lot cheaper: Sol-RL uses FP4 rollouts to accelerate training convergence by up to 4.64x without sacrificing performance.
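FP4 here refers to a 4-bit floating-point format, commonly E2M1, whose representable magnitudes are {0, 0.5, 1, 1.5, 2, 3, 4, 6}. A minimal fake-quantization sketch to that grid follows; the per-tensor absmax scaling is an assumption for illustration, not Sol-RL's actual rollout recipe:

```python
import math

# Representable magnitudes of the FP4 E2M1 format.
FP4_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]

def fake_quantize_fp4(values):
    """Quantize-dequantize a list of floats through the FP4 E2M1 grid,
    using per-tensor absmax scaling (an illustrative choice)."""
    amax = max(abs(v) for v in values) or 1.0
    scale = amax / 6.0  # map the largest magnitude onto FP4's max value
    out = []
    for v in values:
        s = abs(v) / scale
        q = min(FP4_GRID, key=lambda g: abs(g - s))  # nearest grid point
        out.append(math.copysign(q * scale, v))
    return out

# Values already on the scaled grid survive the round trip exactly.
print(fake_quantize_fp4([6.0, 3.0, 1.5]))
```

Running rollouts through a quantizer like this trades per-value precision for roughly 4x less activation memory and bandwidth, which is where the claimed convergence speedup comes from.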
Swap out slow, one-token-at-a-time generation in VLMs for a 6x speed boost, without sacrificing quality, using a surprisingly simple direct conversion to block-diffusion decoding.
LLMs can achieve 2.5x higher throughput and 10.7x KV memory reduction in long-context reasoning by compressing the KV cache using trigonometric functions derived from pre-RoPE query/key vector distributions.