4 papers published across 2 labs.
Full-attention LLMs are intrinsically sparse and can be transformed into highly efficient sparse models with minimal training, sidestepping the need for expensive sparse pre-training.
LLM agents can now maintain long-term memories with 6x higher throughput thanks to a novel hierarchical temporal indexing approach that avoids costly full-state rewrites.
Forget static KV cache compression: KVServe dynamically adapts compression strategies to your service context, slashing latency by up to 32.8x in disaggregated LLM serving.
Stop IP thieves cold: LoREnc lets you lock down your foundation models and LoRA adapters without retraining, crushing model recovery attacks while keeping performance intact for authorized users.