Forget hand-tuning: CODO automatically compiles efficient FPGA dataflow accelerators, delivering up to 33.8x speedups on DNN models compared to existing frameworks.
Serving LoRA adapters at scale doesn't have to crush your latency SLOs: InfiniLoRA disaggregates LoRA execution to achieve 3x higher throughput and dramatically improved tail latency.
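The paper's exact disaggregation scheme isn't reproduced here, but the property it builds on is simple: the LoRA contribution is an additive delta, y = W x + (alpha / r) * B(A x), so the base projection and the adapter projection can run on separate workers and be summed afterward. A minimal sketch, with illustrative worker names and shapes:

```python
# Minimal sketch (not InfiniLoRA's actual implementation): because the LoRA
# output is additive, the dense base path and the tiny adapter path can be
# computed by different workers and combined at the end.
import numpy as np

rng = np.random.default_rng(0)
d_model, r, alpha = 1024, 16, 32

W = rng.standard_normal((d_model, d_model))      # frozen base weight
A = rng.standard_normal((r, d_model)) * 0.01     # LoRA down-projection
B = rng.standard_normal((d_model, r)) * 0.01     # LoRA up-projection
x = rng.standard_normal(d_model)                 # one token's activations

def base_worker(x):
    # Runs on the GPU serving the dense base model.
    return W @ x

def lora_worker(x):
    # Runs on a separate worker holding only the small adapter weights.
    return (alpha / r) * (B @ (A @ x))

# The serving layer only has to add the two partial results.
y = base_worker(x) + lora_worker(x)

# Sanity check against the fused (merged-weight) computation.
y_fused = (W + (alpha / r) * (B @ A)) @ x
assert np.allclose(y, y_fused)
```

Because the adapter path touches only the small A and B matrices, it can be scheduled and scaled independently of the base model, which is the property a disaggregated serving design exploits.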
Semantic disagreement between LLMs reveals crucial uncertainty that single-model metrics miss, and Collaborative Entropy (CoE) captures it.
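The paper's precise CoE definition isn't shown here; the sketch below only illustrates the general idea under assumed details: pool answers from several LLMs, group them into semantic clusters, and take the entropy of the cluster distribution as an uncertainty signal that no single model's token probabilities expose. The equivalence check and function names are placeholders.

```python
# Illustrative sketch only; the paper's exact Collaborative Entropy (CoE)
# formulation may differ. Higher entropy over semantic clusters of answers
# from multiple models indicates cross-model disagreement.
import math

def semantically_equivalent(a: str, b: str) -> bool:
    # Placeholder equivalence check; a real system would use an NLI model
    # or embedding similarity rather than a normalized string match.
    return a.strip().lower() == b.strip().lower()

def collaborative_entropy(answers: list[str]) -> float:
    # Greedily assign each answer to the first cluster it matches.
    clusters: list[list[str]] = []
    for ans in answers:
        for cluster in clusters:
            if semantically_equivalent(ans, cluster[0]):
                cluster.append(ans)
                break
        else:
            clusters.append([ans])
    # Entropy of the cluster-size distribution (higher = more disagreement).
    n = len(answers)
    return -sum((len(c) / n) * math.log(len(c) / n) for c in clusters)

# Answers to the same question from, say, three different LLMs:
print(collaborative_entropy(["Paris", "paris", "Paris"]))     # 0.0   -> agreement
print(collaborative_entropy(["Paris", "Lyon", "Marseille"]))  # ~1.10 -> disagreement
```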
Exploit the surprisingly stable, yet heterogeneous, sparsity patterns across attention heads to slash LLM attention latency by 2.88x without sacrificing quality.
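The paper's mechanism isn't detailed here; as a hypothetical sketch of the idea, suppose each head is assigned its own sparsity budget (profiled or learned offline), and at decode time each head attends only to its head-specific top-k keys. All names, shapes, and budgets below are illustrative assumptions.

```python
# Hypothetical sketch (not the paper's implementation): heterogeneous,
# per-head sparsity. Heads that concentrate on a few positions get a small
# top-k budget; heads with flat attention keep more (or all) keys.
import numpy as np

rng = np.random.default_rng(1)
n_heads, seq_len, d_head = 4, 128, 64

q = rng.standard_normal((n_heads, d_head))            # current query per head
K = rng.standard_normal((n_heads, seq_len, d_head))   # cached keys per head
V = rng.standard_normal((n_heads, seq_len, d_head))   # cached values per head

# Per-head budgets, assumed to be stable across inputs (last head stays dense).
top_k = [8, 16, 32, 128]

def sparse_head_attention(h: int) -> np.ndarray:
    scores = K[h] @ q[h] / np.sqrt(d_head)                  # (seq_len,)
    keep = np.argpartition(scores, -top_k[h])[-top_k[h]:]   # head-specific top-k keys
    probs = np.exp(scores[keep] - scores[keep].max())       # softmax over kept keys
    probs /= probs.sum()
    return probs @ V[h][keep]                               # attend only to kept values

out = np.stack([sparse_head_attention(h) for h in range(n_heads)])
print(out.shape)  # (4, 64)
```

Since the per-head budgets are fixed rather than recomputed per query, the kept-key gather can be fused into the decode kernel, which is where the latency savings would come from.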