LLM hallucinations in enterprise workflows can be tamed by a new Hybrid Utility Minimum Bayes Risk (HUMBR) framework, which combines semantic and lexical signals to reach consensus without ground truth.
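To make the idea of consensus without ground truth concrete, here is a minimal sketch of generic Minimum Bayes Risk selection with a blended utility. It is not the HUMBR implementation: the function names, the `alpha` weight, the token-overlap lexical score, and the character-trigram cosine used as a stand-in for a real embedding-based semantic score are all assumptions made for illustration.

```python
import math
from collections import Counter


def lexical_utility(a: str, b: str) -> float:
    """Jaccard overlap of lowercase word sets (assumed lexical signal)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def _trigrams(text: str) -> Counter:
    """Character-trigram counts of the lowercased text."""
    t = text.lower()
    return Counter(t[i:i + 3] for i in range(max(len(t) - 2, 0)))


def semantic_utility(a: str, b: str) -> float:
    """Cosine similarity of trigram counts.

    Stand-in only: a real system would compare sentence embeddings here.
    """
    va, vb = _trigrams(a), _trigrams(b)
    dot = sum(va[k] * vb[k] for k in va.keys() & vb.keys())
    na = math.sqrt(sum(v * v for v in va.values()))
    nb = math.sqrt(sum(v * v for v in vb.values()))
    if na == 0 or nb == 0:
        return 0.0
    return dot / (na * nb)


def hybrid_utility(a: str, b: str, alpha: float = 0.5) -> float:
    """Blend the two signals; alpha is a hypothetical mixing weight."""
    return alpha * semantic_utility(a, b) + (1 - alpha) * lexical_utility(a, b)


def mbr_select(candidates: list[str], alpha: float = 0.5) -> str:
    """Return the candidate with the highest mean utility against the others."""
    if not candidates:
        raise ValueError("need at least one candidate")
    if len(candidates) == 1:
        return candidates[0]
    best, best_score = candidates[0], -math.inf
    for i, cand in enumerate(candidates):
        others = [o for j, o in enumerate(candidates) if j != i]
        score = sum(hybrid_utility(cand, o, alpha) for o in others) / len(others)
        if score > best_score:
            best, best_score = cand, score
    return best


samples = [
    "Paris is the capital of France",
    "Paris is the capital of France",
    "Berlin is the capital of Germany",
]
consensus = mbr_select(samples)
```

Each sampled answer is scored by how well it agrees with the other samples, so no reference answer is needed; the outlier loses because nothing else in the pool supports it.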
FlashAttention-4 shatters attention bottlenecks on Blackwell GPUs, achieving up to 71% hardware utilization and 2.7x speedups over Triton, thanks to innovations like software-emulated softmax and asynchronous MMA pipelines.