Forget fancy quantization schemes: simple token-wise INT4 quantization with a Hadamard rotation is enough to nearly match FP16 accuracy in LLM serving, without sacrificing throughput.
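For intuition, here is a minimal NumPy sketch of the idea: rotating activations with an orthonormal Hadamard matrix before token-wise INT4 quantization spreads outlier channels across all dimensions, shrinking each token's quantization range. The shapes, the Sylvester construction, and the synthetic outlier channel are illustrative assumptions, not the paper's exact recipe.

```python
import numpy as np

def hadamard(n: int) -> np.ndarray:
    """Sylvester construction of an n x n Hadamard matrix (n a power of 2)."""
    H = np.array([[1.0]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H / np.sqrt(n)  # orthonormal, so H @ H.T == I

def quantize_int4_tokenwise(x: np.ndarray):
    """Quantize each token (row) of x to INT4 with its own scale."""
    scale = np.abs(x).max(axis=-1, keepdims=True) / 7.0  # INT4 range: [-8, 7]
    q = np.clip(np.round(x / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: np.ndarray) -> np.ndarray:
    return q.astype(np.float32) * scale

# Rotating with H before quantizing spreads the outlier channel's energy
# across all 64 dimensions, so per-token scales (and errors) shrink.
tokens, dim = 4, 64
x = np.random.randn(tokens, dim).astype(np.float32)
x[:, 0] *= 50.0                      # simulate one outlier channel
H = hadamard(dim)

q_plain, s_plain = quantize_int4_tokenwise(x)
q_rot, s_rot = quantize_int4_tokenwise(x @ H)

err_plain = np.abs(dequantize(q_plain, s_plain) - x).mean()
# Undo the rotation after dequantizing (H is orthonormal, so H.T inverts it).
err_rot = np.abs(dequantize(q_rot, s_rot) @ H.T - x).mean()
print(f"mean abs error, plain INT4:   {err_plain:.4f}")
print(f"mean abs error, rotated INT4: {err_rot:.4f}")
```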
Multi-agent exploration with dynamic starting points can more than double the success rate on web navigation tasks compared to single-agent methods that start from the root URL.
Generalist foundation models beat specialized GUI agents at e-commerce risk management, suggesting scale trumps zero-shot grounding for complex, real-world web tasks.
On-policy distillation can lead to catastrophic length inflation in student models, but a simple fix stabilizes training and boosts performance by 7%.
By explicitly grounding a knowledge graph in its source documents, GroundedKG-RAG cuts both resource consumption and hallucinations in long-document QA, matching proprietary models at a fraction of the cost.
Surface-level metrics like BLEU are misleading for evaluating dialogue systems, as human and LLM judges reveal critical flaws in coherence and consistency that these metrics miss entirely.
Achieve real-time autonomous driving policy generation with a new flow-matching RL algorithm that slashes inference latency without sacrificing performance.
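As a rough illustration of why flow matching is fast at inference time, the toy sampler below produces an action by Euler-integrating a learned velocity field for only a few steps, so each decision costs just a handful of forward passes. The tiny stand-in network, the step count, and the dimensions are assumptions made for the sketch; the paper's actual architecture and RL objective are not shown.

```python
import numpy as np

rng = np.random.default_rng(0)
ACTION_DIM, OBS_DIM, HIDDEN = 2, 8, 32

# Stand-in weights for a trained velocity network v_theta(x, t | obs).
W1 = rng.standard_normal((ACTION_DIM + OBS_DIM + 1, HIDDEN)) * 0.1
W2 = rng.standard_normal((HIDDEN, ACTION_DIM)) * 0.1

def velocity(x: np.ndarray, t: float, obs: np.ndarray) -> np.ndarray:
    """Toy MLP standing in for the learned velocity field."""
    h = np.tanh(np.concatenate([x, obs, [t]]) @ W1)
    return h @ W2

def sample_action(obs: np.ndarray, num_steps: int = 4) -> np.ndarray:
    """Generate an action by Euler-integrating dx/dt = v(x, t) from t=0 to 1."""
    x = rng.standard_normal(ACTION_DIM)   # start from noise
    dt = 1.0 / num_steps
    for k in range(num_steps):
        x = x + dt * velocity(x, k * dt, obs)
    return x

obs = rng.standard_normal(OBS_DIM)
print(sample_action(obs))  # a few matrix multiplies per action => low latency
```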
LLMs can now explore knowledge graphs on their own, discovering better reasoning paths and outperforming even closed-source models on question answering.
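Purely as a hypothetical illustration of this exploration loop, the sketch below lets a stubbed-out "LLM" score which relation to follow at each hop of a toy knowledge graph. The graph, the scoring heuristic, and the greedy one-hop-at-a-time policy are all invented for the example.

```python
# Toy KG as adjacency lists of (relation, target) edges.
KG = {
    "Marie Curie": [("field", "Physics"), ("spouse", "Pierre Curie")],
    "Pierre Curie": [("award", "Nobel Prize in Physics")],
    "Physics": [("studies", "Matter")],
}

def score_relation(question: str, entity: str, relation: str) -> float:
    """Stand-in for an LLM call that rates a hop's relevance (0..1)."""
    keywords = {"award": 1.0, "spouse": 0.6}  # toy heuristic, not a real model
    return keywords.get(relation, 0.1)

def explore(question: str, start: str, max_hops: int = 3):
    """Greedily walk the KG, letting the (stubbed) LLM pick each hop."""
    path, entity = [], start
    for _ in range(max_hops):
        edges = KG.get(entity, [])
        if not edges:
            break
        rel, nxt = max(edges, key=lambda e: score_relation(question, entity, e[0]))
        path.append((entity, rel, nxt))
        entity = nxt
    return path

print(explore("What award did Marie Curie's spouse win?", "Marie Curie"))
# [('Marie Curie', 'spouse', 'Pierre Curie'),
#  ('Pierre Curie', 'award', 'Nobel Prize in Physics')]
```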
Injecting LLM-generated textual descriptions of facial action units into a vision model substantially boosts AU detection performance, suggesting a powerful way to leverage language priors in computer vision.