LLMs can now autonomously retrieve relevant memories from a database using specialized tools, significantly improving performance on long-term conversational question answering.
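A minimal sketch of the idea: the model is given a retrieval tool it can call over a memory store. All names here (`search_memory`, the toy keyword index, the sample memories) are illustrative assumptions, not the paper's actual implementation, which would use an LLM deciding when to invoke the tool and a real retriever.

```python
import re

# Toy memory database of past-conversation facts (illustrative data).
MEMORY_DB = [
    "User's dog is named Bruno.",
    "User moved to Lisbon in 2021.",
    "User is allergic to peanuts.",
]

def _words(text: str) -> set[str]:
    """Lowercase and strip punctuation for simple keyword matching."""
    return set(re.findall(r"[a-z']+", text.lower()))

def search_memory(query: str, top_k: int = 2) -> list[str]:
    """Tool the LLM could call: rank stored memories by keyword overlap."""
    q = _words(query)
    scored = [(len(q & _words(m)), m) for m in MEMORY_DB]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [m for score, m in scored[:top_k] if score > 0]

if __name__ == "__main__":
    # The model would fold the retrieved memory into its answer.
    print(search_memory("What is the name of the user's dog?"))
```

In the actual tool-use setup, the LLM decides autonomously when a question requires long-term memory and issues the tool call itself; a production version would replace keyword overlap with embedding search.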
Slash RAG latency by an order of magnitude using a tiny, LoRA-adapted SLM that routes queries, achieving GPT-4o-mini level accuracy at a fraction of the cost.
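The routing idea can be sketched in a few lines. The tiny LoRA-adapted SLM from the summary is stood in for by a toy heuristic here; the function names and the two model tiers are assumptions for illustration only.

```python
# Illustrative query router: cheap queries stay on a small model,
# hard ones escalate to a large model. In the summarized system this
# decision is made by a small LoRA-adapted language model, not a rule.

def route(query: str) -> str:
    """Return the model tier that should handle the query (toy heuristic)."""
    hard_markers = ("why", "compare", "explain", "derive")
    is_hard = len(query.split()) > 12 or any(
        marker in query.lower() for marker in hard_markers
    )
    return "large-model" if is_hard else "small-model"

if __name__ == "__main__":
    print(route("capital of France"))
    print(route("Compare retrieval-augmented and fine-tuned approaches"))
```

Because most queries take the cheap path, end-to-end latency and cost drop sharply while accuracy on hard queries is preserved by escalation.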