Forget scaling up: this study shows that the right reasoning architecture (structured memory for deterministic tasks, RAG for conversational ones) can dramatically outperform a baseline LLM even when compute is severely limited.
LLMs can now autonomously retrieve relevant memories from a database using specialized tools, significantly improving performance on long-term conversational question answering.
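The idea behind tool-based memory retrieval can be sketched minimally. In practice the LLM would emit a tool call (e.g. `search_memories(query=...)`) and the runtime would execute it against a memory database; the sketch below is only the tool's backend, with the LLM and any embedding search replaced by a simple keyword-overlap scorer. All names here (`MemoryStore`, `search`) are illustrative assumptions, not the paper's API.

```python
# Hypothetical backend for a "search_memories" tool an LLM could call.
# Retrieval is stubbed with keyword-overlap scoring; a real system would
# use embeddings or a database query issued by the model's tool call.
from dataclasses import dataclass, field


@dataclass
class MemoryStore:
    memories: list[str] = field(default_factory=list)

    def add(self, text: str) -> None:
        """Store one memory string."""
        self.memories.append(text)

    def search(self, query: str, k: int = 2) -> list[str]:
        """Return the k memories sharing the most words with the query."""
        q = set(query.lower().split())
        ranked = sorted(
            self.memories,
            key=lambda m: len(q & set(m.lower().split())),
            reverse=True,
        )
        return ranked[:k]


store = MemoryStore()
store.add("User's dog is named Biscuit.")
store.add("User moved to Lisbon in 2021.")
store.add("User is allergic to peanuts.")

# The top hit is the memory most relevant to the question.
print(store.search("what is the name of the user's dog")[0])
```

A real agent loop would feed the returned memories back into the model's context before it answers, which is what lifts long-term conversational QA performance.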
Slash RAG latency by an order of magnitude using a tiny, LoRA-adapted SLM that routes queries, achieving GPT-4o-mini-level accuracy at a fraction of the cost.
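The routing step can be sketched as follows. This is a minimal sketch under stated assumptions: the LoRA-adapted SLM is stubbed with a keyword rule, and the function names (`slm_route`, `answer`) are hypothetical; the point is only the control flow, where a cheap classifier decides per query whether the expensive retrieval pipeline runs at all.

```python
# Illustrative query router: a small model labels each query before the
# RAG pipeline is invoked. The SLM is stubbed with a keyword heuristic;
# a real system would call a LoRA-fine-tuned small language model here.
def slm_route(query: str) -> str:
    """Stand-in for the SLM classifier: 'rag' if retrieval is needed."""
    needs_retrieval = any(
        w in query.lower() for w in ("who", "when", "where", "which", "cite")
    )
    return "rag" if needs_retrieval else "direct"


def answer(query: str) -> str:
    """Dispatch to retrieval only when the router asks for it."""
    if slm_route(query) == "rag":
        return f"[RAG] retrieving context for: {query!r}"
    return f"[direct] answering without retrieval: {query!r}"


print(answer("Who proposed the transformer architecture?"))
print(answer("Summarize this paragraph in one sentence."))
```

Because most queries skip retrieval entirely, the end-to-end latency is dominated by the tiny router rather than the vector search and reranking stages.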