Search papers, labs, and topics across Lattice.
3
0
6
6
RAG's reputation for being ineffective in reasoning tasks is shattered by showing that retrieving the right data – intermediate "thinking traces" – unlocks substantial performance gains, even for state-of-the-art models.
Even the most advanced LLMs stumble when asked to reason over a large, heterogeneous document corpus, achieving only 34% accuracy on the new OfficeQA Pro benchmark despite direct access to the relevant documents.
LLM-driven program evolution gets a smart upgrade: AdaEvolve dynamically allocates resources to promising solution candidates, leaving static schedules in the dust.