Removing gold answer strings from rewritten contexts can cause F1 scores to drop by up to 64 points, underscoring how heavily retrieval-augmented QA performance depends on those strings being present.
LLM agents struggle to consistently reflect human-like psychology, even when provided with extensive personality profiles and autobiographical memories, suggesting current models lack a deeper understanding of human behavior.