LLM-generated health counseling appears promising, but stakeholders disagree sharply on tone and error handling, highlighting the need for more nuanced evaluation beyond simple relevance and quality metrics.
LLMs' performance on False Belief Tests isn't just about size – it's profoundly skewed by how you phrase the question, revealing that models learn stereotypical responses to mental-state vocabulary during pre-training.