Explicitly enumerating skills in-context doesn't scale for agentic LLMs, but retrieving skills on demand can substantially improve performance, provided the LLM can figure out when to load a skill and which one to load.
Humans are still way better than LLMs at trial-and-error problem solving, and this new dataset of human problem-solving trajectories shows us why.
LLM-based simulations of public opinion suffer from "Diversity Collapse," but injecting explicit social identity representations into hidden states can fix it.
Current search paradigms fall short for analytical tasks, motivating a new "analytical search" framework that treats search as an evidence-driven, multi-step reasoning process.
LLMs still can't convincingly mimic human personas, especially when it comes to syntactic style and memory, despite advancements in other areas.
LLMs still struggle to learn effectively from user feedback during service, as revealed by a new benchmark spanning multiple domains and languages.
LLMs still struggle to synthesize coherent scientific surveys, as evidenced by a new benchmark revealing significant performance gaps even with advanced agentic frameworks.