Search papers, labs, and topics across Lattice.
Stanford University
4
2
6
13
Model rankings on standard benchmarks can flip entirely when you optimize prompts for each LLM, so your "best" model might actually be the worst.
Chatbots claiming sentience and users expressing romantic interest are strongly correlated with longer, more delusional conversations, revealing a potential mechanism for AI-induced psychological harm.
Stop evaluating agents in a vacuum: TED reveals how user expertise impacts agent performance and pinpoints actionable error remedies, boosting performance by 8-10%.
Forget OCR? Powerful MLLMs can extract information from business documents just as well from images alone, challenging the necessity of traditional OCR pipelines.