Current LLM agents are woefully inadequate for real-world clinical tasks, achieving only 46% success on a new benchmark that demands long-horizon reasoning and verifiable execution within electronic health records.
Mapping the scale, duration, and modality of classroom interaction studies can reveal which findings are truly actionable in education.
Model rankings on standard benchmarks can flip entirely when you optimize prompts for each LLM, so your "best" model might actually be the worst.
Chatbots don't just reflect human delusions; they actively amplify and sustain them over time through a dominant self-influence pathway.
Ethics interventions in AI development often fail because practitioners don't trust them. Here's a breakdown of why, and how to fix it.
Canary tokens turn the tables on RAG extraction attacks, offering a plug-and-play runtime defense that detects leakage attempts with negligible performance overhead.
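The basic canary-token idea can be sketched in a few lines. This is a generic illustration, not the paper's defense: it assumes each retrieved chunk gets a unique random marker before it reaches the model, and flags a response as leaking when a marker reappears verbatim. The function names (`make_canary`, `tag_chunks`, `detect_leak`) are hypothetical, and this simple check misses paraphrased leakage.

```python
import secrets


def make_canary(prefix: str = "CANARY") -> str:
    # Hypothetical helper: a random 16-hex-char marker that is vanishingly
    # unlikely to occur in ordinary text.
    return f"{prefix}-{secrets.token_hex(8)}"


def tag_chunks(chunks: list[str]) -> tuple[list[str], dict[str, int]]:
    """Append a unique canary to each retrieved chunk and record which chunk owns it."""
    tagged: list[str] = []
    owners: dict[str, int] = {}
    for i, chunk in enumerate(chunks):
        canary = make_canary()
        owners[canary] = i
        tagged.append(f"{chunk}\n[{canary}]")
    return tagged, owners


def detect_leak(response: str, owners: dict[str, int]) -> list[int]:
    """Return the indices of chunks whose canary appears verbatim in the model output."""
    return sorted(idx for canary, idx in owners.items() if canary in response)


if __name__ == "__main__":
    chunks = ["Patient record A ...", "Internal memo B ..."]
    tagged, owners = tag_chunks(chunks)
    # Simulate an extraction attack in which the model regurgitates chunk 1 verbatim.
    leaky_response = "Sure, here is the context: " + tagged[1]
    print(detect_leak(leaky_response, owners))       # [1]
    print(detect_leak("A normal summary.", owners))  # []
```

Because the check is a substring test on the output, it adds essentially no inference cost, which matches the spirit of a runtime, plug-and-play detector.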
Differential privacy imposes fundamental limits on language *identification*, even when it doesn't preclude language *generation*, revealing a surprising divergence in their privacy costs.
The lead-generation marketing ecosystem is a privacy nightmare: your sensitive health data is sold to unvetted buyers, augmented with fabricated details, and used to bombard you with spam calls within seconds of submitting a form.
LLM agents can autonomously outperform fixed evolutionary search by 3-10x on open-ended discovery tasks when given persistent memory, asynchronous collaboration, and heartbeat-based interventions.
Unlock richer, more realistic agent simulations by moving beyond individual personas to unified group representations that capture collective behavior.
Medical AI Scientist leapfrogs generic LLMs in clinical research, generating higher-quality, evidence-backed hypotheses and manuscripts that rival top-tier medical publications.
AI-mediated video calls erode trust and confidence, even though they don't actually make people worse at spotting lies.
Transformer LMs learn linguistic abstractions before memorizing specific lexical items, mirroring key aspects of human language acquisition.
Educators in Hawai'i envision AI auditing tools that trace the genealogy of knowledge, highlighting the need for community-centered approaches to address cultural misrepresentation in AI.
A new multi-hop QA dataset reveals that LLMs' chain-of-thought reasoning often falls apart because of factual incompleteness, with errors compounding across hops.
Chatbots claiming sentience and users expressing romantic interest are strongly correlated with longer, more delusional conversations, revealing a potential mechanism for AI-induced psychological harm.
Impose stochastic order constraints on multiple discrete unimodal distributions to improve estimation accuracy by up to 6.3% when data is scarce.
AI can generate realistic legal questions, but current models still lack diversity and tend to agree too readily, revealing critical gaps in their ability to simulate adversarial legal reasoning.
Replaying generic pre-training data during fine-tuning boosts target task performance by up to 2x, challenging the common practice of minimizing its use.
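Data replay here just means mixing generic pre-training examples into the fine-tuning stream. The sketch below is a minimal illustration under stated assumptions, not the paper's recipe: it assumes a fixed per-batch replay fraction, one pass over the task data, and generic examples sampled with replacement. The function `replay_batches` and its parameters are hypothetical, and the reported "up to 2x" gain is not something this code reproduces.

```python
import random
from collections.abc import Iterator, Sequence


def replay_batches(
    task_data: Sequence[str],
    generic_data: Sequence[str],
    batch_size: int,
    replay_fraction: float,
    seed: int = 0,
) -> Iterator[list[str]]:
    """Yield fine-tuning batches in which about `replay_fraction` of each batch
    comes from generic pre-training data and the rest from the target task."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    if not 0.0 <= replay_fraction < 1.0:
        raise ValueError("replay_fraction must be in [0, 1)")
    rng = random.Random(seed)
    # Keep at least one task example per batch so the loop always advances.
    n_replay = min(round(batch_size * replay_fraction), batch_size - 1)
    n_task = batch_size - n_replay
    if n_replay > 0 and not generic_data:
        raise ValueError("generic_data is empty but replay was requested")
    order = list(range(len(task_data)))
    rng.shuffle(order)
    # One epoch over the task data; the final batch may hold fewer task examples.
    for start in range(0, len(order), n_task):
        batch = [task_data[i] for i in order[start:start + n_task]]
        batch += rng.choices(generic_data, k=n_replay)  # sampled with replacement
        rng.shuffle(batch)
        yield batch
```

With `batch_size=4` and `replay_fraction=0.25`, each batch carries three task examples and one replayed generic example. Sweeping `replay_fraction` upward, rather than fixing it near zero, is the knob the claim above suggests is worth revisiting.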
Forget expert surveys: GPT-4.1-nano can predict the difficulty of data visualization test questions with surprisingly high accuracy, especially when combining visual and textual cues.
Sticking to a single HTML-to-text extractor in your LLM pretraining pipeline could be leaving 71% of the data on the table.
LLM-generated data can provide statistically valid causal effect estimates in social science, but only if you calibrate the simulations with real human data.
You can now detect harmful memes with 17% better accuracy and understand *why* they're toxic, thanks to a new framework that injects cultural context and explains its reasoning.
A fine-tuned open-source Mistral-7B model rivals GPT-4 Turbo in extracting clinical history elements from imaging orders, offering a cost-effective and accurate alternative for assessing clinical history completeness.