Unlock richer, more realistic agent simulations by moving beyond individual personas to unified group representations that capture collective behavior.
Medical AI Scientist leapfrogs generic LLMs in clinical research, generating higher-quality, evidence-backed hypotheses and manuscripts that rival top-tier medical publications.
AI-mediated video calls erode trust and confidence, even though they don't actually make people worse at spotting lies.
Automating surgical patient triage with an LLM achieves 94% sensitivity, but discrepancies reveal more about clinical workflow gaps than AI errors.
Transformer LMs learn linguistic abstractions before memorizing specific lexical items, mirroring key aspects of human language acquisition.
Educators in Hawai'i envision AI auditing tools that trace the genealogy of knowledge, highlighting the need for community-centered approaches to address cultural misrepresentation in AI.
LLMs' chain-of-thought reasoning often falls apart due to factual incompleteness, with errors compounding across multiple hops, as revealed by a new multi-hop QA dataset.
Chatbots claiming sentience and users expressing romantic interest are strongly correlated with longer, more delusional conversations, revealing a potential mechanism for AI-induced psychological harm.
Most AI failures aren't the spectacular kind, but silent breakdowns in interaction that will persist even as models get smarter.
Impose stochastic order constraints on multiple discrete unimodal distributions to improve estimation accuracy by up to 6.3% when data is scarce.
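The ordering constraint is easy to prototype. Below is a minimal sketch, assuming a simple pointwise max/min projection of the two empirical CDFs (valid because the pointwise max/min of CDFs is still a CDF); the paper's estimator also enforces unimodality, which this sketch omits.

```python
import numpy as np

def enforce_stochastic_order(counts_lo, counts_hi):
    """Project two empirical PMFs so the first is stochastically smaller
    (its CDF dominates pointwise). Pointwise max/min of valid CDFs is
    still a valid CDF, so the fix is well defined. The paper's estimator
    also enforces unimodality, which this sketch omits."""
    F_lo = np.cumsum(counts_lo / counts_lo.sum())
    F_hi = np.cumsum(counts_hi / counts_hi.sum())
    F_lo_adj = np.maximum(F_lo, F_hi)  # stochastically smaller: larger CDF
    F_hi_adj = np.minimum(F_lo, F_hi)
    return np.diff(F_lo_adj, prepend=0.0), np.diff(F_hi_adj, prepend=0.0)

# toy example: scarce counts over 5 ordinal categories
pmf_lo, pmf_hi = enforce_stochastic_order(np.array([2., 3., 2., 2., 1.]),
                                          np.array([3., 1., 2., 2., 2.]))
print(pmf_lo.round(2), pmf_hi.round(2))
```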
AI can generate realistic legal questions, but current models still struggle with limited diversity and excessive agreeableness, revealing critical gaps in their ability to simulate adversarial legal reasoning.
Replaying generic pre-training data during fine-tuning boosts target task performance by up to 2x, challenging the common practice of minimizing its use.
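The replay idea is simple to prototype. Here is a minimal sketch, assuming random interleaving at a fixed replay ratio; the paper's actual schedule and ratios may differ.

```python
import random

def replay_stream(task_data, pretrain_data, replay_ratio=0.5, steps=8, seed=0):
    """Interleave target-task examples with replayed generic pre-training
    examples. `replay_ratio` is the fraction of steps drawn from the
    pre-training pool; the paper's actual schedule is an assumption here."""
    rng = random.Random(seed)
    for _ in range(steps):
        pool = pretrain_data if rng.random() < replay_ratio else task_data
        yield rng.choice(pool)

# toy usage
for ex in replay_stream(["task_a", "task_b"], ["generic_1", "generic_2"]):
    print(ex)
```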
Grounding reward learning in natural language rationales makes policies 2x more robust to spurious correlations and distribution shifts.
By strategically warming up residual connections layer-by-layer, ProRes unlocks faster and more stable pretraining for language models.
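As a rough illustration of layer-wise residual warmup (the exact ProRes schedule and parameterization are assumptions here), each block's residual branch can be gated by a ramp that opens earlier for shallower layers:

```python
import torch
import torch.nn as nn

class WarmedResidualBlock(nn.Module):
    """Residual block whose branch is scaled by a gate that ramps 0 -> 1,
    staggered by depth so earlier layers open first. The exact ProRes
    schedule and parameterization are assumptions here."""
    def __init__(self, d_model, layer_idx, n_layers, warmup_steps=1000):
        super().__init__()
        self.ff = nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                                nn.Linear(4 * d_model, d_model))
        self.layer_idx, self.n_layers = layer_idx, n_layers
        self.warmup_steps = warmup_steps

    def gate(self, step):
        start = self.layer_idx / self.n_layers * self.warmup_steps
        return min(max((step - start) / self.warmup_steps, 0.0), 1.0)

    def forward(self, x, step):
        # identity path is always on; the residual branch warms up
        return x + self.gate(step) * self.ff(x)

block = WarmedResidualBlock(d_model=64, layer_idx=3, n_layers=12)
y = block(torch.randn(2, 16, 64), step=500)
```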
Forget expert surveys: GPT-4.1-nano can predict the difficulty of data visualization test questions with surprisingly high accuracy, especially when combining visual and textual cues.
Forget OCR? Powerful MLLMs can extract information from business documents directly from page images, matching traditional OCR pipelines and challenging their necessity.
Model handoffs in multi-turn LLM systems can swing performance by up to 13 percentage points, revealing a hidden reliability risk that single-model benchmarks miss.
A simple "think step-by-step" prompt unlocks surprisingly better world knowledge recall in reasoning LMs, suggesting they're under-optimized for accessing their own parametric knowledge.
An interactive AI evaluator can assess skills fairly across diverse self-presentation styles, producing equitable outcomes whether individuals lean toward self-promotion or modesty.
Sticking to a single HTML-to-text extractor in your LLM pretraining pipeline could be leaving 71% of the data on the table.
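Running an ensemble of extractors is cheap to try. A minimal sketch using two common extractors, with a crude longest-output selection rule (the paper's actual pipeline and selection rule are assumptions; further extractors such as readability or resiliparse follow the same pattern):

```python
import trafilatura
from bs4 import BeautifulSoup

def extract_candidates(html: str) -> dict:
    """Run more than one HTML-to-text extractor and keep every non-empty
    output. The paper's selection rule is an assumption here."""
    outputs = {}
    text = trafilatura.extract(html)  # main-content extraction
    if text:
        outputs["trafilatura"] = text
    raw = BeautifulSoup(html, "html.parser").get_text(" ", strip=True)
    if raw:
        outputs["bs4"] = raw          # naive full-page text
    return outputs

def best_extraction(html: str):
    # crude heuristic: prefer the longest non-empty candidate
    cands = extract_candidates(html)
    return max(cands.values(), key=len) if cands else None
```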
LLMs can turn sparse rewards into dense training signals for RL agents, achieving comparable performance with significantly higher sample efficiency.
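The shaping step itself is a one-liner once you have an LLM scorer. A minimal sketch, where `llm_progress_score` and the `BETA` weight are hypothetical stand-ins rather than the paper's exact design:

```python
BETA = 0.1  # shaping weight (assumed hyperparameter)

def llm_progress_score(state_description: str) -> float:
    """Hypothetical stand-in for an LLM call that rates task progress in
    [0, 1] from a textual description of the current state."""
    return 0.5  # placeholder; a real system would query an LLM here

def shaped_reward(sparse_reward: float, state_description: str) -> float:
    # dense signal: environment's sparse reward plus LLM-estimated progress
    return sparse_reward + BETA * llm_progress_score(state_description)

print(shaped_reward(0.0, "agent holding key, door still locked"))
```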
Forget single-objective optimization—this work cracks omniprediction in multiclass settings, opening the door to algorithms that are robust across diverse loss functions and comparator classes.
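For readers new to the concept, the standard omniprediction guarantee (stated here from the binary-case literature; the multiclass version generalizes the predictor to a distribution over labels) is roughly:

```latex
% (L, C, eps)-omnipredictor: one predictor f, post-processed per loss,
% competes with the best comparator in C for every loss in L.
\forall \ell \in \mathcal{L}:\quad
\mathbb{E}\big[\ell\big(y,\, k_\ell(f(x))\big)\big]
\;\le\; \min_{c \in \mathcal{C}} \mathbb{E}\big[\ell\big(y,\, c(x)\big)\big] + \varepsilon
```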
Generative AI demands a reimagining of K-12 computational thinking curricula to encompass AI literacy and address algorithmic bias, building on a decade of computing education experience.
LLM-generated data can provide statistically valid causal effect estimates in social science, but only if you calibrate the simulations with real human data.
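As one concrete (and deliberately simple) form such calibration could take, a shared affine map can align pooled simulated outcomes with a small human pilot sample before estimating effects; this is a generic calibration step, not necessarily the paper's procedure:

```python
import numpy as np

def fit_affine(sim_pooled, human_sample):
    """Fit a shared affine map so pooled simulated outcomes match the
    mean and spread of a small human pilot sample. A generic calibration
    step, not necessarily the paper's procedure."""
    a = human_sample.std() / (sim_pooled.std() + 1e-12)
    b = human_sample.mean() - a * sim_pooled.mean()
    return a, b

rng = np.random.default_rng(0)
sim_treat = rng.normal(0.9, 1.5, 500)   # LLM-simulated treated outcomes
sim_ctrl  = rng.normal(0.2, 1.5, 500)   # LLM-simulated control outcomes
human     = rng.normal(0.0, 1.0, 50)    # small real pilot sample

a, b = fit_affine(np.concatenate([sim_treat, sim_ctrl]), human)
ate = (a * sim_treat + b).mean() - (a * sim_ctrl + b).mean()
print(round(ate, 3))  # calibrated effect estimate
```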
By reusing existing data mixture ratios and only recomputing for affected domains, Olmix slashes compute costs by 74% without sacrificing downstream task performance during iterative LM development.
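The reuse-and-renormalize step can be sketched in a few lines. Here `fit_ratios_for` is a hypothetical optimizer over just the changed domains, and the split of probability mass is an assumption; Olmix's actual procedure is more involved.

```python
def update_mixture(old_ratios, changed_domains, fit_ratios_for):
    """Keep mixture ratios for untouched domains; re-optimize only the
    changed ones and give them the remaining probability mass.
    `fit_ratios_for` is a hypothetical optimizer returning ratios that
    sum to 1 over the changed domains."""
    kept = {d: r for d, r in old_ratios.items() if d not in changed_domains}
    remaining = 1.0 - sum(kept.values())
    new = fit_ratios_for(changed_domains)
    mixture = dict(kept)
    mixture.update({d: remaining * r for d, r in new.items()})
    return mixture

old = {"web": 0.6, "code": 0.3, "math": 0.1}
print(update_mixture(old, {"code"}, lambda ds: {d: 1.0 / len(ds) for d in ds}))
```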
You can now detect harmful memes with 17% better accuracy and understand *why* they're toxic, thanks to a new framework that injects cultural context and explains its reasoning.
A fine-tuned open-source Mistral-7B model rivals GPT-4 Turbo in extracting clinical history elements from imaging orders, offering a cost-effective and accurate alternative for assessing clinical history completeness.