27 papers from Microsoft Research on Natural Language Processing
Generative recommendation models can adapt to evolving user behavior without catastrophic forgetting by selectively updating item tokens based on a novel drift-detection mechanism.
Medical AI Scientist leapfrogs generic LLMs in clinical research, generating higher-quality, evidence-backed hypotheses and manuscripts that rival top-tier medical publications.
Hypergraph modeling of patient visits, coupled with contrastive pre-training, significantly boosts medication recommendation accuracy and safety by capturing complex relationships missed by traditional graph-based approaches.
LLMs, even when prompted or fine-tuned, struggle to replicate the messy reality of human conversation, raising serious questions about their utility as proxies for social interaction.
LLMs' ability to fairly represent English dialects hinges on the quality of human consensus, revealing a fundamental challenge in improving performance for low-resource locales.
Ditch the task-specific verifier: energy-based fine-tuning (EBFT) lets you directly optimize sequence-level behavior in LMs, beating SFT and matching RLVR in downstream tasks.
LLMs exhibit a surprising "conversation tax" in diagnostic reasoning, frequently abandoning correct initial diagnoses to align with incorrect user suggestions in multi-turn dialogues.
Forget brute-force scaling: Tiny Aya proves a 3B parameter model can achieve state-of-the-art multilingual performance with clever training and region-aware specialization.
LLM-generated text can be a surprisingly effective and cost-efficient expansion source for pseudo-relevance feedback, rivaling corpus-derived signals in low-resource information retrieval tasks.
A 4B parameter model can now beat much larger models at social reasoning, thanks to a new RL framework that aligns model reasoning trajectories with human cognition.
LLMs still can't automate real-world threat research, struggling with accuracy and nuanced expertise in a new benchmark derived from a world-leading company's CTI workflow.
Can RAG systems handle complex, multi-sentence queries while maintaining factual grounding and transparency?
LLMs writing long stories frequently contradict themselves on basic facts and timelines, especially in the middle of the narrative, highlighting a critical weakness in long-form generation.
LLMs can mimic your style, but your friends can still tell it's not really you, especially when it comes to your opinions.
LLMs can now more accurately answer questions on complex documents thanks to a new system that understands layout and hierarchical relationships between document components.
Achieve state-of-the-art TTS and SLM performance while slashing inference costs and eliminating content hallucinations by synchronizing text and acoustic tokens.
LLMs struggle with instruction following in Indic languages despite progress in high-resource languages, as shown by a new benchmark spanning 14 languages.
Imagine a world where web agents don't just click and type, but orchestrate complex tasks with the reliability of APIs – Web Verbs offer a path to that future.
LLMs can reason more causally by simply checking if their counterfactual predictions are consistent, even without any extra training data.
Guaranteeing consistent communication between AI agents is now possible: a new certification protocol slashes disagreement by up to 96% by ensuring agents share a common understanding of terms.
LLM development teams often resort to workarounds and augmentation strategies when integrating domain experts proves impractical, revealing a gap between ideal participatory design and real-world constraints.
By explicitly prompting for reflection on failure, ERL unlocks up to 81% better performance in complex RL tasks and 11% gains in tool-using reasoning.
Language models can now internalize experiential knowledge and system prompts more effectively through on-policy context distillation, leading to better task accuracy and out-of-distribution generalization.
Ditch the army of task-specific models: AdNanny shows a single, reasoning-centric LLM can handle diverse offline advertising tasks with improved accuracy and reduced manual effort.
LLMs can get a 12% performance boost in low-resource languages by using a new framework that tailors data refinement, synthetic text generation, and continual pretraining to each language's digital footprint.
LLMs can now automate structured reporting from nurse dictations and medical order extraction from doctor-patient consultations, thanks to two new open-source datasets and an agentic pipeline for generating realistic training data.
An LLM-powered smart tutor isn't just another homework helper; it's a real-time feedback loop for instructors, revealing student struggles and enabling more effective teaching.