23 papers from Microsoft Research on Natural Language Processing
TwinGate stops jailbreaks by tracking malicious intent across anonymized, interleaved queries with minimal overhead, something previous defenses couldn't do.
Discrete diffusion models can be sped up by 14x by intelligently choosing which tokens to sample at each step, without sacrificing accuracy.
Surprisingly, a trie-guided decoding framework applied to smaller encoder-decoder models like T5 and BART can outperform much larger instruction-tuned models like LLaMA-3 and Phi-3 on in-document query auto-completion.
Token-level attribution struggles to pinpoint the causes of LLM failures in realistic settings, suggesting current interpretability tools may not be up to the task of debugging complex model behaviors.
RosettaSearch recovers up to 68% more structural fidelity in protein designs, transforming how we optimize sequences beyond traditional single-pass methods.
Forget hand-crafted templates: DUET learns to generate user and item profiles jointly, boosting recommendation accuracy by better aligning textual representations.
Autonomous web agents get a serious upgrade with WebXSkill, which lets them learn and execute skills with both code-level precision and human-readable guidance.
LLMs are twice as likely as humans to repeat the same support tactic in a conversation, but a simple RL reward for tactic novelty can fix it.
Gaze-tracking unlocks a new level of personalized AI assistance, enabling LLMs to infer user cognitive states and boost recall performance.
Generative recommendation systems can now adapt to evolving user behavior without catastrophic forgetting, thanks to a novel drift-aware tokenization method that selectively updates item representations.
Hypergraph modeling of patient visits, coupled with contrastive pre-training, significantly boosts medication recommendation accuracy and safety by capturing complex relationships missed by traditional graph-based approaches.
LLMs, even when prompted or fine-tuned, struggle to replicate the messy reality of human conversation, raising serious questions about their utility as proxies for social interaction.
LLMs' ability to fairly represent English dialects hinges on the quality of human consensus, revealing a fundamental challenge in improving performance for low-resource locales.
LLMs still can't automate real-world threat research, struggling with accuracy and nuanced expertise in a new benchmark derived from a world-leading company's cyber threat intelligence (CTI) workflow.
LLMs writing long stories frequently contradict themselves on basic facts and timelines, especially in the middle of the narrative, highlighting a critical weakness in long-form generation.
LLMs can mimic your style, but your friends can still tell it's not really you, especially when it comes to your opinions.
LLMs can now more accurately answer questions on complex documents thanks to a new system that understands layout and hierarchical relationships between document components.
Imagine a world where web agents don't just click and type, but orchestrate complex tasks with the reliability of APIs: Web Verbs offer a path to that future.
Guaranteeing consistent communication between AI agents is now possible: a new certification protocol slashes disagreement by up to 96% by ensuring agents share a common understanding of terms.
LLM development teams often resort to workarounds and augmentation strategies when faced with the practical challenges of integrating domain experts, revealing a gap between ideal participatory design and real-world constraints.
By explicitly prompting for reflection on failure, ERL unlocks up to 81% better performance in complex RL tasks and 11% gains in tool-using reasoning.
Ditch the army of task-specific models: AdNanny shows a single, reasoning-centric LLM can handle diverse offline advertising tasks with improved accuracy and reduced manual effort.
LLMs can get a 12% performance boost in low-resource languages by using a new framework that tailors data refinement, synthetic text generation, and continual pretraining to each language's digital footprint.