Amazon Science

18 papers from Amazon Science on Natural Language Processing

Apr 30, 2026

From Unstructured to Structured: LLM-Guided Attribute Graphs for Entity Search and Ranking

LLMs can achieve better zero-shot product ranking with 57% less token usage by reasoning over structured attribute graphs instead of raw text.

Yilun Zhu, Nikhita Vedula, S. Malmasi +1

Natural Language Processing · Reasoning & Chain-of-Thought · Recommendation & Information Retrieval

Amazon Science · 3w ago

One Size Fits All? An Empirical Comparison of ADR Templates regarding Comprehension, Usability, and Ease of Adoption

Turns out, the best template for documenting architectural decisions depends on whether you value conciseness (Nygard) or structural detail (MADR).

Fernando Nogueira, F. Nogueira, Nabson Silva +1

Architecture Design (Transformers, SSMs, MoE) · Code Generation & Program Synthesis · Natural Language Processing

Apr 22, 2026

Amazon Science · Apr 22, 2026

Text-to-Distribution Prediction with Quantile Tokens and Neighbor Context

Directly embedding quantile tokens into input sequences leads to sharper and more accurate distribution predictions, outperforming traditional methods by a substantial margin.

Zhuang Yuan, Nikhita Vedula, Dushyanta Dhyani +5

Architecture Design (Transformers, SSMs, MoE) · Natural Language Processing

Apr 21, 2026

Amazon Science · Apr 21, 2026

Detecting Hallucinations in SpeechLLMs at Inference Time Using Attention Maps

SpeechLLMs' hallucinations betray themselves in their attention patterns, offering a new way to detect these errors without needing expensive human-labeled data.

Jonas Waldendorf, Bashar Awwad Shiekh Hasan, Evgenii Tsymbalov

Interpretability & Mechanistic Interp · Natural Language Processing · Speech & Audio

Apr 16, 2026

Amazon Science · Apr 16, 2026 · also UIUC

Compressing Sequences in the Latent Embedding Space: $K$-Token Merging for Large Language Models

Achieve 75% input length reduction in LLMs with minimal performance loss by compressing token embeddings directly in the latent space.

Zihao Xu, John Harvill +5

Architecture Design (Transformers, SSMs, MoE) · Inference & Quantization · Natural Language Processing

Apr 13, 2026

Amazon Science · Apr 13, 2026

Beyond Factual Grounding: The Case for Opinion-Aware Retrieval-Augmented Generation

RAG systems are stuck in a factual echo chamber, ignoring the rich tapestry of opinions that shape real-world understanding.

Aditya Agrawal, Alwarappan Nakkiran, Darshan Fofadiya +2

Eval Frameworks & Benchmarks · Natural Language Processing · Recommendation & Information Retrieval

Apr 6, 2026

Amazon Science · Apr 6, 2026

Metaphors We Compute By: A Computational Audit of Cultural Translation vs. Thinking in LLMs

LLMs aren't culture-aware reasoners, but biased translators: they generate stereotyped metaphors and default to Western perspectives even when prompted with specific cultural identities.

Yuan Chang, Jiaming Qu, Zhu Li

Constitutional AI & AI Ethics · Eval Frameworks & Benchmarks · Natural Language Processing

Mar 19, 2026

Amazon Science · Mar 19, 2026

RADIUS: Ranking, Distribution, and Significance - A Comprehensive Alignment Suite for Survey Simulation

LLM-generated survey responses can be statistically accurate yet still miss the option most preferred by humans, highlighting a critical flaw in current evaluation methods.

Weronika Łajewska, Paul Missault +2

Eval Frameworks & Benchmarks · Natural Language Processing · Recommendation & Information Retrieval

Amazon Science · Mar 19, 2026

Cross-Lingual LLM-Judge Transfer via Evaluation Decomposition

Forget expensive multilingual annotations: this framework lets you evaluate LLMs in new languages by transferring knowledge from English, with surprisingly strong results.

Ivaxi Sheth, Zeno Jonke +5

Eval Frameworks & Benchmarks · Natural Language Processing

Mar 11, 2026

Amazon Science · Mar 11, 2026

When Fine-Tuning Fails and when it Generalises: Role of Data Diversity and Mixed Training in LLM-based TTS

LoRA fine-tuning can significantly boost the voice cloning capabilities of LLM-based TTS systems, but only if the training data is acoustically diverse enough.

Anupam Purwar, Aditya Choudhary

Natural Language Processing · Speech & Audio · Training Efficiency & Optimization

Mar 3, 2026

Mar 3, 2026 · also Amazon Science

SafeCRS: Personalized Safety Alignment for LLM-Based Conversational Recommender Systems

LLM-based recommender systems can trigger users' personal trauma, phobias, or self-harm history, but a new framework cuts these safety violations by 96.5% while maintaining recommendation quality.

Haochang Hao, Xinzhuo Li, Yingqiang Ge

Constitutional AI & AI Ethics · Natural Language Processing · Recommendation & Information Retrieval

Mar 1, 2026

CMU ML · Mar 1, 2026 · also Amazon Science

Linking Knowledge to Care: Knowledge Graph-Augmented Medical Follow-Up Question Generation

Injecting knowledge graphs into LLMs boosts medical question generation by 8%, suggesting a simple way to patch up LLM knowledge gaps.

Liwen Sun, Xiang Yu, Ming Tan +2

Natural Language Processing · Scientific Discovery & Drug Design · Tool Use & Agents

Feb 25, 2026

Amazon Science · Feb 25, 2026 · also Michigan State

How Do Latent Reasoning Methods Perform Under Weak and Strong Supervision?

Latent reasoning models often take shortcuts to achieve high accuracy, and stronger supervision, while mitigating this, paradoxically restricts the diversity of their latent representations.

Yingqian Cui, Zhenwei Dai, Bing He +5

Eval Frameworks & Benchmarks · Natural Language Processing · Reasoning & Chain-of-Thought

Feb 23, 2026

Amazon Science · Feb 23, 2026 · also Anthropic

SenTSR-Bench: Thinking with Injected Knowledge for Time-Series Reasoning

Forget fine-tuning: inject targeted time-series insights into general LLMs and watch their reasoning skills skyrocket by up to 26%.

Zelin He, Boran Han +17

Eval Frameworks & Benchmarks · Natural Language Processing · Reasoning & Chain-of-Thought

Feb 17, 2026

Amazon Science · Feb 17, 2026

Automatic Funny Scene Extraction from Long-form Cinematic Videos

An end-to-end system extracts funny scenes from movies with 87% accuracy, opening new avenues for automated content repurposing.

Sibendu Paul, Haotian Jiang, Caren Chen

Computer Vision · Natural Language Processing · Recommendation & Information Retrieval

Feb 16, 2026

Amazon Science · Feb 16, 2026

Behavioral Feature Boosting via Substitute Relationships for E-commerce Search

Give new e-commerce products a warm start by borrowing behavioral signals from their substitutes, boosting search relevance and product discovery.

Chaosheng Dong, Michinari Momma, Yijia Wang

Natural Language Processing · Recommendation & Information Retrieval

Aug 6, 2025

UW · Aug 6, 2025 · also Amazon Science, BAIR, Stanford HAI

I Think, Therefore I Am Under-Qualified? A Benchmark for Evaluating Linguistic Shibboleth Detection in LLM Hiring Evaluations

LLMs evaluating job candidates exhibit significant bias against hedging language, docking candidates by 25.6% on average, even when the content is equivalent.

Julia Kharchenko, Tanya Roosta, Aman Chadha +1

Constitutional AI & AI Ethics · Eval Frameworks & Benchmarks · Natural Language Processing

Jan 23, 2025

Jan 23, 2025 · also Amazon Science, Tsinghua AI, NTU, PKU

CRPO: Confidence-Reward Driven Preference Optimization for Machine Translation

By focusing on the most challenging examples, CRPO significantly boosts machine translation accuracy and data efficiency compared to standard preference optimization techniques.

Guofeng Cui, Pichao Wang, Yang Liu +3

Natural Language Processing · RLHF & Preference Learning