18 papers from Amazon Science on Natural Language Processing
LLMs can achieve better zero-shot product ranking with 57% less token usage by reasoning over structured attribute graphs instead of raw text.
Turns out, the best template for documenting architectural decisions depends on whether you value conciseness (Nygard) or structural detail (MADR).
Directly embedding quantile tokens into input sequences leads to sharper and more accurate distribution predictions, outperforming traditional methods by a substantial margin.
SpeechLLMs' hallucinations betray themselves in their attention patterns, offering a new way to detect these errors without needing expensive human-labeled data.
Achieve 75% input length reduction in LLMs with minimal performance loss by compressing token embeddings directly in the latent space.
RAG systems are stuck in a factual echo chamber, ignoring the rich tapestry of opinions that shape real-world understanding.
LLMs aren't culture-aware reasoners, but biased translators: they generate stereotyped metaphors and default to Western perspectives even when prompted with specific cultural identities.
LLM-generated survey responses can be statistically accurate yet still miss the option most preferred by humans, highlighting a critical flaw in current evaluation methods.
Forget expensive multilingual annotations: this framework lets you evaluate LLMs in new languages by transferring knowledge from English, with surprisingly strong results.
LoRA fine-tuning can significantly boost the voice cloning capabilities of LLM-based TTS systems, but only if the training data is acoustically diverse enough.
LLM-based recommender systems can trigger users' personal trauma, phobias, or self-harm history, but a new framework cuts these safety violations by 96.5% while maintaining recommendation quality.
Injecting knowledge graphs into LLMs boosts medical question generation by 8%, suggesting a simple way to patch up LLM knowledge gaps.
Latent reasoning models often take shortcuts to achieve high accuracy; stronger supervision mitigates this but, paradoxically, restricts the diversity of their latent representations.
Forget fine-tuning: inject targeted time-series insights into general LLMs and watch their reasoning skills skyrocket by up to 26%.
An end-to-end system extracts funny scenes from movies with 87% accuracy, opening new avenues for automated content repurposing.
Give new e-commerce products a warm start by borrowing behavioral signals from their substitutes, boosting search relevance and product discovery.
LLMs evaluating job candidates exhibit significant bias against hedging language, docking candidates by 25.6% on average, even when the content is equivalent.
By focusing on the most challenging examples, CRPO significantly boosts machine translation accuracy and data efficiency compared to standard preference optimization techniques.