9 papers from Amazon Science on Reasoning & Chain-of-Thought
Memory-augmented LLMs get a strategic upgrade: MemMA uses multi-agent reasoning to proactively guide memory construction and repair, leading to significant performance gains.
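A minimal sketch of that proactive pattern, assuming a manager agent that inspects each turn and picks a memory action. The `MemoryStore` / `manager_decide` names and the ADD/REPAIR/SKIP action set are illustrative assumptions, not MemMA's actual interface:

```python
from dataclasses import dataclass, field

@dataclass
class MemoryStore:
    """Toy long-term memory: a list of natural-language facts."""
    entries: list = field(default_factory=list)

def manager_decide(turn: str, store: MemoryStore) -> str:
    """Stand-in for an LLM-based manager agent that inspects each
    dialogue turn and picks a memory action. A real system would
    prompt a model; here we use trivial heuristics."""
    if any(turn.lower().startswith(k) for k in ("actually", "correction")):
        return "REPAIR"   # turn contradicts something stored earlier
    if "my" in turn.lower():
        return "ADD"      # turn carries new user-specific information
    return "SKIP"

def proactive_update(turn: str, store: MemoryStore) -> None:
    action = manager_decide(turn, store)
    if action == "ADD":
        store.entries.append(turn)
    elif action == "REPAIR" and store.entries:
        store.entries[-1] = turn  # naive repair: overwrite the stale entry

store = MemoryStore()
proactive_update("My favorite language is Rust.", store)
proactive_update("Actually, I mostly write Go these days.", store)
print(store.entries)  # ['Actually, I mostly write Go these days.']
```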
LLM reasoning research is inadvertently paving a dangerous path towards AI situational awareness and strategic deception, demanding a re-evaluation of current safety measures.
Save 20% on LLM costs with <2% accuracy drop by cascading a confidence-calibrated small language model (SLM) with a large one, escalating only the queries the SLM is unsure about.
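One way such a cascade could look in code. The `slm`/`llm` callables, the (answer, confidence) return shape, and the 0.8 threshold are all assumptions for illustration, not the paper's API:

```python
def cascade_answer(question: str, slm, llm, threshold: float = 0.8):
    """Route a query through a cheap small model first; escalate to
    the large model only when the SLM's calibrated confidence is low."""
    answer, confidence = slm(question)
    if confidence >= threshold:
        return answer, "slm"      # cheap path: confident SLM answer
    return llm(question), "llm"   # expensive fallback

# Toy stand-ins: the SLM is confident on arithmetic, unsure otherwise.
def toy_slm(q):
    return ("4", 0.95) if q == "2+2?" else ("unsure", 0.3)

def toy_llm(q):
    return "a carefully reasoned answer"

print(cascade_answer("2+2?", toy_slm, toy_llm))           # ('4', 'slm')
print(cascade_answer("prove P != NP", toy_slm, toy_llm))  # escalates
```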
LLMs can ace math problems while reasoning like a drunk toddler, with 82% of correct answers arising from unstable, inconsistent logic.
Latent reasoning models often take shortcuts to achieve high accuracy, and stronger supervision, while mitigating this, paradoxically restricts the diversity of their latent representations.
Static benchmarks can be fooled by fluent text and well-aligned citations; DREAM uses agentic evaluation to test the temporal validity and factual correctness of research agents' outputs, exposing a capability mismatch that static scoring misses.
Forget prompt engineering and fine-tuning: this "Reasoning Inception" method injects targeted reasoning into LLM agents at test time to fix conversational errors on the fly.
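A toy sketch of the test-time injection idea: splice a targeted reasoning instruction into the agent's context before it retries a failed turn. The message format and hint wording are hypothetical, not the paper's method:

```python
def inject_reasoning(history: list[str], hint: str) -> list[str]:
    """Return a patched context with a targeted reasoning hint
    appended, so the agent re-checks its mistake on the next turn."""
    return history + [
        f"(internal reasoning hint) Before answering, re-check: {hint}"
    ]

history = ["User: Book me a table for 8pm.",
           "Agent: Booked for 8am."]            # conversational error
patched = inject_reasoning(
    history, "does the confirmed time match the user's requested time?")
for msg in patched:
    print(msg)
```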
General-purpose Causal Foundation Models can now match the performance of specialized causal models by incorporating partial causal graph information via attention bias, unlocking a more unified approach to causal inference.
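A hedged sketch of one plausible reading of "attention bias": adding an additive term to the attention logits for variable pairs with known causal edges. The bias magnitude and edge encoding are assumptions, not the paper's construction:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def biased_attention(scores, known_edges, n, bias=4.0):
    """Bias attention logits so that each known effect attends more
    strongly to its known cause; unknown pairs are left untouched."""
    B = np.zeros((n, n))
    for i, j in known_edges:     # (cause, effect) pairs we already know
        B[j, i] = bias           # let effect j attend more to cause i
    return softmax(scores + B)

n = 3
scores = np.zeros((n, n))                  # uniform logits
attn = biased_attention(scores, known_edges=[(0, 2)], n=n)
print(attn.round(2))   # row 2 now attends preferentially to column 0
```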
Achieve nearly 20-point accuracy gains on reasoning tasks by dynamically routing between latent and discrete reasoning spaces based on model confidence.
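A minimal sketch of confidence-gated routing between the two reasoning modes. The `confidence`, `latent_reason`, and `cot_reason` interfaces are hypothetical stand-ins for the paper's mechanism:

```python
def route_reasoning(prompt: str, model, tau: float = 0.7) -> str:
    """Pick a reasoning mode per query: cheap latent-space reasoning
    when the model is confident, explicit chain-of-thought otherwise."""
    c = model.confidence(prompt)      # e.g. a calibrated probability
    if c >= tau:
        return model.latent_reason(prompt)   # no visible chain
    return model.cot_reason(prompt)          # step-by-step fallback

class ToyModel:
    def confidence(self, p):     return 0.9 if len(p) < 40 else 0.4
    def latent_reason(self, p):  return "answer (latent pass)"
    def cot_reason(self, p):     return "answer (step-by-step chain)"

m = ToyModel()
print(route_reasoning("2+2?", m))                            # latent path
print(route_reasoning("a long multi-hop question " * 3, m))  # CoT path
```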