NTUA · Mar 4, 2026 · arXiv:2603.04319

AILS-NTUA at SemEval-2026 Task 12: Graph-Based Retrieval and Reflective Prompting for Abductive Event Reasoning

Nikolas Karafyllis, Maria Lymperaiou, Giorgos Filandrianos, Athanasios Voulodimos, G. Stamou, Giorgos Stamou

AI Summary

The authors present a three-stage system for abductive event reasoning, combining graph-based retrieval, LLM-driven reasoning with reflective prompt evolution, and consistency enforcement. The system achieved first place on the SemEval 2026 Task 12 leaderboard with 0.95 accuracy. Error analysis across 14 models revealed three shared inductive biases—causal chain incompleteness, proximate cause preference, and salience bias—suggesting systematic failure modes in multi-label causal reasoning.
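The summary lists post-hoc consistency enforcement as the final stage but does not spell out its rules. Below is a minimal Python sketch of what such a step could look like. It is not the authors' implementation. It assumes a multiple-choice format in which each option is a labeled text string and one option may be an exclusive "none of the others" answer. The rules, the function `enforce_consistency`, and the marker string `NONE_MARKER` are all hypothetical.

```python
from typing import Dict, Set

# Hypothetical marker text identifying an exclusive "none of the others" option.
NONE_MARKER = "none of the others"


def _normalize(text: str) -> str:
    # Lowercase and collapse whitespace so near-identical option strings compare equal.
    return " ".join(text.lower().split())


def enforce_consistency(options: Dict[str, str], selected: Set[str]) -> Set[str]:
    """Apply hypothetical post-hoc consistency rules to a set of selected option labels."""
    # Discard labels that do not correspond to any option.
    result = {label for label in selected if label in options}

    # Rule 1: options with identical (normalized) text must share one decision.
    by_text: Dict[str, Set[str]] = {}
    for label, text in options.items():
        by_text.setdefault(_normalize(text), set()).add(label)
    for labels in by_text.values():
        if labels & result:
            result |= labels

    # Rule 2: an exclusive "none" option cannot co-occur with real causes.
    none_labels = {label for label, text in options.items() if NONE_MARKER in _normalize(text)}
    if result - none_labels:
        result -= none_labels
    elif not result and none_labels:
        # Rule 3: an empty prediction falls back to the "none" option, if one exists.
        result = set(none_labels)
    return result
```

For example, with options A "Heavy rain fell", B "heavy  rain fell", C "Power outage", and D "None of the others are correct causes.", selecting {A, D} yields ['A', 'B'] after sorting: B is added because its normalized text matches A, and D is dropped because a real cause was selected.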

Key Contribution

LLMs exhibit systematic failure modes in multi-label causal reasoning, driven by inductive biases such as causal chain incompleteness that are shared across diverse model families.

Abstract

We present a winning three-stage system for SemEval 2026 Task 12: Abductive Event Reasoning that combines graph-based retrieval, LLM-driven abductive reasoning with prompt design optimized through reflective prompt evolution, and post-hoc consistency enforcement. Our system ranks first on the evaluation-phase leaderboard with an accuracy score of 0.95. Cross-model error analysis across 14 models (7 families) reveals three shared inductive biases: causal chain incompleteness, proximate cause preference, and salience bias. Their cross-family convergence (51% cause-count reduction) indicates systematic rather than model-specific failure modes in multi-label causal reasoning.
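The abstract reports a 51% cause-count reduction without defining it on this page. One plausible reading, assumed for this sketch and not confirmed by the source, is the relative drop in the total number of causes a model selects compared with the gold annotations over the same instances: reduction = 1 − (Σ|P_i|) / (Σ|G_i|), where P_i and G_i are the predicted and gold cause sets for instance i. The function name `cause_count_reduction` is hypothetical.

```python
from typing import Sequence, Set


def cause_count_reduction(gold: Sequence[Set[str]], predicted: Sequence[Set[str]]) -> float:
    """Relative drop in selected causes versus gold labels (hypothetical definition)."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must cover the same instances")
    gold_total = sum(len(g) for g in gold)
    if gold_total == 0:
        raise ValueError("gold annotations contain no causes")
    pred_total = sum(len(p) for p in predicted)
    # Positive values mean the model selects fewer causes than the gold labels contain.
    return 1.0 - pred_total / gold_total
```

Under this definition, a model that selects 3 causes where the gold labels contain 6 scores 0.5, so a value of 0.51 would mean the models selected roughly half as many causes as annotated.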

Eval Frameworks & Benchmarks · Reasoning & Chain-of-Thought · Recommendation & Information Retrieval

Citation Metrics

Citations: 0

Influential citations: 0

References: 35

Year: 2026

Venue: N/A
