The paper introduces TriEx, a tri-view explainability framework for multi-agent LLMs in interactive settings, providing structured self-reasoning, belief states about opponents, and oracle audits. By applying TriEx to strategic games, the authors analyze explanation faithfulness, belief dynamics, and evaluator reliability, revealing mismatches between agent reasoning, beliefs, and actions. These findings frame explainability as an interaction-dependent property and motivate multi-view, evidence-grounded evaluation.
LLM agents often say one thing, believe another, and do something completely different, especially when interacting with other agents.
Explainability for Large Language Model (LLM) agents is especially challenging in interactive, partially observable settings, where decisions depend on evolving beliefs and other agents. We present TriEx, a tri-view explainability framework that instruments sequential decision making with aligned artifacts: (i) structured first-person self-reasoning bound to an action, (ii) explicit second-person belief states about opponents updated over time, and (iii) third-person oracle audits grounded in environment-derived reference signals. This design turns explanations from free-form narratives into evidence-anchored objects that can be compared and checked across time and perspectives. Using imperfect-information strategic games as a controlled testbed, we show that TriEx enables scalable analysis of explanation faithfulness, belief dynamics, and evaluator reliability, revealing systematic mismatches between what agents say, what they believe, and what they do. Our results highlight explainability as an interaction-dependent property and motivate multi-view, evidence-grounded evaluation for LLM agents. Code is available at https://github.com/Einsam1819/TriEx.
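To make the three aligned views concrete, here is a minimal sketch of what one per-decision-step record could look like. This is an illustrative assumption, not the actual TriEx API: all class and field names (TriViewRecord, SelfReasoning, BeliefState, OracleAudit, and their fields) are hypothetical, inferred only from the abstract's description of the three artifacts.

```python
# Hypothetical sketch of TriEx-style per-step artifacts; names are NOT from the TriEx repo.
from dataclasses import dataclass, field
from typing import Any


@dataclass
class SelfReasoning:
    """First-person view: structured self-reasoning bound to a single action."""
    rationale: str  # the agent's stated reasoning for this step
    action: str     # the action the rationale is explicitly bound to


@dataclass
class BeliefState:
    """Second-person view: explicit beliefs about one opponent, updated over time."""
    opponent_id: str
    # hypothesis -> credence, e.g. {"opponent_holds_king": 0.7}
    beliefs: dict[str, float] = field(default_factory=dict)


@dataclass
class OracleAudit:
    """Third-person view: audit grounded in an environment-derived reference signal."""
    reference_signal: Any  # e.g. ground-truth hidden state visible only to the oracle
    consistent: bool       # whether stated reasoning/beliefs match the reference


@dataclass
class TriViewRecord:
    """One aligned, evidence-anchored record per decision step."""
    step: int
    self_view: SelfReasoning
    belief_view: list[BeliefState]
    audit_view: OracleAudit
```

Under this reading, the say/believe/do mismatches the abstract describes would surface by comparing fields within and across records: the stated rationale against the chosen action, the tracked beliefs against the oracle's reference signal, and each view against its own history over the course of a game.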