Mar 9, 2026

arXiv:2603.08241

Sensitivity of LLMs' Explanations to the Training Randomness: Context, Class & Task Dependencies

Romain Loncour, Jérémie Bogaert, François-Xavier Standaert

AI Summary

This paper investigates the sensitivity of LLM explanations to training randomness, focusing on the impact of syntactic context, learned classes, and tasks. The authors demonstrate that all three factors significantly influence the variability of explanations across different training runs. Specifically, the task being performed has the largest impact on explanation sensitivity, followed by the classes being learned, and then the syntactic context.

Key Contribution

The sensitivity of LLM explanations to training randomness depends significantly on the task, the learned classes, and the syntactic context, with the task having the largest effect and the syntactic context the smallest.

Abstract

Transformer models are now a cornerstone of natural language processing, yet explaining their decisions remains a challenge. It was recently shown that the same model trained on the same data with different randomness can yield very different explanations. In this paper, we investigate how the (syntactic) context, the classes to be learned, and the tasks influence the sensitivity of these explanations to randomness. We show that all three have a statistically significant impact: smallest for the (syntactic) context, medium for the classes, and largest for the tasks.
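To make the notion of "sensitivity to training randomness" concrete, here is a minimal sketch of one way it could be quantified. It is not the authors' metric, which this page does not describe. It assumes that each training run (seed) yields one vector of token attribution scores for the same input, and it reports one minus the mean pairwise Pearson correlation between those vectors. The function names and the example attributions are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length attribution vectors.

    Assumes neither vector is constant (non-zero variance).
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def explanation_sensitivity(attributions_per_seed):
    """Return 1 - mean pairwise correlation across seeds.

    A value near 0 means the attributions are almost perfectly
    correlated across training runs; larger values mean they disagree.
    """
    pairs = list(combinations(attributions_per_seed, 2))
    mean_corr = sum(pearson(a, b) for a, b in pairs) / len(pairs)
    return 1.0 - mean_corr


# Hypothetical token attributions for one sentence from three training runs.
stable = [[0.1, 0.7, 0.2], [0.1, 0.7, 0.2], [0.1, 0.7, 0.2]]
unstable = [[0.1, 0.7, 0.2], [0.7, 0.1, 0.2], [0.2, 0.1, 0.7]]

print(explanation_sensitivity(stable))
print(explanation_sensitivity(unstable))
```

Under these assumptions, comparing such a score across groups of inputs (by syntactic context, by class, or by task) is one plausible way to examine the kind of dependencies the abstract refers to.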

Eval Frameworks & Benchmarks, Interpretability & Mechanistic Interp, Natural Language Processing

Citation Metrics

Citations: 0

Influential citations: 0

References: 14

Year: 2026

Venue: N/A
