This paper investigates the sensitivity of LLM explanations to training randomness, focusing on the impact of syntactic context, learned classes, and tasks. The authors demonstrate that all three factors significantly influence the variability of explanations across different training runs. Specifically, the task being performed has the largest impact on explanation sensitivity, followed by the classes being learned, and then the syntactic context.
LLM explanations are more sensitive to the task being performed than to the learned classes or the syntactic context, which points to an instability in current interpretability methods.
Transformer models are now a cornerstone of natural language processing, yet explaining their decisions remains a challenge. It was recently shown that the same model trained on the same data with different randomness can yield very different explanations. In this paper, we investigate how the (syntactic) context, the classes to be learned, and the tasks influence the sensitivity of these explanations to randomness. We show that all three have a statistically significant impact: smallest for the (syntactic) context, medium for the classes, and largest for the tasks.