Mona T. Diab

Carnegie Mellon University

CMU Machine Learning

Research focus

Eval Frameworks & Benchmarks (1)

Interpretability & Mechanistic Interp (1)

Frequent co-authors

Aashiq Muhamed (1)

Papers (1)

CMU ML, Apr 13, 2026

Pando: Do Interpretability Methods Work When Models Won't Explain Themselves?

Interpretability methods often fail to improve over black-box prompting when models are uncooperative, suggesting that current techniques may be more about elicitation than about revealing internal mechanisms.

Aashiq Muhamed, Mona T. Diab

Eval Frameworks & Benchmarks, Interpretability & Mechanistic Interp
