LLMs contain "pure incorrectness" features that correlate with wrong answers but do not actually *cause* them, suggesting that merely identifying error-correlated activations is not enough for effective intervention.
Train smarter, not harder: DSL unlocks 4x faster non-autoregressive generation by teaching masked diffusion models to self-correct more efficiently.
LLMs betray their susceptibility to jailbreaking in their hidden activations, enabling lightweight detection and even real-time disruption of attacks.