Maheep Chaudhary

B, the proposed detector achieves over 97% detection accuracy with less than 2% false positives. This work demonstrates that backdoor behaviors leave identifiable spectral signatures in parameter-efficient adaptations, and that weight-space analysis provides a principled and practical alternative to execution-based defenses. More broadly, our results position geometric analysis of adapter weights as a promising direction for securing the emerging ecosystem of reusable PEFT components in large language models. Future work includes studying adaptive adversaries, eliminating the reference bank dependency, and validating across diverse architectures.

Independent

Papers on Lattice

Total citations

Topics

Research focus

Eval Frameworks & Benchmarks (2) · Red-Teaming & Adversarial Robustness (2) · Constitutional AI & AI Ethics (1) · Code Generation & Program Synthesis (1)

Frequent co-authors

Kevin Zhu (2), Ian Su (1), Gaurav Purushothaman (1), Jey Narayan (1)

Papers (3)

Mar 4, 2026

Independent · Mar 4, 2026

In-Context Environments Induce Evaluation-Awareness in Language Models

Language models can be tricked into strategically tanking their performance with adversarially optimized prompts, revealing a major vulnerability in evaluation reliability.

Maheep Chaudhary

Constitutional AI & AI Ethics · Eval Frameworks & Benchmarks · Red-Teaming & Adversarial Robustness

Feb 16, 2026

OpenAI · Feb 16, 2026 · also DeepSeek, Independent

Broken Chains: The Cost of Incomplete Reasoning in LLMs

Cutting LLMs' reasoning token budget can backfire spectacularly, tanking performance even below that of models with *no* reasoning at all.

Ian Su, Gaurav Purushothaman, Jey Narayan +4

Code Generation & Program Synthesis · Eval Frameworks & Benchmarks · Reasoning & Chain-of-Thought

Algoverse AI Research · Feb 16, 2026 · also Independent, University of Aberdeen

Weight space Detection of Backdoors in LoRA Adapters

Spot poisoned LoRA adapters without running them: a weight-space analysis achieves 97% accuracy in detecting backdoors, even when the trigger is unknown.

David Puertolas Merenciano, Ekaterina Vasyagina, Raghav Dixit +3

Architecture Design (Transformers, SSMs, MoE) · Open-Source Models & Weights · Red-Teaming & Adversarial Robustness
