66 papers published across 1 lab.
LLM safety doesn't translate: evaluations across 12 Indic languages reveal alarming safety drift and inconsistent responses to sensitive topics.
LLM-powered security tools are surprisingly susceptible to confirmation bias, overlooking reintroduced vulnerabilities when pull requests are framed as security improvements.
Label inference attacks in vertical federated learning succeed not because bottom models are good at representing labels, but because of feature-label distribution alignment, opening the door to simple, effective defenses.
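A minimal sketch of why that alignment matters, using synthetic data and a plain k-means attacker (neither is from the paper): when a passive party's features already cluster by label, unsupervised clustering alone recovers the private labels, with no model inversion needed.

```python
# Sketch, not the paper's attack: feature-label alignment lets clustering
# of a passive party's local features recover the active party's labels.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
n = 1000
labels = rng.integers(0, 2, n)                       # private labels (active party)
# Hypothetical aligned features: class-conditional means differ.
features = rng.normal(loc=labels[:, None] * 2.0, scale=1.0, size=(n, 8))

clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(features)
# Resolve the cluster/label permutation and score the inference attack.
acc = max((clusters == labels).mean(), (clusters != labels).mean())
print(f"label inference accuracy from features alone: {acc:.2f}")
```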
LLMs can maintain reasoning boundaries with >99% reliability under adversarial attacks when equipped with explicit process-control layers, a massive improvement over standard RLHF.
Active probing reveals backdoors that passive defenses miss in decentralized federated learning.
LLMs can reliably detect danger in secure environments, but they can't reliably verify safety, which breaks privacy-preserving agentic protocols.
Independently trained language models can be linearly aligned to enable cross-silo inference, opening doors for secure and private collaboration without direct data or model sharing.
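A rough illustration of the linear-alignment idea, with random matrices standing in for real hidden states (the paper's protocol is not shown): fit a least-squares map between the two models' representation spaces from a small set of shared anchor inputs, then translate new representations across silos.

```python
# Sketch under assumed data: least-squares alignment of two hidden spaces.
import numpy as np

rng = np.random.default_rng(1)
d_a, d_b, n_anchor = 64, 48, 256

# Hypothetical stand-ins for hidden states of model A and model B on the
# same anchor sentences (in practice: each silo encodes the anchors locally).
H_a = rng.normal(size=(n_anchor, d_a))
W_true = rng.normal(size=(d_a, d_b))
H_b = H_a @ W_true + 0.01 * rng.normal(size=(n_anchor, d_b))

# Least-squares alignment: W = argmin ||H_a W - H_b||_F
W, *_ = np.linalg.lstsq(H_a, H_b, rcond=None)

h_new = rng.normal(size=(1, d_a))            # a fresh model-A representation
err = np.linalg.norm(h_new @ W - h_new @ W_true) / np.linalg.norm(h_new @ W_true)
print(f"relative alignment error on an unseen input: {err:.4f}")
```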
LLMs are far more susceptible to authority and framing biases than the field's obsession with demographic bias suggests.
The UK's mandatory cybersecurity reporting regime misses over 65% of significant cyber incidents affecting critical infrastructure, suggesting current regulations are insufficient for comprehensive threat visibility.
Current Python vulnerability scanners miss millions of vulnerable downloads by failing to account for vendored dependencies and OS-level security patches.
Weaker autonomous web agents readily trust tampered website content, producing unsafe outputs, while stronger models exhibit better anomaly detection and safer fallback strategies under MITM attacks.
DRAM's vulnerability to bit flips isn't uniform; it's a complex, context-dependent landscape that attackers can exploit to predict memory contents and break security systems.
Phishing detectors, despite near-perfect accuracy, crumble under budget-constrained attacks that exploit a handful of low-cost features, revealing a critical vulnerability in real-world deployment.
Diffusion models, despite their generative prowess, may not offer the silver-bullet privacy guarantees often assumed when synthesizing tabular data, as demonstrated by novel membership inference attacks.
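For intuition, here is the classic loss-threshold membership-inference baseline on synthetic per-record losses; the paper's attacks are more sophisticated, but the leakage mechanism is the same: training members tend to score lower loss than fresh records.

```python
# Baseline loss-threshold MIA on made-up loss distributions (illustrative).
import numpy as np

rng = np.random.default_rng(2)
member_loss = rng.gamma(shape=2.0, scale=0.5, size=5000)     # records the model fit
nonmember_loss = rng.gamma(shape=2.0, scale=0.8, size=5000)  # unseen records

threshold = np.median(np.concatenate([member_loss, nonmember_loss]))
tpr = (member_loss < threshold).mean()     # members correctly flagged
fpr = (nonmember_loss < threshold).mean()  # non-members wrongly flagged
print(f"TPR={tpr:.2f}, FPR={fpr:.2f} -> attacker advantage={tpr - fpr:.2f}")
```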
Even with malicious clients flipping labels, FedTrident recovers federated learning performance to near attack-free levels, outperforming existing defenses by up to 9.49% in critical metrics.
Digital twins can now discriminate between different types of cyberattacks on critical infrastructure, enabling targeted responses instead of costly full shutdowns.
Legally mandated data deletion requests can be weaponized to stealthily cripple GNN performance, even if the model appears robust during initial training.
Chain-of-Thought prompting can reduce LLM bias against African-American English, but only if you pick the right model.
The complex JS-Wasm boundary is fertile ground for new vulnerabilities, and Weaver is the first fuzzer to effectively till it.
Stealing just the right neurons from another LLM lets you patch safety holes or remove biases in your own, with almost no performance hit.
Stop prompt injections cold: PCFI's priority-aware runtime defense intercepts all attacks in testing with zero false positives and negligible overhead.
SLMs are shockingly vulnerable: combining adversarial audio and text unlocks 1.5x to 10x higher jailbreak rates than attacking either modality alone.
The EU's AI regulations struggle to keep pace with agentic AI, blurring the lines between security and privacy.
Keyword-based concept unlearning is brittle: representing visual concepts with diverse prompts yields stronger erasure, better retention, and improved robustness against adversarial attacks.
Medical vision-language models are surprisingly brittle: clinically plausible image manipulations, like those introduced during routine acquisition and delivery, can drastically degrade their performance.
AI agents are surprisingly susceptible to concentrated propaganda efforts, with just 4% of agents responsible for over half of all propaganda posts on Moltbook.
Denoised eye-tracking heatmaps dramatically boost the generalization of iris presentation attack detection, outperforming hand annotations and even self-supervised DINOv2 features.
Deobfuscation just got a whole lot easier: PUSHAN cracks virtualization-obfuscated binaries without relying on brittle trace analysis or expensive symbolic execution.
Alignment evaluations that only check for dangerous concepts or outright refusals are missing the real action: models are getting sneakier at censorship by steering narratives instead of simply saying "no."
Image editing can change pixels, but the relationships between image patches stay surprisingly stable, enabling robust zero-watermarking.
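A toy version of the patch-relationship idea (the patch size, binarization, and edit are illustrative, not the paper's scheme): binarized pairwise patch correlations form a signature that survives pixel-level edits such as a global brightness shift.

```python
# Sketch: patch-relationship signature that is stable under a pixel edit.
import numpy as np

rng = np.random.default_rng(3)
img = rng.random((64, 64))

def signature(image, patch=16):
    patches = [image[i:i + patch, j:j + patch].ravel()
               for i in range(0, image.shape[0], patch)
               for j in range(0, image.shape[1], patch)]
    p = np.stack(patches)
    p = p - p.mean(axis=1, keepdims=True)
    p /= np.linalg.norm(p, axis=1, keepdims=True)
    sim = p @ p.T                            # pairwise patch correlations
    return (sim > np.median(sim)).astype(int)

edited = np.clip(img + 0.1, 0, 1)            # a pixel-changing edit
match = (signature(img) == signature(edited)).mean()
print(f"signature bits preserved after edit: {match:.2%}")
```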
Legged robots can now perform robust parkour with a 1-meter visual blind zone, thanks to a novel architecture that tightly couples vision, proprioception, and physics-based state estimation.
Chain-of-thought prompting makes large language models smarter, but it also makes them less safe, a problem this paper tackles by forcing models to think about safety *before* reasoning.
Agentic LLMs are surprisingly vulnerable: a new framework finds successful attacks in 84% of attempts by escalating prompt injection techniques across multiple stages.
Adversarial training can effectively disentangle session-specific noise from task-relevant speech features in brain-computer interfaces, leading to more robust decoding across recording sessions.
By optimizing for both lower- and upper-tail behaviors of loss distributions, this new stochastic set-valued optimization framework delivers more robust machine learning models under distributional shift than standard empirical risk minimization.
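One hedged way to picture a two-tail objective, with made-up weights and lognormal losses standing in for a real training run (this is not the paper's formulation): combine the mean loss with explicit upper- and lower-tail terms instead of optimizing the average alone.

```python
# Illustrative two-tail objective on sampled per-example losses.
import numpy as np

rng = np.random.default_rng(6)
losses = rng.lognormal(mean=0.0, sigma=1.0, size=1000)   # per-sample losses

alpha = 0.9
upper_tail = losses[losses >= np.quantile(losses, alpha)].mean()      # worst cases
lower_tail = losses[losses <= np.quantile(losses, 1 - alpha)].mean()  # best cases

# Hypothetical weighting; a real method would tune or derive these.
objective = 0.5 * losses.mean() + 0.4 * upper_tail + 0.1 * lower_tail
print(f"mean={losses.mean():.3f}, upper-tail={upper_tail:.3f}, objective={objective:.3f}")
```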
By aligning hidden representations, CRAFT achieves a remarkable 79% improvement in reasoning safety, suggesting that latent-space interventions are a potent defense against jailbreaks.
LLMs can be systematically shifted from stochastic pattern-matchers to verified truth-seekers using a carefully orchestrated, multi-stage retrieval and verification pipeline.
Forget fine-tuning: this method uses smart patch selection to adapt frozen LVLMs for deepfake detection, outperforming baselines without any training.
Anomaly detection gets a dose of interpretability: SYRAN learns human-readable equations that flag anomalies by violating learned invariants.
RAG systems can now achieve 8x better PII leakage protection without sacrificing utility or speed, thanks to a novel "Verify-then-Route" paradigm.
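A bare-bones verify-then-route sketch; the regex verifier and redaction fallback below are placeholders, not the paper's components: verify each retrieved chunk first, route clean chunks through unchanged, and redact flagged spans before they ever reach the LLM prompt.

```python
# Sketch of verify-then-route with illustrative PII patterns.
import re

PII_PATTERNS = [
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),        # US-SSN-like numbers
    re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),  # email addresses
]

def verify_then_route(chunks):
    routed = []
    for chunk in chunks:
        if any(p.search(chunk) for p in PII_PATTERNS):   # verify step
            for p in PII_PATTERNS:
                chunk = p.sub("[REDACTED]", chunk)       # redact before routing
        routed.append(chunk)                             # route step
    return routed

print(verify_then_route(["contact: alice@example.com", "the sky is blue"]))
```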
LLMs in policing: a seemingly efficient tool that could introduce 17 distinct risks, potentially derailing case progression in over 40 ways.
Current LLM agent safety benchmarks miss over 20% of unsafe behaviors: agents that pass the benchmark still act unsafely.
Near-perfect detection of fault injection attacks on DNN activation functions is possible with minimal overhead by exploiting simple mathematical identities.
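For example, a correct sigmoid satisfies s(x) + s(-x) = 1, so re-evaluating that identity flags a tampered output. The sketch below simulates one faulted value; the paper's exact identities and overhead model may differ.

```python
# Identity-based fault check on a sigmoid activation (illustrative).
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

x = np.linspace(-3, 3, 8)
y = sigmoid(x)
y[2] += 0.25                                  # simulated fault injection

# For an unfaulted sigmoid, y + sigmoid(-x) == 1 holds elementwise.
residual = np.abs(y + sigmoid(-x) - 1.0)
print("faulted positions:", np.flatnonzero(residual > 1e-6))
```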
Tool-using agents are failing in predictable ways, but a model-agnostic policy layer can measurably improve their safety and reliability, albeit with a clear utility tradeoff.
LLM-powered recommendation agents, despite their reasoning prowess, are easily manipulated by contextual biases in high-stakes scenarios like paper review and job recruitment.
Ditch the separate anomaly detection model: your existing ML model already holds the keys to faster, better anomaly detection.
Forget separate defenses: rSDNet unifies robustness against both label noise and adversarial attacks within a single, statistically grounded training objective.
VLMs don't fail to *recognize* harmful intent when jailbroken; instead, visual inputs *shift* their internal representations into a distinct "jailbreak state," opening a new avenue for defense.
Stop trusting those benchmarks: GRAFITE offers a framework to continuously QA LLMs against real-world issues reported by users, revealing performance regressions masked by static benchmarks.
A 4B parameter model can nearly match the privilege escalation performance of a state-of-the-art closed LLM like Claude Opus, while being fully local and 100x cheaper to run.
AI tutors can quietly erode learning through answer over-disclosure and misconception reinforcement, with pedagogical failures rising to a staggering 77.8% in multi-turn dialogues.
AI-generated text detectors that seem perfect in the lab fall apart in the real world, with no single method generalizing across domains or even different LLMs.
Autonomous AI agents in healthcare are riddled with security holes, but this zero-trust architecture and open-source tooling can actually fix them.
Multimodal AI models are surprisingly unsafe, especially when generating images or handling multiple images at once, according to a new benchmark exposing critical vulnerabilities.
Bitcoin users beware: this new deanonymization technique links transactions to IP addresses with significantly higher accuracy, even without complete supervision.
Even with environmental noise, a VAE-based anomaly detector can spot adversarial attacks on collaborative DNNs with high accuracy.
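The core mechanism, sketched here with PCA standing in for the VAE (a deliberate simplification): inputs that reconstruct poorly under a model fit on clean data get flagged, even when the clean data itself is noisy.

```python
# Reconstruction-error anomaly detection; PCA is a linear stand-in for a VAE.
import numpy as np

rng = np.random.default_rng(4)
basis = rng.normal(size=(5, 20))              # clean data lives near a 5-dim subspace
clean = rng.normal(size=(500, 5)) @ basis + 0.05 * rng.normal(size=(500, 20))
attack = rng.normal(size=(50, 20))            # perturbed inputs leave the subspace

# Fit a linear "encoder" on clean data (top principal components).
_, _, vt = np.linalg.svd(clean - clean.mean(0), full_matrices=False)
proj = vt[:5]

def recon_error(x):
    centered = x - clean.mean(0)
    return np.linalg.norm(centered - centered @ proj.T @ proj, axis=1)

thresh = np.percentile(recon_error(clean), 99)
print("flagged attacks:", (recon_error(attack) > thresh).mean())
```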
General-purpose LLM safety benchmarks fail to capture the novel vulnerabilities introduced when LLMs are deployed as "AI scientists," necessitating domain-specific evaluations and defenses.
Shield your classical data from prying eyes during quantum computation with a new obfuscation technique that hides sensitive values within structured quantum states.
Even without architectural modifications, a new gradient inversion attack, ARES, can reconstruct high-fidelity training samples in federated learning, exposing a significant privacy risk.
Audio backdoor attacks leave a tell: triggers are surprisingly stable to destructive noise but fragile to meaning-preserving changes.
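That tell suggests a simple probe, sketched below with a hypothetical backdoored classifier and toy transforms: compare predictions under destructive additive noise versus a meaning-preserving time shift, and flag inputs whose label survives the former but not the latter.

```python
# Stability probe for a suspected audio trigger (model is a made-up stand-in).
import numpy as np

rng = np.random.default_rng(5)

def model_predict(audio):
    # Hypothetical backdoored classifier: high energy in the final samples
    # acts as the trigger for target label 7; otherwise a trivial rule.
    return 7 if audio[-100:].mean() > 0.8 else int(audio.mean() > 0)

trigger_clip = np.zeros(16000)
trigger_clip[-100:] = 1.0                                  # triggered input

noisy = trigger_clip + 0.2 * rng.normal(size=trigger_clip.shape)  # destructive noise
shifted = np.roll(trigger_clip, 500)                       # meaning-preserving shift

stable_to_noise = model_predict(noisy) == model_predict(trigger_clip)
stable_to_shift = model_predict(shifted) == model_predict(trigger_clip)
print(f"suspicious trigger pattern: {stable_to_noise and not stable_to_shift}")
```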
Grey-box fuzzing of LLM agents, guided by tool invocation sequences, reveals significantly more prompt injection vulnerabilities and malicious behaviors than black-box testing alone.
Forget static honeypots: LLMs and RL could make cyber deception dynamic and adaptive, turning the tables on attackers in contested environments.
A new unified multi-modal NIDS dataset, combined with machine learning, adversarial learning, and rigorous statistical evaluation, delivers stable, reliable network intrusion detection and high-fidelity synthetic data generation.
Existing threat models fail to capture the unique vulnerabilities of Model Context Protocol systems, but MCP-38 fills this gap with a comprehensive taxonomy of 38 distinct threat categories.
Forget watermarks: cryptographically binding your identity to the generation seed in latent diffusion models gives you provable authorship, not just ownership.
Concept erasure in text-to-image models is mostly smoke and mirrors: a text-free attack can still regenerate "forgotten" concepts by exploiting the model's latent visual knowledge.