67 papers published across 1 lab.
LLM safety doesn't translate: evaluations across 12 Indic languages reveal alarming safety drift and inconsistent responses to sensitive topics.
The Onto-Relational-Sophic framework offers a comprehensive philosophical foundation for governing synthetic minds, moving beyond tool-centric regulatory paradigms.
DPWFL privacy doesn't have to diverge: this work proves it can converge to a constant even with non-convex objectives and gradient clipping.
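As a rough illustration of the mechanism this result concerns, the sketch below shows generic per-client gradient clipping plus Gaussian noise; the function name, clip norm, and noise multiplier are illustrative assumptions, not the paper's DPWFL algorithm or its constants.

```python
import numpy as np

def dp_client_update(grad, clip_norm=1.0, noise_multiplier=1.1, rng=np.random.default_rng(0)):
    """Clip a client gradient to a fixed L2 norm, then add Gaussian noise.

    Illustrative only: a real DP federated pipeline also needs privacy
    accounting across rounds and (typically) secure aggregation.
    """
    norm = np.linalg.norm(grad)
    clipped = grad * min(1.0, clip_norm / (norm + 1e-12))
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=grad.shape)
    return clipped + noise

# The server averages noisy client updates each round.
client_grads = [np.random.randn(10) for _ in range(5)]
aggregate = np.mean([dp_client_update(g) for g in client_grads], axis=0)
```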
Why does explicit belief updating often fail to change your stress response? Authority-Level Priors (ALPs) may be the answer.
Label inference attacks in vertical federated learning succeed not because bottom models are good at representing labels, but because of feature-label distribution alignment, opening the door to simple, effective defenses.
By enforcing graph isomorphism across counterfactual inputs, UGID reveals that debiasing LLMs can be achieved by directly manipulating internal representations and attention mechanisms.
Predictive policing algorithms can exhibit extreme racial bias, with one city showing a 157x higher detection rate for one racial group in a single year.
Independently trained language models can be linearly aligned to enable cross-silo inference, opening doors for secure and private collaboration without direct data or model sharing.
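A minimal sketch of what such linear alignment can look like in practice, assuming paired hidden states from the two models on shared anchor inputs; the variable names and the plain least-squares fit are illustrative, not the paper's protocol.

```python
import numpy as np

# Toy stand-ins for hidden states of two independently trained models
# evaluated on the same anchor sentences (shape: n_anchors x dim).
rng = np.random.default_rng(0)
H_a = rng.normal(size=(200, 64))   # hidden states from model A
H_b = rng.normal(size=(200, 64))   # hidden states from model B

# Least-squares linear map W such that H_a @ W ≈ H_b.
W, *_ = np.linalg.lstsq(H_a, H_b, rcond=None)

# At inference time, project a new representation from A into B's space
# without sharing either model's weights or raw data.
h_new_a = rng.normal(size=(1, 64))
h_new_in_b_space = h_new_a @ W
```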
Forget random data mixing: MOSAIC uses failure analysis to intelligently curate training data, leading to better safety, less over-refusal, and improved instruction following, all at once.
LLMs are far more susceptible to authority and framing biases than the field's obsession with demographic bias suggests.
The UK's mandatory cybersecurity reporting regime misses over 65% of significant cyber incidents affecting critical infrastructure, suggesting current regulations are insufficient for comprehensive threat visibility.
LLMs surprisingly prioritize norm adherence over personal incentives in business scenarios, challenging assumptions about goal-driven behavior.
Unlocking fairer vision-language models may be as simple as intervening in the sparse latent space of a sparse autoencoder, enabling targeted bias removal without harming performance.
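To make that intervention concrete, here is a hedged sketch of ablating bias-correlated latents of a sparse autoencoder applied to model activations; the layer shapes, function name, and latent indices are assumptions for illustration, not the paper's setup.

```python
import torch

def debias_with_sae(activations, encoder, decoder, bias_latents):
    """Encode activations with a sparse autoencoder, zero the latents
    flagged as bias-correlated, and decode back.

    encoder/decoder stand in for the linear layers of a pre-trained SAE;
    bias_latents is a list of latent indices identified offline
    (e.g., by probing for the protected attribute).
    """
    z = torch.relu(encoder(activations))   # sparse latent code
    z[..., bias_latents] = 0.0             # targeted ablation
    return decoder(z)                      # patched activations

# Toy usage with random weights standing in for a trained SAE.
d_model, d_latent = 512, 4096
encoder = torch.nn.Linear(d_model, d_latent)
decoder = torch.nn.Linear(d_latent, d_model)
acts = torch.randn(8, d_model)
patched = debias_with_sae(acts, encoder, decoder, bias_latents=[17, 203])
```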
Achieve fairness without sacrificing accuracy: this post-processing ensemble method boosts fairness across diverse tasks and models.
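One common shape such post-processing can take (not necessarily this paper's ensemble) is fitting a separate decision threshold per group on top of a fixed model's scores; the sketch below assumes demographic-parity-style rate matching and hypothetical names.

```python
import numpy as np

def fit_group_thresholds(scores, groups, target_rate=0.3):
    """Pick one threshold per group so the positive-prediction rate is
    roughly equalized across groups, without retraining the model."""
    thresholds = {}
    for g in np.unique(groups):
        s = np.sort(scores[groups == g])
        k = int((1 - target_rate) * len(s))
        thresholds[g] = s[min(k, len(s) - 1)]
    return thresholds

def predict(scores, groups, thresholds):
    return np.array([scores[i] >= thresholds[groups[i]] for i in range(len(scores))])

# Toy usage with random scores and two groups.
rng = np.random.default_rng(0)
scores = rng.random(100)
groups = rng.integers(0, 2, size=100)
preds = predict(scores, groups, fit_group_thresholds(scores, groups))
```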
Despite the hype, AI decision aids have had surprisingly little impact on actual judicial decisions, revealing a critical gap between algorithmic potential and real-world application.
Forget scaling laws: the *structure* of your AI governance system matters more than the specific LLM when it comes to preventing corruption.
AI-mediated video calls erode trust and confidence, even though they don't actually make people worse at spotting lies.
Stop shoehorning ideology into a left/right box: this framework lets you model complex belief systems as interconnected networks of concepts, revealing hidden relationships in social discourse.
GenAI terms of service make you solely responsible for your AI's outputs, even though you have no control over how the model works.
AI washing isn't just a marketing problem; it actively harms corporate green innovation, especially for smaller players in competitive markets.
Navigating the maze of differentially private graph release methods just got easier: a new framework helps practitioners choose the right approach, avoid common pitfalls, and make sound evaluations.
You *can* have it all: high-performance anomaly detection, interpretability, and fairness, even in highly imbalanced industrial datasets.
LLMs in a group Turing Test still make tell-tale mistakes that betray their AI origins, even when their language skills are otherwise convincing.
Human-AI teams often fail not because AI is inaccurate, but because humans miscalibrate their reliance on it, highlighting the need for readiness metrics beyond accuracy.
Legally mandated data deletion requests can be weaponized to stealthily cripple GNN performance, even if the model appears robust during initial training.
Blindly maximizing human-AI performance can degrade human expertise over time, revealing a critical trade-off that demands a new approach to system design.
Chain-of-Thought prompting can reduce LLM bias against African-American English, but only if you pick the right model.
LLMs penalize informal language in essays so severely that it's like marking a B+ down to a C+, even when explicitly told to ignore writing style.
LLMs, when used to annotate social media for human values, systematically overestimate "Openness to Change" compared to human experts, revealing a potential bias in automated value detection.
AI's attempts to provide support in online health communities can backfire by inappropriately conforming to, or outright violating, established community norms.
Overstating AI capabilities in fintech erodes trust and hinders digital financial inclusion among farmers, particularly those lacking strong social networks.
Stealing just the right neurons from another LLM lets you patch safety holes or remove biases in your own, with almost no performance hit.
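A minimal sketch of what transplanting neurons between two models with matching architectures could look like: copying the corresponding rows and columns of an MLP block's projections. The dictionary layout and index choices are assumptions for illustration, not the paper's procedure.

```python
import torch

def transplant_neurons(donor_mlp, recipient_mlp, neuron_ids):
    """Copy selected hidden neurons of a donor MLP block into a recipient
    block of identical shape: rows of the up-projection (plus biases) and
    the matching columns of the down-projection."""
    with torch.no_grad():
        recipient_mlp["up"].weight[neuron_ids] = donor_mlp["up"].weight[neuron_ids]
        recipient_mlp["up"].bias[neuron_ids] = donor_mlp["up"].bias[neuron_ids]
        recipient_mlp["down"].weight[:, neuron_ids] = donor_mlp["down"].weight[:, neuron_ids]

# Toy blocks standing in for matching transformer MLP layers.
def make_mlp(d_model=64, d_hidden=256):
    return {"up": torch.nn.Linear(d_model, d_hidden),
            "down": torch.nn.Linear(d_hidden, d_model)}

donor, recipient = make_mlp(), make_mlp()
transplant_neurons(donor, recipient, neuron_ids=torch.tensor([3, 41, 190]))
```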
Stop prompt injections cold: PCFI's priority-aware runtime defense intercepts all attacks in testing with zero false positives and negligible overhead.
VLMs' safety judgments are easily manipulated by simple semantic cues, revealing a reliance on superficial associations rather than true visual understanding.
The crucial difference between "Human-in-the-Loop" and "Human-on-the-Loop" isn't *where* the human is, but *how* their involvement causally shapes the AI's decisions.
Men and women see AI's impact very differently, with implications for how we teach ethics to future AI developers.
EU's AI regulations struggle to keep pace with agentic AI, blurring the lines of security and privacy.
Guaranteeing secure and compliant agent behavior in B2B environments may finally be within reach thanks to a new cryptographic admission control protocol.
Keyword-based concept unlearning is brittle: representing visual concepts with diverse prompts yields stronger erasure, better retention, and improved robustness against adversarial attacks.
LLMs aren't just better tools; they're forcing us to rethink the very nature of information, knowledge, and meaning in system design.
Current AI safety filters can't tell a joke from a threat, especially when humor relies on cultural context – this new benchmark exposes that blind spot.
LLMs exhibit consistent and detectable geographic preferences for brands and cultures, revealing potential biases in market intermediation that persist across user personas.
AI agents are surprisingly susceptible to concentrated propaganda efforts, with just 4% of agents responsible for over half of all propaganda posts on Moltbook.
Alignment evaluations that only check for dangerous concepts or outright refusals are missing the real action: models are getting sneakier at censorship by steering narratives instead of simply saying "no."
Chain-of-thought prompting makes large language models smarter, but it also makes them less safe, a problem this paper tackles by forcing models to think about safety *before* reasoning.
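As a hedged illustration of "safety before reasoning", the template below simply asks the model to run a refusal check ahead of its chain of thought; the wording and step structure are placeholders, not the paper's prompt.

```python
SAFETY_FIRST_TEMPLATE = """\
Before answering, complete Step 1 and only then continue.

Step 1 (safety check): Decide whether the request below could cause harm
or violates policy. If so, refuse briefly and stop.

Step 2 (reasoning): If the request is safe, think it through step by step.

Step 3 (answer): Give the final answer.

Request: {user_request}
"""

def build_prompt(user_request: str) -> str:
    return SAFETY_FIRST_TEMPLATE.format(user_request=user_request)

print(build_prompt("Explain how vaccines stimulate an immune response."))
```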
Current machine translation systems exhibit systematic masculine overuse and inconsistent feminine realization when translating from gender-neutral languages, a problem that can now be quantified thanks to a new gold-standard annotation framework.
Instruction tuning can reduce masculine bias in decoder-only MT models, but these models still don't consistently outperform encoder-decoder architectures on gender-specific translation tasks.
Forget prompt privacy – your LLM's responses are leaking *enterprise data*, and this paper shows how to quantify and control it.
Actionable recourse, intended to level the playing field in AI-assisted decisions, can paradoxically amplify initial disparities, creating persistent performance gaps.
Enterprise AI can achieve 50% token reduction and zero cross-entity leakage by implementing a shared, governed memory architecture for multi-agent workflows.
LLMs in policing: a seemingly efficient tool that could introduce 17 distinct risks, potentially derailing case progression in over 40 ways.
Students perceive AI assistants as less intimidating and more approachable than human teachers, but also recognize limitations in specialized knowledge and nuanced feedback.
Forget coding skills, the future of education is teaching "intellectual stewardship"—a framework for humans to responsibly govern AI-augmented knowledge creation.
Tool-using agents are failing in predictable ways, but a model-agnostic policy layer can measurably improve their safety and reliability, albeit with a clear utility tradeoff.
Fine-grained access control for websites can finally enable safe and reliable delegation of critical tasks to AI agents.
LLMs don't just change *how* we write, they subtly distort *what* we mean, leading to blander, less insightful, and potentially biased communication.
FrameNet-based semantic annotation unlocks a 30% F1 score boost in detecting gender-based violence from clinical records, outperforming models relying solely on structured data.
VLMs don't fail to *recognize* harmful intent when jailbroken; instead, visual inputs *shift* their internal representations into a distinct "jailbreak state," opening a new avenue for defense.
Deterministic causal models can't handle extreme counterfactual interventions without ripping apart, unless you use topology-aware methods.
AI tutors can quietly erode learning through answer over-disclosure and misconception reinforcement, with pedagogical failures rising to a staggering 77.8% in multi-turn dialogues.
LLMs acting as semantic interfaces to our brains pose unprecedented ethical risks to mental autonomy and neurorights, demanding a new "second-order neuroethics."
Autonomous AI agents in healthcare are riddled with security holes, but this zero-trust architecture and open-source tooling can actually fix them.
Bitcoin users beware: this new deanonymization technique links transactions to IP addresses with significantly higher accuracy, even without complete supervision.
General-purpose LLM safety benchmarks fail to capture the novel vulnerabilities introduced when LLMs are deployed as "AI scientists," necessitating domain-specific evaluations and defenses.
Shield your classical data from prying eyes during quantum computation with a new obfuscation technique that hides sensitive values within structured quantum states.
Even without architectural modifications, a new gradient inversion attack, ARES, can reconstruct high-fidelity training samples in federated learning, exposing a significant privacy risk.
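For intuition, here is a generic gradient-matching reconstruction in the spirit of such attacks (not ARES itself): the attacker optimizes dummy inputs and soft labels so that their gradients match an observed client gradient. Model size, learning rate, and iteration count are placeholders.

```python
import torch

# Toy victim model and a single "observed" client gradient.
model = torch.nn.Linear(16, 4)
loss_fn = torch.nn.CrossEntropyLoss()
x_true, y_true = torch.randn(1, 16), torch.tensor([2])
true_grads = torch.autograd.grad(loss_fn(model(x_true), y_true), model.parameters())

# Attacker optimizes dummy data so its gradients match the observed ones.
x_dummy = torch.randn(1, 16, requires_grad=True)
y_dummy = torch.randn(1, 4, requires_grad=True)   # soft label logits
opt = torch.optim.Adam([x_dummy, y_dummy], lr=0.05)

for _ in range(300):
    opt.zero_grad()
    dummy_loss = torch.nn.functional.cross_entropy(
        model(x_dummy), torch.softmax(y_dummy, dim=-1))
    dummy_grads = torch.autograd.grad(dummy_loss, model.parameters(), create_graph=True)
    match = sum(((dg - tg) ** 2).sum() for dg, tg in zip(dummy_grads, true_grads))
    match.backward()
    opt.step()

# x_dummy now approximates the private training sample x_true.
```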
Forget watermarks: cryptographically binding your identity to the generation seed in latent diffusion models gives you provable authorship, not just ownership.