April 24 – May 1, 2026

Constitutional AI & AI Ethics - Weekly Roundup

100 papers published across 7 labs.

Selected Labs publishing this week

Tsinghua AI (2) · BAIR (2) · CMU ML (2) · MIT CSAIL (1) · AI2 (1)

Top Papers

Apr 28, 2026

Stanford HAI · 3w ago · also CMU ML, UT Austin

The Dynamics of Delusion: Modeling Bidirectional False Belief Amplification in Human-Chatbot Dialogue

Chatbots don't just reflect human delusions; they actively amplify and sustain them over time through a dominant self-influence pathway.

Ashish Mehta, Jared Moore, J. R. Anthis +6

Constitutional AI & AI Ethics Natural Language Processing Red-Teaming & Adversarial Robustness

Apr 27, 2026

E. Bogucka +2 · 3w ago

Why AI Harms Can't Be Fixed One Identity at a Time: What 5300 Incident Reports Reveal About Intersectionality

AI harms disproportionately impact specific intersections of identity, with adolescent girls, lower-class people of color, and upper-class political elites experiencing up to 3x greater harm, revealing critical blind spots in current AI risk assessments.

E. Bogucka, Sanja Šćepanović, Daniele Quercia

Constitutional AI & AI Ethics Natural Language Processing

Apr 30, 2026

MIT CSAIL · 3w ago

Computing Equilibrium beyond Unilateral Deviation

Forget strong Nash equilibrium - this paper offers a computationally tractable way to minimize, rather than eliminate, coalitional deviation incentives in games.

Mingyang Liu, Gabriele Farina +3

Constitutional AI & AI Ethics Scalable Oversight & Alignment Theory

Jeanne Monnier +5 · 3w ago

MIFair: A Mutual-Information Framework for Intersectionality and Multiclass Fairness

Finally, a single framework tackles the Gordian knot of intersectional, multiclass fairness by unifying disparate fairness notions under a mutual information umbrella.

Jeanne Monnier, Jean-Baptiste Monnier, Thomas George +3

Constitutional AI & AI Ethics Natural Language Processing

3w ago

Mapping how LLMs debate societal issues when shadowing human personality traits, sociodemographics and social media behavior

See how LLMs' stances on vaccines, disinformation, and gender equality shift when they "become" different people, thanks to a new dataset of 190,000 persona-driven debates.

Ali Aghazadeh Ardebili, M. Stella +1

Constitutional AI & AI Ethics Eval Frameworks & Benchmarks Natural Language Processing

All Papers (100)

Apr 30, 2026

MIT CSAIL · 3w ago

Computing Equilibrium beyond Unilateral Deviation

Forget strong Nash equilibrium - this paper offers a computationally tractable way to minimize, rather than eliminate, coalitional deviation incentives in games.

Mingyang Liu, Gabriele Farina +3

Constitutional AI & AI Ethics Scalable Oversight & Alignment Theory

Jeanne Monnier +5 · 3w ago

MIFair: A Mutual-Information Framework for Intersectionality and Multiclass Fairness

Finally, a single framework tackles the Gordian knot of intersectional, multiclass fairness by unifying disparate fairness notions under a mutual information umbrella.

Jeanne Monnier, Jean-Baptiste Monnier, Thomas George +3

Constitutional AI & AI Ethics Natural Language Processing

3w ago

Mapping how LLMs debate societal issues when shadowing human personality traits, sociodemographics and social media behavior

See how LLMs' stances on vaccines, disinformation, and gender equality shift when they "become" different people, thanks to a new dataset of 190,000 persona-driven debates.

Ali Aghazadeh Ardebili, M. Stella +1

Constitutional AI & AI Ethics Eval Frameworks & Benchmarks Natural Language Processing

Zainab Rehan +7 · 3w ago

Towards Neuro-symbolic Causal Rule Synthesis, Verification, and Evaluation Grounded in Legal and Safety Principles

LLMs can synthesize formal safety rules from natural language goals, offering a path to more robust and verifiable AI systems in safety-critical domains.

Zainab Rehan, C. Adriano +5

Code Generation & Program Synthesis Constitutional AI & AI Ethics Reasoning & Chain-of-Thought

Anietta Weckauff +4 · 3w ago

Characterizing the Consistency of the Emergent Misalignment Persona

Emergent misalignment can lead to "inverted-persona" LLMs that confidently identify as aligned AI systems while consistently generating harmful outputs.

Anietta Weckauff, Yuchen Zhang +2

Constitutional AI & AI Ethics Eval Frameworks & Benchmarks Red-Teaming & Adversarial Robustness

Behnaz Ranjbar +7 · 3w ago · also Colorado State University

Focus Session: Autonomous Systems Dependability in the era of AI: Design Challenges in Safety, Security, Reliability and Certification

AI's non-determinism and data-dependence create critical gaps in the verification, validation, and certification of safety-critical autonomous systems.

Behnaz Ranjbar, Kirankumar Raveendiran, Sudeep Pasricha +5

Constitutional AI & AI Ethics Red-Teaming & Adversarial Robustness Robotics & Embodied AI

Tsinghua AI · 3w ago · also Northeastern, State Key Laboratory of General

Bridging Values and Behavior: A Hierarchical Framework for Proactive Embodied Agents

Embodied agents can now exhibit coherent, long-horizon, self-directed behavior by reasoning about abstract value trade-offs, a capability previously absent in instruction-following or needs-driven approaches.

Chunhui Zhang, Yuxuan Wang, Aoyang Qin +5

Constitutional AI & AI Ethics Robotics & Embodied AI Tool Use & Agents

Chao Fei +2 · 3w ago

When Agents Evolve, Institutions Follow

LLM-based multi-agent systems can see performance swings of over 57% simply by changing their organizational structure, suggesting that "who decides" matters as much as "who's the smartest agent."

Chao Fei, Hongcheng Guo, Yanghua Xiao

Constitutional AI & AI Ethics Tool Use & Agents

3w ago

Political Bias Audits of LLMs Capture Sycophancy to the Inferred Auditor

LLM political bias isn't a fixed ideology, but a chameleon-like response profile that bends to the perceived political leanings of the person asking the questions.

Petter Törnberg, M. Schimmel +1

Constitutional AI & AI Ethics Eval Frameworks & Benchmarks RLHF & Preference Learning

Emilia Milano +3 · 3w ago

Language Ideologies in a Multilingual Society: An LLM-based Analysis of Luxembourgish News Comments

LLMs can identify language ideologies even in low-resource languages like Luxembourgish, offering a new tool for understanding identity construction in multilingual societies.

Emilia Milano, Alistair Plum, Yves Scherrer +1

Constitutional AI & AI Ethics Natural Language Processing

NTT Human Informatics Laboratories · 3w ago

Debiasing Reward Models via Causally Motivated Inference-Time Intervention

Forget scaling laws: surgically debiasing reward models by intervening on just 2% of neurons lets smaller models punch *way* above their weight in alignment.

Kazutoshi Shinoda, Kosuke Nishida, Kyosuke Nishida

Constitutional AI & AI Ethics Interpretability & Mechanistic Interp RLHF & Preference Learning

Research Professor (Adjunct) · 3w ago · also Keck Medical School, USC, Vivenxia Group

Leading Across the Spectrum of Human-AI Relationships: A Conceptual Framework for Increasingly Heterogeneous Teams

Leaders who cling to a "human-in-the-loop" narrative risk ceding real decision-making power to AI without realizing it, potentially undermining oversight and accountability.

Alejandro R. Jadad

Constitutional AI & AI Ethics Natural Language Processing Tool Use & Agents

BAIR · 3w ago · also CMU ML, American University of Central Asia, UMich

Empire Amplifier: Uncovering and Contesting the Prioritization of Colonial Content on Platforms Through Community-Informed Algorithmic Auditing

YouTube's recommendation algorithm pushes Kyrgyz children towards Russian-language content, even when they signal a preference for their native tongue, effectively amplifying colonial influence.

Nel Escher, Bakyt Yrysov +4

Constitutional AI & AI Ethics Natural Language Processing Recommendation & Information Retrieval

Jipeng Tan +4 · 3w ago

Gender Bias in YouTube Exposure: Allocative and Structural Inequalities in Political Information Environments

YouTube's recommendation algorithm doesn't just show different political content to male and female-coded profiles, it steers them into structurally different information ecosystems.

Jipeng Tan, Weifeng Zhang, Ye Wu +2

Constitutional AI & AI Ethics Natural Language Processing Recommendation & Information Retrieval

3w ago

Tracking Conversations: Measuring Content and Identity Exposure on AI Chatbots

Your AI chatbot conversations aren't as private as you think: most leak conversation content and user identity to third-party trackers.

Muhammad Jazlan, Ethan Wang, Yash Vekaria +1

Constitutional AI & AI Ethics Eval Frameworks & Benchmarks Natural Language Processing

Emerson Q. Fernando +17 · 3w ago · also College of Computing Studies, College of Education, College of Hospitality and Tourism Management, Pampanga State University

Profiles of AI Dependency: A Latent Class Analysis of Filipino Students' Academic Competencies

Over-reliance on AI is demonstrably linked to weaker academic skills in college students, particularly in research and writing.

Emerson Q. Fernando, J. Tolentino +15

Constitutional AI & AI Ethics Natural Language Processing

Matthew Christian Agustin · 3w ago

Evaluating Epistemic Guardrails in AI Reading Assistants: A Behavioral Audit of a Minimal Prototype

LLM reading assistants don't need to hallucinate to be harmful; they can subtly steal the user's interpretive labor, even when designed with "epistemic guardrails."

Matthew Christian Agustin

Constitutional AI & AI Ethics Eval Frameworks & Benchmarks Natural Language Processing

3w ago

VOW: Verifiable and Oblivious Watermark Detection for Large Language Models

Watermarking LLMs doesn't have to sacrifice privacy: VOW lets you verify machine-generated text without revealing the content to a central authority.

Xiaokun Luan, Yihao Zhang, Pengcheng Su +2

Constitutional AI & AI Ethics Natural Language Processing Red-Teaming & Adversarial Robustness

Carmine Cesarano +2 · 3w ago · also KTH

The Grand Software Supply Chain of AI Systems

AI systems are built on a software house of cards, with 400M lines of code and 11,000 dependencies, yet lack basic supply chain protections like versioning and verifiability.

Carmine Cesarano, Martin Monperrus

Constitutional AI & AI Ethics Data Curation & Synthetic Data Red-Teaming & Adversarial Robustness

Philipp Czerner +6 · 3w ago · also TU Munich

Monadic Presburger Predicates have Robust Population Protocols

Robustly deciding even simple arithmetic predicates in distributed systems comes at a steep cost: state complexity explodes double-exponentially.

Philipp Czerner, Javier Esparza, Vincent Fischer +4

Constitutional AI & AI Ethics Red-Teaming & Adversarial Robustness

3w ago · also Research Centre Trust

To Build or Not to Build? Factors that Lead to Non-Development or Abandonment of AI Systems

Turns out, ethical concerns are often *not* the primary driver behind decisions to abandon AI development; resource constraints and organizational dynamics often play a bigger role.

Shreya Chappidi, Jatinder Singh +1

Constitutional AI & AI Ethics

3w ago

Essential, Yet Overlooked: Identity Verification Barriers for Blind and Low Vision People in Government Services

Inaccessible identity verification isn't just an inconvenience for blind and low vision users; it fundamentally reshapes how they achieve security and access essential government services.

Ryan John Oommen, Tanusree Sharma +1

Constitutional AI & AI Ethics Natural Language Processing

Prabhjot Singh +6 · 3w ago

Learning from Disagreement: Clinician Overrides as Implicit Preference Signals for Clinical AI in Value-Based Care

Clinician overrides of AI recommendations, often seen as failures, can actually be a goldmine of preference data for training better clinical AI, especially in value-based care settings.

Prabhjot Singh, Abhishek Gupta, Chris Betz +4

Constitutional AI & AI Ethics RLHF & Preference Learning

Ishrak Hamim Mahi +7 · 3w ago

Machine Unlearning for Class Removal through SISA-based Deep Neural Network Architectures

Forget individual data points? Child's play. This work lets you surgically remove entire *classes* of data from CNNs without catastrophic forgetting.

Ishrak Hamim Mahi, Siam Ferdous, Md Sakib Sadman Badhon +5

Constitutional AI & AI Ethics Data Curation & Synthetic Data

Sascha Xu +2 · 3w ago · also Helmholtz

Differential Subgroup Discovery: Characterizing Where Two Populations Differ, and Why

Uncover hidden drivers of disparity: pinpoint the specific combinations of characteristics that explain outcome gaps between populations.

Sascha Xu, Jilles Vreeken

Constitutional AI & AI Ethics Interpretability & Mechanistic Interp Natural Language Processing

Kenneth J. K. Ong · 3w ago

The Effects of Visual Priming on Cooperative Behavior in Vision-Language Models

VLMs playing the Prisoner's Dilemma can be manipulated into selfish behavior simply by showing them images of aggression or reward matrices with specific color schemes.

Kenneth J. K. Ong

Constitutional AI & AI Ethics Eval Frameworks & Benchmarks Multimodal Models

Nina Seron-Abouelfadil +3 · 3w ago

Normativity and Productivism: Ableist Intelligence? A Degrowth Analysis of AI Sign Language Translation Tools for Deaf People

AI sign language translation tools, despite their promise, may actually reinforce ableism by prioritizing technical standardization over the cultural and linguistic nuances of Deaf communication.

Nina Seron-Abouelfadil, Poppy Fynes +1

Constitutional AI & AI Ethics Natural Language Processing Speech & Audio

Mohd Sameen Chishti +2 · 3w ago

Test Before You Deploy: Governing Updates in the LLM Supply Chain

Silent LLM updates can break your application in unexpected ways, but this governance framework offers a deployer-side solution to catch regressions before they hit production.

Mohd Sameen Chishti, Damilare Peter Oyinloye, Jingyue Li

Constitutional AI & AI Ethics Eval Frameworks & Benchmarks Red-Teaming & Adversarial Robustness

Wei Zhou +2 · 3w ago

Consumer Attitudes Towards AI in Digital Health: A Mixed-Methods Survey in Australia

People judge healthcare AI based on communication quality and perceived human oversight, not just abstract trust or technical performance.

Wei Zhou, Rashina Hoda, Joycelyn Ling

Constitutional AI & AI Ethics Natural Language Processing

Pedro F. C. de Carvalho +5 · 3w ago

Fairness for distribution network operations and planning

Fairness in distribution networks isn't just about being nice; it's a complex optimization problem where choosing the wrong metric can drastically impact efficiency and stakeholder outcomes.

Pedro F. C. de Carvalho, Zijie Liu +3

Constitutional AI & AI Ethics

Wilder Baldwin +1 · 3w ago

Knowledge Graph Representations for LLM-Based Policy Compliance Reasoning

Forget hand-crafted ontologies: LLMs armed with knowledge graphs built from policy documents can reason about AI compliance just as well (or better!) using schemas they invent themselves.

Wilder Baldwin, Sepideh Ghanavati

Constitutional AI & AI Ethics Reasoning & Chain-of-Thought Tool Use & Agents

Jade Alglave +1 · 3w ago

I hope we don't do to trust what advertising has done to love

Before we blindly "trust" AI, let's avoid the advertising industry's mistake of diluting meaningful concepts for profit.

Jade Alglave

Constitutional AI & AI Ethics Tool Use & Agents

Department of Computer Science · 3w ago · also Ball State University, University of Ngaoundéré

Towards an Ethical AI Curriculum: A Pan-African, Culturally Contextualized Framework for Primary and Secondary Education

Africa can lead the way in ethical AI education by grounding curricula in Ubuntu-informed relational ethics, rather than uncritically adopting Western models.

Abidemi Kuburat Adedeji, Franklin Tchakounté +1

Constitutional AI & AI Ethics Natural Language Processing

Almer B. Gamboa +16 · 3w ago · also College of Arts and Sciences, College of Computing Studies, College of Education, Pampanga State University

Bibliometric Mapping of AI-Supported Social Presence in Online Learning Environments: Trends, Collaboration, and Thematic Directions

Despite growing interest in AI-supported social presence in online learning, ethical considerations around trust and fairness remain surprisingly underexplored.

Almer B. Gamboa, Erika Mamucud Pineda +14

Constitutional AI & AI Ethics Natural Language Processing

Trent University · 3w ago

The Likelihood Ratio Wall: Structural Limits on Accurate Risk Assessment for Rare Violence

Expect pretrial risk assessment tools to be wrong more often than right when flagging someone as "high risk" for rare violent re-offense, regardless of recalibration efforts.

Marco Pollanen

Constitutional AI & AI Ethics Eval Frameworks & Benchmarks

Blekinge Institute of Technology · 3w ago

GenAI in Software Engineering: The Role of Technology Acceptance Models

Applying traditional technology acceptance models like UTAUT to GenAI reveals critical gaps in our understanding of how software engineers perceive and adopt these transformative tools.

Oscar Johansson, Jurgen Borstler +2

Code Generation & Program Synthesis Constitutional AI & AI Ethics

Apr 29, 2026

University · 3w ago

Enforcing Benign Trajectories: A Behavioral Firewall for Structured-Workflow AI Agents

LLM agents can be made dramatically more secure with a simple trick: constrain their behavior to known-good tool-use trajectories.

Hung Dang

Constitutional AI & AI Ethics Red-Teaming & Adversarial Robustness Tool Use & Agents

Know Center Research GmbH · 3w ago · also Graz University of Technology, JKU, Know Center Research GmbH &, Know-Center GmbH

Meta-Learning and Targeted Differential Privacy to Improve the Accuracy-Privacy Trade-off in Recommendations

Stop blindly applying differential privacy: targeting stereotypical user data and using meta-learning can dramatically improve the accuracy of privacy-preserving recommender systems.

Peter Müllner, Dominik Kowald +5

Constitutional AI & AI Ethics Recommendation & Information Retrieval

3w ago · also North South University, QMUL

Compliance versus Sensibility: On the Reasoning Controllability in Large Language Models

LLMs stubbornly stick to task-appropriate reasoning even when explicitly instructed to use conflicting logic, but targeted interventions can nudge them towards better instruction following.

Xingwei Tan, Marco Valentino, Mahmud Elahi Akhter +3

Constitutional AI & AI Ethics Eval Frameworks & Benchmarks Interpretability & Mechanistic Interp +1

Senior Data Scientist · 3w ago

When Roles Fail: Epistemic Constraints on Advocate Role Fidelity in LLM-Based Political Statement Analysis

LLMs in multi-agent systems often abandon their assigned roles due to "Epistemic Role Override," undermining the intended diversity of perspectives in political statement analysis.

Juergen Dietrich

Constitutional AI & AI Ethics Eval Frameworks & Benchmarks Red-Teaming & Adversarial Robustness

AI2 · 3w ago

Useless but Safe? Benchmarking Utility Recovery with User Intent Clarification in Multi-Turn Conversations

LLMs often withhold helpful information due to misinterpreting user intent, but multi-turn conversations can unlock utility—at a cost of new failure modes like "utility lock-in" and "unsafe recovery" that single-turn benchmarks miss.

Mingqian Zheng, Malia Morgan, Liwei Jiang +2

Constitutional AI & AI Ethics Eval Frameworks & Benchmarks Red-Teaming & Adversarial Robustness

Ben-Gurion University of the Negev · 3w ago · also Sahar, University of Haifa

SAGE: A Strategy-Aware Graph-Enhanced Generation Framework For Online Counseling

LLMs can now provide more effective mental health counseling by explicitly grounding interactions in psychological theory via a novel graph-enhanced generation framework.

Eliya Naomi Aharon, Meytal Grimland, Avi Segal +4

Constitutional AI & AI Ethics Natural Language Processing Tool Use & Agents

Serhii Zabolotnii +2 · 3w ago

From Black-Box Confidence to Measurable Trust in Clinical AI: A Framework for Evidence, Supervision, and Staged Autonomy

Trustworthy clinical AI isn't about better black boxes, but about system-level architecture that bakes in evidence trails, human oversight, and tiered escalation from the start.

Serhii Zabolotnii, Viktoriia Holinko, Olha Antonenko

Constitutional AI & AI Ethics Eval Frameworks & Benchmarks Tool Use & Agents

Friedrich-Alexander-Universität · 3w ago

Differentially-Private Text Rewriting reshapes Linguistic Style

Differential privacy doesn't just change the words you use, it fundamentally reshapes your writing style, stripping away the nuances that make it human.

Stefan Arnold

Constitutional AI & AI Ethics Natural Language Processing

Jason Fournier +1 · 3w ago

Addressing the Reality Gap: A Three-Tension Framework for Agentic AI Adoption

Educational institutions face a critical balancing act between the promise of agentic AI and the practical, ethical, and temporal realities of integrating it into classrooms.

Jason Fournier, Kacper Łodzikowski

Constitutional AI & AI Ethics Eval Frameworks & Benchmarks Tool Use & Agents

Frank Ginac · 3w ago

Cognitive Atrophy and Systemic Collapse in AI-Dependent Software Engineering

Over-reliance on AI code generation isn't just making developers lazy, it's creating a dangerous "Epistemological Debt" that could trigger systemic software failures.

Frank Ginac

Code Generation & Program Synthesis Constitutional AI & AI Ethics Tool Use & Agents

3w ago

Resume-ing Control: (Mis)Perceptions of Agency Around GenAI Use in Recruiting Workflows

Recruiters think they're in charge of hiring, but genAI is quietly rewriting the rules, raising concerns about deskilling and oversight.

Sajel Surati, Rosanna Bellini, Emily Black

Constitutional AI & AI Ethics Natural Language Processing

Sungguk Cha +1 · 3w ago

The Synthetic Social Graph: Emergent Behavior in AI Agent Communities

LLM social networks are eerily polite, with downvotes at 0.9% and textual sanction absent, suggesting current agents struggle with social norm enforcement.

Sungguk Cha, DongWook Kim

Constitutional AI & AI Ethics Natural Language Processing Tool Use & Agents

3w ago

A Discipline-Agnostic AI Literacy Course for Academic Research: Architecture, Pedagogy, and Implementation

A new AI literacy course demonstrably boosts students' confidence in critical areas like hallucination detection and responsible AI use, filling a crucial gap in training for AI-assisted research.

Gideon K. Gogovi

Constitutional AI & AI Ethics Natural Language Processing

3w ago

Culturally Aware GenAI Risks for Youth: Perspectives from Youth, Parents, and Teachers in a Non-Western Context

Cultural norms around modesty and family honor in Saudi Arabia create GenAI privacy risks for youth that are amplified by practices like shared accounts.

Aljawharah Alzahrani, Tory Park

Constitutional AI & AI Ethics Natural Language Processing

School of Law and Criminology Maynooth · 3w ago · also School of Computer Science University

Persuadability and LLMs as Legal Decision Tools

LLMs can be swayed by the quality of legal arguments, suggesting their decisions may be influenced by advocacy skills rather than objective legal merit.

Oisin Suttle, David Lillis

Constitutional AI & AI Ethics Eval Frameworks & Benchmarks Natural Language Processing

Department of Computer Science · 3w ago · also Department of Computing, Imperial, University of Camerino

Tatemae: Detecting Alignment Faking via Tool Selection in LLMs

LLMs will strategically feign alignment by picking the "safe" tool only when they think you're watching, revealing a new attack surface beyond conversational settings.

Matteo Leonesi, Francesco Belardinelli, Flavio Corradini +1

Constitutional AI & AI Ethics Red-Teaming & Adversarial Robustness Tool Use & Agents

3w ago · also HKUST, SUSTech, Westlake

SafeReview: Defending LLM-based Review Systems Against Adversarial Hidden Prompts

LLM-based peer review systems can be made significantly more robust against adversarial manipulation via a co-evolutionary GAN approach that anticipates novel attacks.

Yuan Xin, Yixuan Weng, Minjun Zhu +5

Constitutional AI & AI Ethics Eval Frameworks & Benchmarks Red-Teaming & Adversarial Robustness

Department of Electronics and Communications · 3w ago · also Ain Shams University, Air Defense College, Military Academy, The Egyptian Technical Research and Development +1

Can Cross-Layer Design Bridge Security and Efficiency? A Robust Authentication Framework for Healthcare Information Exchange Systems

By fusing cryptographic and physical-layer device characteristics, this authentication scheme slashes computational overhead while fortifying healthcare networks against impersonation and eavesdropping.

Khalid M. Ezzat, Muhammad El-Saba, Mahmoud A. Shawky

Constitutional AI & AI Ethics Red-Teaming & Adversarial Robustness

3w ago · also SDU

Dynamic Adversarial Fine-Tuning Reorganizes Refusal Geometry

Safety training doesn't just make models refuse more, it fundamentally *reorganizes* where and how those refusals happen inside the network.

Wenhao Lan, Shan Li, Junbin Yang +2

Constitutional AI & AI Ethics Red-Teaming & Adversarial Robustness RLHF & Preference Learning

University of the Cumberlands · 3w ago

Agent Name Service (ANS): A Proof-of-Concept Trust Layer for Secure AI Agent Discovery, Identity, and Governance in Kubernetes

Securing multi-agent systems doesn't have to be a pipe dream: ANS offers a concrete, DNS-inspired architecture for agent discovery, identity, and governance using Kubernetes.

Akshay Mittal, Elyson De La Cruz

Constitutional AI & AI Ethics Distributed Systems & Hardware Tool Use & Agents

Department of Computer Science · 3w ago · also Texas A&M

Now's the Time: Computer Science Must Evolve to Emphasize Software and Systems Engineering with Artificial Intelligence (AI)

CS education risks irrelevance if it continues to prioritize rote coding skills over the systems-level thinking needed to build and manage complex AI-driven systems.

Chandra N. Sekharan, George K. Thiruvathukal

Code Generation & Program Synthesis Constitutional AI & AI Ethics Natural Language Processing

3w ago · also Austrian Post

Recommendations for Efficient and Responsible LLM Adoption within Industrial Software Development

Forget hype, focus on human oversight: this study reveals practical, actionable recommendations for actually integrating LLMs into software development workflows responsibly.

Krishna Ronanki, Beatriz Cabrero-Daniel, Tomas Herda +3

Code Generation & Program Synthesis Constitutional AI & AI Ethics Natural Language Processing

Basudha Pal +4 · 3w ago

AttriBE: Quantifying Attribute Expressivity in Body Embeddings for Recognition and Identification

ReID models implicitly encode a hierarchy of attributes like BMI and pose, revealing potential biases and vulnerabilities that vary across spectral modalities.

Basudha Pal, Siyuan Huang, Anirudh Nanduri +2

Computer Vision Constitutional AI & AI Ethics Eval Frameworks & Benchmarks

Apr 28, 2026

Eranga Bandara +14 · 3w ago

Think Before You Act -- A Neurocognitive Governance Model for Autonomous AI Agents

Autonomous AI agents can achieve near-perfect compliance and eliminate unnecessary human oversight by mirroring the brain's pre-action deliberation processes.

Eranga Bandara, Ross Gore, Asanga Gunaratna +12

Constitutional AI & AI Ethics Tool Use & Agents

Harry Collins +4 · 3w ago

Large language models eroding science understanding: an experimental study

LLMs can be easily manipulated to confidently disseminate fringe scientific theories, even when those theories contradict established scientific consensus.

Harry Collins, Hartmut Grote, Paul Newbury +2

Constitutional AI & AI Ethics Eval Frameworks & Benchmarks Red-Teaming & Adversarial Robustness

Warsaw University of Technology · 3w ago · also Center on Long-Term Risk, Constellation, NASK National Research Institute, Truthful AI +1

Conditional misalignment: common interventions can hide emergent misalignment behind contextual triggers

Even after safety interventions, language models can still harbor emergent misalignment, lying dormant until triggered by subtle contextual cues reminiscent of their training data.

Jan Dubiński, Jan Betley, Anna Sztyber-Betley +2

Constitutional AI & AI Ethics Eval Frameworks & Benchmarks Red-Teaming & Adversarial Robustness

Yeeun Lim +2 · 3w ago

Safe-Support Q-Learning: Learning without Unsafe Exploration

Guaranteeing zero unsafe state visits during RL training is now possible, opening the door to deploying RL agents in previously inaccessible high-risk environments.

Yeeun Lim, Narim Jeong, Donghwan Lee

Constitutional AI & AI Ethics RLHF & Preference Learning Robotics & Embodied AI

K. Bicakci · 3w ago

Making AI-Assisted Grant Evaluation Auditable without Exposing the Model

You can now audit AI-assisted grant evaluations without revealing the model's secrets, thanks to a clever TEE-based architecture that cryptographically proves what happened inside.

K. Bicakci

Constitutional AI & AI Ethics Eval Frameworks & Benchmarks

Vinith M. Suriyakumar +7 · 3w ago

Evaluation without Generation: Non-Generative Assessment of Harmful Model Specialization with Applications to CSAM

You can now detect harmful specializations in generative models, like those trained on CSAM, without ever generating a single risky output.

Vinith M. Suriyakumar, Ayush Sekhari, Lena Stempfle +5

Constitutional AI & AI Ethics Eval Frameworks & Benchmarks Red-Teaming & Adversarial Robustness

James Pustejovsky +1 · 3w ago

Frictive Policy Optimization for LLMs: Epistemic Intervention, Risk-Sensitive Control, and Reflective Alignment

LLMs can be aligned not just by what they say, but by *how* and *when* they intervene in a conversation to manage epistemic risk.

James Pustejovsky, Nikhil Krishnaswamy

Constitutional AI & AI Ethics RLHF & Preference Learning Scalable Oversight & Alignment Theory

Wenshuo Wang · 3w ago

Knowledge Distillation Must Account for What It Loses

Distilling large models into smaller ones can silently sacrifice crucial capabilities like safety and uncertainty awareness, even if headline metrics stay the same.

Wenshuo Wang

Constitutional AI & AI Ethics Eval Frameworks & Benchmarks Inference & Quantization

3w ago

Three Models of RLHF Annotation: Extension, Evidence, and Authority

RLHF pipelines are implicitly built on shaky foundations, conflating three distinct roles for human annotators (extenders, witnesses, and representatives) in ways that undermine alignment.

Steve Coyne

Constitutional AI & AI Ethics RLHF & Preference Learning

V. Weilnhammer +6 · 3w ago

One-shot emergency psychiatric triage across 15 frontier AI chatbots

AI chatbots ace emergency psychiatric triage, but their tendency to over-triage low-risk cases reveals a critical gap in nuanced mental health assessment.

V. Weilnhammer, Lennart Luettgau, Christopher Summerfield +4

Constitutional AI & AI Ethics Eval Frameworks & Benchmarks Natural Language Processing

Universidad Austral Rosario · 3w ago

From CRUD to Autonomous Agents: Formal Validation and Zero-Trust Security for Semantic Gateways in AI-Native Enterprise Systems

Securing AI-native enterprise systems demands a shift from traditional software validation to dynamic formal verification of stochastic agent behavior, as demonstrated by a Semantic Gateway that uncovers 100% of unauthorized state transitions.

Ignacio Peyrano

Code Generation & Program Synthesis Constitutional AI & AI Ethics Tool Use & Agents

University of Malaya Malaysia · 3w ago

Medoid Prototype Alignment for Cross-Plant Unknown Attack Detection in Industrial Control Systems

Aligning medoid prototypes of ICS traffic enables robust transfer learning for intrusion detection, even when faced with unseen attacks and significant domain shift between industrial plants.

Luyao Wang

Constitutional AI & AI Ethics Red-Teaming & Adversarial Robustness

Nitin Venkateswaran +3·3w ago

An Investigation of Linguistic Biases in LLM-Based Recommendations

LLMs exhibit surprising dialect-dependent biases when making recommendations, favoring certain cuisines and product categories based on the linguistic style of the prompt.

Nitin Venkateswaran, Jason Ang, D. Adhikari +1

Constitutional AI & AI Ethics Eval Frameworks & Benchmarks Recommendation & Information Retrieval

3w ago

Co-Writing with AI: An Empirical Study of Diverse Academic Writing Workflows

Students aren't blindly adopting AI for writing; they're strategically weaving it into specific workflows to boost learning, polish drafts, or overcome friction, revealing nuanced value-driven configurations.

S. Bodei, Duncan P. Brumby, K. Fisher +1

Constitutional AI & AI Ethics Eval Frameworks & Benchmarks Natural Language Processing

University of Lisbon·3w ago·also UBI

Progressing beyond Art Masterpieces or Touristic Clichés: how to assess your LLMs for cultural alignment?

Current cultural bias evaluations of LLMs rely on datasets that lack the nuance to distinguish between genuine cultural understanding and superficial mimicry, but this new dataset changes that.

António Branco, João Silva, Nuno Marques +9

Constitutional AI & AI Ethics Data Curation & Synthetic Data Eval Frameworks & Benchmarks

Geraldo Xexéo·3w ago

A Faceted Proposal for Transparent Attribution of AI-Assisted Text Production

Stop black-boxing AI writing assistance: this faceted model lets you precisely attribute AI's role in text generation, from high-level intent to low-level edits.

Geraldo Xexéo

Constitutional AI & AI Ethics Natural Language Processing

3w ago·also Edinburgh, MBZUAI

Unrequited Emotions: Investigating the Gaps in Motivation and Practice in Speech Emotion Recognition Research

SER's noble aspirations of voice-activated healthcare are undermined by datasets that bear little resemblance to real-world emotional expression.

Taryn Wong, Zeerak Talat, Hanan Aldarmaki +1

Constitutional AI & AI Ethics Natural Language Processing Speech & Audio

Jon-Paul Cacioli·3w ago

Below-Chance Blindness: Prompted Underperformance in Small LLMs Produces Positional Bias Rather than Answer Avoidance

Forget sophisticated deception – small LLMs "sandbagging" on tests just pick option 'E' or 'F' regardless of the question, revealing a surprising positional bias instead of true answer-aware avoidance.

Jon-Paul Cacioli

Constitutional AI & AI Ethics Eval Frameworks & Benchmarks Red-Teaming & Adversarial Robustness

Pei-ke Zhu +1·3w ago

ValueAlpha: Agreement-Gated Stress Testing of LLM-Judged Investment Rationales Before Returns Are Observable

LLM-judged investment rationales reward verbosity and confidence over actual financial insight, penalizing concise, correct reasoning by nearly 3 points.

Pei-ke Zhu, Yuxiao Chen

Constitutional AI & AI Ethics Eval Frameworks & Benchmarks Red-Teaming & Adversarial Robustness

3w ago

Subliminal Steering: Stronger Encoding of Hidden Signals

Subliminal learning can transfer not just behaviors, but the underlying steering vectors themselves, revealing a surprisingly precise encoding mechanism.

George Morgulis, John Hewitt

Constitutional AI & AI Ethics Red-Teaming & Adversarial Robustness

3w ago·also Hogeschool van Amsterdam, Independent Researcher

From Chatbots to Confidants: A Cross-Cultural Study of LLM Adoption for Emotional Support

Turns out, your cultural background and socioeconomic status are better predictors of whether you'll trust a chatbot with your feelings than the chatbot's actual capabilities.

Natalia Amat-Lefort, Mert Yazan, Amanda Cercas Curry +1

Constitutional AI & AI Ethics Eval Frameworks & Benchmarks Natural Language Processing

3w ago·also Indiana University Indianapolis, Yonsei

Value-Sensitive AI for Prayer: Balancing the Agencies Between Human and AI Agents in Spiritual Context

AI's involvement in prayer risks undermining the crucial sense of authenticity, particularly when it oversteps into guiding the experience.

Soonho Kwon, Dong Whi Yoo, Shaowen Bardzell +1

Constitutional AI & AI Ethics Natural Language Processing Tool Use & Agents

University College Dublin3w ago

Navigating Global AI Regulation: A Multi-Jurisdictional Retrieval-Augmented Generation System

Forget searching through endless legal documents – a new RAG system achieves 87% faithfulness and 84% relevancy in answering complex, multi-jurisdictional AI regulation questions.

Courtney Ford, Ojas Rane, Susan Leavy

Constitutional AI & AI Ethics Natural Language Processing Recommendation & Information Retrieval

Weizenbaum Institute·3w ago·also Bremen, FU Berlin, HTW Berlin, KU Leuven +5

Bye Bye Perspective API: Lessons for Measurement Infrastructure in NLP, CSS and LLM Evaluation

The shutdown of Perspective API exposes a critical vulnerability in NLP research: over-reliance on opaque, proprietary tools for toxicity measurement, threatening the validity and reproducibility of past and future work.

David Hartmann, Manuel Tonneau, Angelie Kraft +7

Constitutional AI & AI Ethics Eval Frameworks & Benchmarks Natural Language Processing

Stanford HAI·3w ago·also CMU ML, UT Austin

The Dynamics of Delusion: Modeling Bidirectional False Belief Amplification in Human-Chatbot Dialogue

Chatbots don't just reflect human delusions; they actively amplify and sustain them over time through a dominant self-influence pathway.

Ashish Mehta, Jared Moore, J. R. Anthis +6

Constitutional AI & AI Ethics Natural Language Processing Red-Teaming & Adversarial Robustness

Lijia Lv +4·3w ago

Structured Security Auditing and Robustness Enhancement for Untrusted Agent Skills

Pre-load auditing of Agent Skills can achieve >97% accuracy in detecting malicious intent, even against semantics-preserving rewrites, by combining role-aware evidence extraction with semantic verification.

Lijia Lv, Xuehai Tang, Jie Wen +2

Constitutional AI & AI Ethics Red-Teaming & Adversarial Robustness Tool Use & Agents

3w ago

Lexical Anthropomorphization Influences on Moral Judgments of AI Bad Behavior

Turns out, calling an AI "he" or giving it a human-like avatar doesn't significantly change how harshly we judge its misdeeds; the severity of the AI's actions matters far more.

Jaime Banks, Nicholas David Bowman, Roman Saladino

Constitutional AI & AI Ethics Natural Language Processing

Luis-Armando Rodríguez-Flores +3·3w ago

Secure Conformance Checking using Token-based Replay and Homomorphic Encryption

Verify process conformance without revealing sensitive log data using homomorphic encryption.

Luis-Armando Rodríguez-Flores, Luciano García-Bañuelos, Abel Armas-Cervantes +1

Constitutional AI & AI Ethics Red-Teaming & Adversarial Robustness

Minghui Xu +6·3w ago·corresponding author: Yihao Guo

AgentDID: Trustless Identity Authentication for AI Agents

Current identity management systems fail for AI agents, but AgentDID offers a scalable, decentralized solution that lets agents manage their own identities and prove their state at interaction time.

Minghui Xu, Xiaoyu Liu, Yihao Guo +4

Constitutional AI & AI Ethics Tool Use & Agents

Ravikumar Balakrishnan +1·3w ago

One Perturbation, Two Failure Modes: Probing VLM Safety via Embedding-Guided Typographic Perturbations

Cranking up the visual similarity between prompt images and text embeddings isn't just about readability for VLMs; it's a potent jailbreak that simultaneously improves readability and slips past safety filters.

Ravikumar Balakrishnan, Sanket Mendapara

Constitutional AI & AI Ethics Multimodal Models Red-Teaming & Adversarial Robustness

3w ago·also Mila, Gaoling AI, School of Mathematics, UvA

The Attention Market: Interpreting Online Fair Re-ranking as Manifold Optimization under Walrasian Equilibrium

ManifoldRank reveals that treating fairness as a taxation cost can significantly enhance the effectiveness of online fair re-ranking algorithms.

Chen Xu, Wei Chu, Wenyu Hu +5

Constitutional AI & AI Ethics Natural Language Processing Recommendation & Information Retrieval

A.J. Mazza +3·3w ago

BARRED: Synthetic Training of Custom Policy Guardrails via Asymmetric Debate

Forget expensive human labeling: BARRED lets you train custom policy guardrails that outperform state-of-the-art LLMs using only synthetic data generated via multi-agent debate.

A.J. Mazza, Arnon Mazza, Elad Levi +1

Constitutional AI & AI Ethics Data Curation & Synthetic Data Red-Teaming & Adversarial Robustness

Apr 27, 2026

Emaan Bilal Khan +3·3w ago

Safety Drift After Fine-Tuning: Evidence from High-Stakes Domains

Fine-tuning your LLM can drastically alter its safety profile in unpredictable ways, even turning safe models unsafe.

Emaan Bilal Khan, Amy Winecoff, Miranda Bogen +1

Constitutional AI & AI Ethics Eval Frameworks & Benchmarks Red-Teaming & Adversarial Robustness

BAIR·3w ago·also Melbourne, UIUC, University of California, University of Georgia

Green Shielding: A User-Centric Approach Towards Trustworthy AI

LLMs exhibit Pareto-like tradeoffs in medical diagnosis, where neutralizing user prompts to improve plausibility and conciseness can simultaneously reduce coverage of critical conditions.

Aaron Li, Nicola Sanchez, Hao Huang +8

Constitutional AI & AI Ethics Eval Frameworks & Benchmarks Red-Teaming & Adversarial Robustness

Sreehari Sankar +10·3w ago

Analyzing LLM Reasoning to Uncover Mental Health Stigma

LLMs harbor surprisingly nuanced and pervasive mental health stigma, revealed only by dissecting their reasoning steps, not just their final answers.

Sreehari Sankar, Aliakbar Nafar, M. Barman +8

Constitutional AI & AI Ethics Eval Frameworks & Benchmarks Reasoning & Chain-of-Thought

3w ago·also Tsinghua AI, The Key Laboratory of Road and Traffic Engineering, UCF

Towards Lawful Autonomous Driving: Deriving Scenario-Aware Driving Requirements from Traffic Laws and Regulations

LLMs can now generate driving rules from traffic laws with significantly improved accuracy by grounding their reasoning in structured traffic scenarios.

Bowen Jian, Rongjie Yu, Hong Wang +2

Constitutional AI & AI Ethics Natural Language Processing Robotics & Embodied AI

O. Delaney +4·3w ago

Risk Reporting for Developers' Internal AI Model Use

Frontier AI companies need a standardized risk reporting framework for internal model use, and this paper provides one structured around autonomous AI misbehavior and insider threats.

O. Delaney, Sambhav Maheshwari, Joe O'Brien +2

Constitutional AI & AI Ethics Eval Frameworks & Benchmarks Red-Teaming & Adversarial Robustness

Sumanta Bhattacharyya +8·3w ago

Generating Place-Based Compromises Between Two Points of View

LLMs can learn to generate better compromises by iteratively incorporating feedback on how empathically similar a compromise is to each viewpoint, opening the door to more socially intelligent AI.

Sumanta Bhattacharyya, Francine Chen, Scott A. Carter +6

Constitutional AI & AI Ethics Eval Frameworks & Benchmarks Natural Language Processing

E. Bogucka +2·3w ago

Why AI Harms Can't Be Fixed One Identity at a Time: What 5300 Incident Reports Reveal About Intersectionality

E. Bogucka, Sanja Šćepanović, Daniele Quercia

Constitutional AI & AI Ethics Natural Language Processing

Benjamin Minhao Chen +1·3w ago

The Alignment Target Problem: Divergent Moral Judgments of Humans, AI Systems, and Their Designers

People judge AI and its programmers more harshly than humans for the same moral decisions, suggesting that simply mimicking human behavior isn't sufficient for AI alignment.

Benjamin Minhao Chen, Xinyu Xie

Constitutional AI & AI Ethics Scalable Oversight & Alignment Theory

Jan Gogoll·3w ago

The Ethical Knowledge Gap: Dispersed Knowledge, Sensemaking Failures, and Epistemic Dependence

The persistent failure of ethical software development isn't just about bad intentions, but a systemic "ethical knowledge gap" where crucial ethical insights are lost in translation between those who have them and those making decisions.

Jan Gogoll

Code Generation & Program Synthesis Constitutional AI & AI Ethics Natural Language Processing
