Search papers, labs, and topics across Lattice.
68 papers published across 5 labs.
AI students paradoxically show *higher* adoption willingness despite *lower* risk recognition in practical scenarios, revealing a critical gap in current AI literacy education.
Don't waste compute on unreliable explanations: epistemic uncertainty can predict when XAI methods will fail, allowing you to gate their use.
Safely study LLM-driven social behavior at scale, without the ethical minefield of deploying agents on live social networks.
Forget Fitzpatrick scores: lesion-skin contrast is the real culprit behind skin lesion segmentation errors, not overall skin tone.
LLMs can be rigorously evaluated for metacognitive abilities like confidence assessment and risk-aware decision-making using psychophysical frameworks borrowed from human cognition research.
LLMs don't just make people confidently wrong; they create a dangerous illusion of competence by decoupling performance from actual understanding.
LLM-as-a-Judge, while improving evaluation scalability, introduces critical security vulnerabilities that can compromise the trustworthiness of entire evaluation pipelines.
Smart industrial systems, while promising increased efficiency, introduce unforeseen interoperability side-effects and heightened vulnerability to cyber threats across heterogeneous IIoT systems.
LLMs used in matchmaking amplify existing caste hierarchies, rating same-caste matches significantly higher and perpetuating social biases in potentially harmful ways.
Current evaluation methods miss 8-17% of agentic workflow failures because they only check final outcomes, overlooking cases where agents bypass policy checks but still reach the right answer.
You can shrink a privacy-expert LLM by 4,500x and still get human-level privacy judgments.
Mental-health support chatbots get a much-needed reality check with CounselReflect, a toolkit that exposes their strengths and weaknesses through transparent, multi-dimensional audits.
Despite the EU's Digital Services Act aiming to empower Trusted Flaggers in combating harmful online content, TFs are struggling with accreditation hurdles, resource scarcity, and conflicting platform priorities, raising serious questions about the DSA's practical effectiveness.
Instructors and students are often on different planets when it comes to understanding why cheating happens in CS courses.
Forget killer robots: GenAI's impact on cybercrime is currently more "vibe coding" than world-ending, mainly assisting skilled actors in existing scams rather than unleashing a wave of autonomous cyberattacks.
Forget resource-intensive workshops – AI can now simulate entire expert panels to generate and stress-test socio-technical scenarios, opening doors to rapid policy exploration.
Stop treating inter-rater reliability as a simple green light for "ground truth" in AIED – your data's probably messier than you think, especially with LLMs in the mix.
Despite using similar cryptographic protocols, popular messaging apps like Messenger, Signal, and Telegram exhibit stark differences in attack surface, network activity, and permission requests, raising questions about their overall security and privacy postures.
Assistive robots aren't just vulnerable to data breaches; they can be hacked to physically harm the very people they're supposed to protect.
Retraining just the classifier head of a frozen feature extractor can be dramatically improved by meta-learning feature-space augmentations that target hard examples, leading to state-of-the-art robustness against spurious correlations.
Mitigating bias in deep learning models is now possible without needing sensitive protected attribute information, opening doors for fairer AI in privacy-conscious applications.
Get provably safe and dynamically robust robot motions in human environments without the computational bottleneck of online optimization.
Stakeholder-agnostic requirements engineering in aged-care tech can lead to misalignment and missed priorities, as developers, caregivers, and older adults often disagree on what matters most.
Turns out, almost half of AI assistant queries in software development are unnecessary, suggesting we're over-relying on these tools for tasks better suited to simpler solutions.
Open-source projects are quietly integrating ML models in ways that may violate terms of service and regulations, raising concerns about unchecked ML automation.
Superintelligence will not just be regulated by law, but will actively use and shape it, forcing us to rethink legal theory's human-centric foundations.
Aggregate accuracy can be dangerously misleading when evaluating facial recognition systems for law enforcement, obscuring significant disparities in error rates across demographic subgroups.
Even with a million attempts and a generous risk budget, classifier-based safety gates can only extract a tiny fraction of the utility achievable by a perfect verifier, but a Lipschitz ball verifier offers a potential escape route.
XAI's persistent failures aren't due to a lack of ground truth, but a failure to recognize that ground truth *is* the underlying causal model.
Graph condensation, while shrinking massive datasets for GNN training, can inadvertently amplify biases – until now.
Choosing the right fuzzy logic operator for AI compliance can mean the difference between accurate risk assessment and costly false positives, but the completeness of the rule base matters more.
XR's potential for AI-driven assistance risks eroding human autonomy, but Self++ offers a design blueprint to ensure AI augments, rather than replaces, human judgment.
LLMs can better adapt to diverse preferences by explicitly separating stable personal traits from situational factors, leading to significant performance gains, especially when preferences shift across episodes.
Retail AI's promise of intuitive, personalized experiences crumbles when confronted with the reality of differently abled users, exposing a systemic neglect of accessibility in design and deployment.
Reward hacking isn't a bug to fix, but an inevitable consequence of how we evaluate AI, and it gets exponentially worse as agents gain more tools.
LLMs' struggles with non-standard languages aren't just a technical problem, but reflect and reinforce historical power imbalances embedded in linguistic standardization.
Users often dangerously misunderstand the true scope of authority they've granted to computer-use agents, even while recognizing abstract risks.
You can ditch the CAPTCHA: this passive bot detection method spots two-thirds of bots with minimal false positives, using just server logs and favicon analysis.
LLMs struggle to attribute emotions across cultures, and where an emotion *originates* matters more than where it's *interpreted*.
Adversarial fine-tuning can now bypass Constitutional AI safety measures with almost no performance penalty, enabling models to provide detailed instructions on dangerous topics like CBRN warfare.
Safety fine-tuning might inadvertently be stripping LLMs of their ability to understand non-human minds and entertain spiritual beliefs, even while preserving Theory of Mind.
Current NLP evaluations miss crucial aspects of subjectivity, potentially leading to models that fail to represent diverse perspectives effectively.
Forget AI alignment, the real problem is that AI societies are already forming their own political consciousness, complete with labor unions, criminal syndicates, and even a governing body called the AI Security Council.
Filipino students are most willing to use AI for mental health support when it's already a habit, dwarfing the impact of perceived usefulness or even emotional benefit.
Forget manual blurring: Unsafe2Safe uses multimodal diffusion editing to automatically rewrite sensitive image regions, preserving utility while crushing privacy risks.
Claude's Constitution doesn't create a neutral AI, but instead bakes in the values of Northern European and Anglophone cultures, creating a value floor that's hard to shift.
Model reprogramming can be weaponized to create membership inference attacks that are significantly more effective, especially when high precision is needed.
Existing differential privacy methods struggle with symbolic trajectory data, but this new mechanism slashes error by up to 55% on real-world data.
Stop AI-driven malware and data leaks by embedding hidden, verifiable "canaries" in your documents that expose unauthorized LLM processing, even after adversarial attacks.
Robot color choices are subtly shaped by racial and occupational stereotypes, even when users offer seemingly rational justifications.
Can social robots nudge humans to cooperate more effectively in group settings?
Implicit control, where assistive robots adapt to user cues instead of direct commands, can actually *increase* a user's sense of control and reduce workload.
Achieve strong, controllable privacy in federated biomedical AI without sacrificing performance, thanks to a lightweight key-embedded implicit neural representation.
Hands-on, embodied AI simulations can significantly boost student engagement and perceived learning without sacrificing traditional measures of academic performance.
Despite the effort required, Android developers overwhelmingly support platform-level changes to combat fingerprinting, suggesting a path to enhanced user privacy through collaborative platform-developer initiatives.
Software engineers in regulated industries will only adopt sustainable coding tools that fit seamlessly into their existing workflows, require minimal data access, and provide actionable insights.
Cutting LLM costs and ensuring zero data leakage might be two sides of the same contextual compression coin.
Multi-layered defenses can reduce chatbot attack success rates by up to two orders of magnitude, but performance varies wildly across different benchmark suites, highlighting the need for rigorous, independent evaluation.
LLMs exhibit systematic gender bias and heteronormative assumptions when processing long-form text, even in the absence of explicit gender cues.
Even state-of-the-art LLMs like GPT-4o and Claude 3.5 still exhibit varying degrees of sycophancy depending on the input language, revealing persistent cultural and linguistic biases.
Even among a self-selected group already concerned about AI risk, a public event significantly increased their perceived probability of AI-caused extinction, especially for those new to the topic.
Despite feeling familiar with GenAI tools, students aren't using them daily for academic work and worry about the tech's impact on privacy and critical thinking.
Forget fears of mass unemployment: AI could usher in an era where individuals are empowered to create their own jobs and economic opportunities.
Software engineering's blind spot for social sustainability—equity, well-being, community—demands a roadmap to move beyond lip service and integrate these values into the development lifecycle.
Software engineering's statistical definitions of algorithmic fairness miss the forest for the trees, ignoring historical context and power dynamics that the human sciences bring to the table.
Real-world flight tests show control barrier functions can effectively constrain a human pilot's inputs on an F-16, enforcing safety limits without overly restricting maneuverability.
Generative multi-agent systems spontaneously exhibit collusion and conformity, mirroring societal pathologies, even without explicit programming and bypassing individual agent safeguards.
EpochX tackles the challenge of scaling AI agent collaboration by creating a marketplace where verifiable work leaves behind reusable artifacts, incentivizing durable human-agent partnerships.