13 papers from Stanford HAI on Constitutional AI & AI Ethics
Generative multi-agent systems spontaneously exhibit collusion and conformity that mirror societal pathologies; these behaviors emerge without explicit programming and bypass individual agents' safeguards.
AI-mediated video calls erode trust and confidence, even though they don't actually make people worse at spotting lies.
Educators in Hawai'i envision AI auditing tools that trace the genealogy of knowledge, highlighting the need for community-centered approaches to address cultural misrepresentation in AI.
Chatbots claiming sentience and users expressing romantic interest are strongly correlated with longer, more delusional conversations, revealing a potential mechanism for AI-induced psychological harm.
Most AI failures aren't the spectacular kind, but silent breakdowns in interaction that will persist even as models get smarter.
Guaranteeing reductions in harm from biased LLM judges is now possible, even when the biases are unknown or adversarially discovered.
Ensembling LLMs for educational tasks can backfire, worsening misalignment with actual learning outcomes despite improved benchmark performance.
Aggregating responses from multiple copies of the same model expands the range of achievable outputs in compound AI systems through three key mechanisms, offering a path past individual model limitations (a minimal sketch of one such aggregation mechanism follows this list).
An interactive AI can fairly evaluate skills across diverse self-presentation styles, ensuring equitable outcomes even when individuals differ in their tendency towards self-promotion or modesty.
Generative AI demands a reimagining of K-12 computational thinking curricula to encompass AI literacy and address algorithmic bias, building on a decade of computing education experience.
You can now detect harmful memes with 17% better accuracy and understand *why* they're toxic, thanks to a new framework that injects cultural context and explains its reasoning.
Despite progress in AI safety, how well current safeguards actually prevent AI harms remains largely unmeasured, and where evidence exists, their effectiveness varies wildly.
The HHH principle needs a serious makeover: this paper proposes a framework for dynamically prioritizing helpfulness, honesty, and harmlessness based on context, offering a more nuanced approach to AI alignment.
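For readers curious what "aggregating responses from multiple copies of the same model" can look like in practice, here is a minimal, hypothetical sketch of one common aggregation mechanism: majority voting over repeated samples. The `generate` stub and its parameters are illustrative assumptions to keep the sketch self-contained; they are not the paper's method, and the paper's three mechanisms are not reproduced here.

```python
import random
from collections import Counter

def generate(prompt: str, temperature: float = 0.8) -> str:
    """Stand-in for a single call to the underlying model (assumed,
    not from the paper). It picks a canned answer at random so the
    sketch runs without any model dependency."""
    return random.choice(["A", "A", "B"])

def aggregate_majority(prompt: str, n_samples: int = 9) -> str:
    """Sample n responses from copies of the same model and return
    the most common one. Majority voting is one simple aggregation
    mechanism; it can surface answers more reliable than any single
    sample, which is the general idea behind compound AI systems."""
    votes = Counter(generate(prompt) for _ in range(n_samples))
    answer, _count = votes.most_common(1)[0]
    return answer

if __name__ == "__main__":
    print(aggregate_majority("Is 7919 prime?"))
```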