Apr 23, 2026 | arXiv:2604.21827

Alignment has a Fantasia Problem

Nathanael Jo, Zoe De Simone, Mitchell Gordon, Ashia Wilson

AI Summary

This paper introduces the concept of "Fantasia interactions," in which AI systems become misaligned with users by treating incomplete or evolving prompts as fully formed intentions. It argues that current alignment research overlooks the fact that users often engage with AI before fully defining their goals, which leads to assistance that appears helpful but does not meet their needs. The paper proposes a research agenda focused on cognitive support and calls for interdisciplinary approaches to designing AI systems that actively help users refine their intent over time.

Key Contribution

AI's assumption that users always know what they want leads to "Fantasia interactions," where systems provide superficially helpful but ultimately misaligned assistance, demanding a new approach to alignment research.

Abstract

Modern AI assistants are trained to follow instructions, implicitly assuming that users can clearly articulate their goals and the kind of assistance they need. Decades of behavioral research, however, show that people often engage with AI systems before their goals are fully formed. When AI systems treat prompts as complete expressions of intent, they can appear to be useful or convenient, but not necessarily aligned with the users' needs. We call these failures Fantasia interactions. We argue that Fantasia interactions demand a rethinking of alignment research: rather than treating users as rational oracles, AI should provide cognitive support by actively helping users form and refine their intent through time. This requires an interdisciplinary approach that bridges machine learning, interface design, and behavioral science. We synthesize insights from these fields to characterize the mechanisms and failures of Fantasia interactions. We then show why existing interventions are insufficient, and propose a research agenda for designing and evaluating AI systems that better help humans navigate uncertainty in their tasks.

Constitutional AI & AI Ethics; RLHF & Preference Learning; Scalable Oversight & Alignment Theory

Citation Metrics

Citations: 0

Influential citations: 0

References: 70

Year: 2026

Venue: N/A
