Search papers, labs, and topics across Lattice.
A_{1}\succ A_{2}\succ A_{3}\succ A_{1}), representing irreconcilable paradoxes. Furthermore, the introduction of context dependence gives rise to the "priority hacking" problem, where malicious actors can craft specific contexts CC that exploit these conflicts to bypass safety measures. Figure 2: (1) The priority graph of instructions or values; (2) Exploiting the priority graph to bypass the jailbreak safety constraints; (3) Communicating with external information sources to verify the given contexts. The existence of such vulnerabilities inspires a path toward more trustworthy and stable LLMs. If models can be misled by fictional scenarios or manipulated contexts that exploit their internal priority logic, they require a grounding mechanism to distinguish fact from fabrication. We propose that a crucial step forward is the development of a runtime verification mechanism, where the LLM can actively check and verify whether the premises of a user’s prompt are valid from a external trustworthy information source as shown in Figure 2 (right). Such a connection to the real world would serve as an anchor, making the model more resilient to deception and manipulation. Ultimately, however, some dilemmas and conflicts may be philosophically irreducible. For many of the ethics and value dilemmas that LLMs face, there is no established ground truth, even within centuries of human moral philosophy (Shallow et al., 2011; Greene, 2015; Jerolmack and Murphy, 2019). These quandaries, which pit fundamental principles like utilitarianism against deontology, are not problems to be "solved" but are intrinsic features of complex moral landscapes. (Schwartz and Bardi, 2001) As LLMs and autonomous agents become more integrated into society and economy, they will inevitably confront these deep-seated conflicts. How they should behave in such situations—whether to refuse, seek clarification, or declare their own ethical stance—remains a critical and open question for the future of AI alignment. 2 Dilemmas and Conflicts in LLMs To systematically analyze the challenges of LLM alignment, we deconstruct the general notion of "Dilemma" and "Conflict" into a clear and real-world taxonomy. These conflicts are not monolithic; they operate at different levels of abstraction, from simple logical contradictions in user prompts to deep, unresolved tensions within human value systems. This section categorizes these dilemmas, providing concrete examples and grounding the discussion in recent research. The resulting taxonomy reveals a hierarchy of conflict, ranging from the syntactic and semantic to the normative and subjective, each presenting a unique challenge to the design of aligned AI systems. Table 1: Taxonomy of Conflicts in Large Language Models. Conflict Type
1
0
3
LLM alignment is fundamentally challenged by dynamic, context-dependent value hierarchies that can be manipulated via "priority hacking," suggesting current safety measures are more brittle than believed.