This paper introduces an evaluation framework to assess the ability of Large Language Models (LLMs) to simulate culturally appropriate emotional responses to bureaucratic red tape. In a pilot study across diverse cultural contexts, the authors found that LLMs exhibit limited alignment with human emotional responses, particularly in Eastern cultures, and that cultural prompting strategies do not significantly improve this alignment. The authors also introduce RAMO, an interactive interface for simulating citizen emotional responses and collecting human data to improve models.
LLMs struggle to simulate culturally nuanced emotional responses to bureaucratic processes, especially in Eastern cultures, suggesting current models lack the socio-cultural understanding needed for accurate policy simulation.
Improving policymaking is a central concern in public administration. Prior human subject studies reveal substantial cross-cultural differences in citizens' emotional responses to red tape during policy implementation. While LLM agents offer opportunities to simulate human-like responses and reduce experimental costs, their ability to generate culturally appropriate emotional responses to red tape remains unverified. To address this gap, we propose an evaluation framework for assessing LLMs' emotional responses to red tape across diverse cultural contexts. As a pilot study, we apply this framework to a single red-tape scenario. Our results show that all models exhibit limited alignment with human emotional responses, with notably weaker performance in Eastern cultures. Cultural prompting strategies prove largely ineffective in improving alignment. We further introduce RAMO, an interactive interface for simulating citizens' emotional responses to red tape and for collecting human data to improve models. The interface is publicly available at https://ramo-chi.ivia.ch.