This paper introduces C4STYLI, a benchmark of highly stylized translated movie titles and advertising slogans from Hong Kong and the Chinese Mainland, to evaluate LLMs' ability to recognize and generate culturally resonant aesthetic stylistics. Experiments show that LLMs differ from humans in stylistic recognition, that recognition performance varies across text domains, and that recognition and generation performance are not consistently aligned. Using structural ablation with logistic regression probes, the authors find that in the Hong Kong setting LLMs' stylistic recognition relies primarily on surface-level linguistic information rather than stylistic structure, suggesting limited sensitivity to Hong Kong-specific stylistic structure.
LLMs struggle with cross-cultural aesthetic stylistics: in the Hong Kong setting, their stylistic recognition leans on surface-level linguistic features rather than stylistic structure.
Large Language Models (LLMs) are increasingly deployed in diverse cultural contexts, yet their ability to master aesthetic stylistics, i.e., the strategic use of language to evoke cultural resonance, remains underexplored. We curate C4STYLI, a benchmark of highly stylized translated movie titles and advertising slogans from Hong Kong and the Chinese Mainland, to evaluate LLMs via the lens of behavioral recognition and productive competence. Extensive evaluations show that LLMs differ from humans in stylistic recognition, and this recognition ability varies across text domains. In addition, stylistic recognition and generation performance in LLMs are not consistently aligned. To further examine whether LLMs genuinely capture stylistic information in stylistic recognition, we conduct structural ablation with logistic regression probes. We find that, in the Hong Kong setting, stylistic recognition in LLMs relies primarily on surface-level linguistic information rather than stylistic structure. This suggests limited sensitivity to Hong Kong-specific stylistic structure.
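The probing idea in the abstract can be illustrated with a toy sketch. This is not the authors' procedure: the abstract does not say how the ablation is performed or which representations are probed. Everything here is an assumption made for illustration, including the synthetic features, the hypothetical split into `SURFACE` and `STRUCTURE` dimensions, and the hand-written logistic regression. The sketch only shows the general logic: train a logistic regression probe, ablate one group of features at test time, and see which ablation hurts accuracy.

```python
import math
import random

random.seed(0)

# Hypothetical feature groups (assumption, not taken from the paper).
SURFACE = [0, 1]    # stand-in for surface-level linguistic features
STRUCTURE = [2, 3]  # stand-in for stylistic-structure features

def make_example():
    """Synthetic item: by construction, only the surface dims carry the label."""
    y = random.randint(0, 1)
    sign = 1.0 if y == 1 else -1.0
    x = [sign + random.gauss(0, 0.5) for _ in SURFACE]
    x += [random.gauss(0, 1.0) for _ in STRUCTURE]
    return x, y

def sigmoid(z):
    z = max(-30.0, min(30.0, z))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))

def score(w, b, x):
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def train_probe(data, epochs=200, lr=0.5):
    """Logistic regression probe fitted with plain batch gradient descent."""
    dim = len(data[0][0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * dim, 0.0
        for x, y in data:
            err = sigmoid(score(w, b, x)) - y
            for i in range(dim):
                gw[i] += err * x[i]
            gb += err
        w = [wi - lr * gi / len(data) for wi, gi in zip(w, gw)]
        b -= lr * gb / len(data)
    return w, b

def ablate(data, dims):
    """Zero out the given feature dimensions in every example."""
    return [([0.0 if i in dims else v for i, v in enumerate(x)], y)
            for x, y in data]

def accuracy(probe, data):
    w, b = probe
    hits = sum((score(w, b, x) > 0) == (y == 1) for x, y in data)
    return hits / len(data)

data = [make_example() for _ in range(600)]
train, test = data[:400], data[400:]
probe = train_probe(train)

acc_full = accuracy(probe, test)
acc_no_structure = accuracy(probe, ablate(test, STRUCTURE))
acc_no_surface = accuracy(probe, ablate(test, SURFACE))
print(f"full={acc_full:.2f}  structure ablated={acc_no_structure:.2f}  "
      f"surface ablated={acc_no_surface:.2f}")
```

In this toy setup, removing the structure dimensions barely changes probe accuracy, while removing the surface dimensions drops it to roughly chance. A pattern of that kind is what would indicate reliance on surface-level information, though here it holds by construction of the synthetic data and says nothing about any real model.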