Khatam University; Tehran Institute for Advanced Studies; University of Tehran
Jun 10, 2026
arXiv:2606.12599

Constrained Semantic Decompression in LLMs through Persian Proverb-Conditioned Story Generation

Zahra Habibzadeh, Paria Khoshtab, Amir Mesbah, Yadollah Yaghoobzadeh

AI Summary

This study frames the transformation of Persian proverbs into engaging narratives as a constrained semantic decompression task. Using the Proverb Aligned Narrative Dataset (PAND), which pairs proverbs with human-written stories and explicit meanings, the authors evaluate how well large language models (LLMs) generate stories that faithfully reflect the moral and causal structure of each proverb. The findings reveal a persistent "decompression gap": LLMs often achieve strong surface-level fluency but fail to convey the underlying meaning. Because explicit reasoning and iterative refinement partially mitigate these failures, the authors suggest that many errors stem from difficulty translating abstract meaning into narrative form rather than from a complete lack of relevant knowledge.

Key Contribution

LLMs may sound fluent, but they often miss the moral essence of proverbs, revealing a critical gap in their narrative generation capabilities.

Abstract

Transforming a dense, abstract proverb into an engaging and morally faithful narrative requires deep cultural understanding and robust semantic grounding. We frame this problem as a "constrained semantic decompression" task and study proverb-conditioned story generation as a testbed for abstraction-to-realization in large language models (LLMs). Focusing on Persian, we introduce the Proverb Aligned Narrative Dataset (PAND), which pairs proverbs with human-written stories and explicit meanings. Using a hybrid evaluation framework that combines a human-calibrated LLM-as-a-Judge with structural metrics, we analyze model behavior across multiple prompting regimes. Our findings reveal a persistent "decompression gap": current LLMs often achieve strong surface-level fluency while failing to faithfully instantiate the underlying moral and causal structure encoded in proverbs. We further show that explicit reasoning and iterative refinement can partially mitigate these failures, suggesting that many decompression errors arise from difficulties in translating abstract meaning into narrative form rather than from a complete lack of relevant knowledge. Our proposed task extends naturally to other forms of compressed cultural knowledge.

Natural Language Processing

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
