Mar 4, 2026 | arXiv:2603.03652

Linguistically Informed Graph Model and Semantic Contrastive Learning for Korean Short Text Classification

JaeGeon Yoo, Byoungwook Kim, Yeongwook Yang, Hong-Jun Jang

AI Summary

This paper introduces LIGRAM, a hierarchical heterogeneous graph model tailored for Korean short text classification. The model constructs sub-graphs at the morpheme, part-of-speech (POS), and named-entity levels to capture grammatical and semantic dependencies. To further improve performance, the authors apply Semantics-aware Contrastive Learning (SemCon), which reflects semantic similarity across documents and helps establish clearer decision boundaries. Experiments on four Korean short-text datasets show that LIGRAM consistently outperforms existing baselines, supporting the effectiveness of language-specific graph representations combined with SemCon for agglutinative languages.

Key Contribution

By explicitly modeling Korean's linguistic structure in a graph-based neural network, this work achieves state-of-the-art short text classification performance, highlighting the importance of language-specific architectures.

Abstract

Short text classification (STC) remains a challenging task due to the scarcity of contextual information and labeled data. Existing approaches have predominantly focused on English because most STC benchmark datasets are in English. Consequently, existing methods seldom incorporate the linguistic and structural characteristics of Korean, such as its agglutinative morphology and flexible word order. To address these limitations, we propose LIGRAM, a hierarchical heterogeneous graph model for Korean short-text classification. The proposed model constructs sub-graphs at the morpheme, part-of-speech, and named-entity levels and hierarchically integrates them to compensate for the limited contextual information in short texts while precisely capturing the grammatical and semantic dependencies inherent in Korean. In addition, we apply Semantics-aware Contrastive Learning (SemCon) to reflect semantic similarity across documents, enabling the model to establish clearer decision boundaries even in short texts where class distinctions are often ambiguous. We evaluate LIGRAM on four Korean short-text datasets, where it consistently outperforms existing baseline models. These results indicate that integrating language-specific graph representations with SemCon provides an effective solution for short text classification in agglutinative languages such as Korean.
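To make the idea of level-specific sub-graphs concrete, the following is a minimal sketch and not the authors' implementation. It assumes documents have already been segmented and tagged by some Korean morphological analyzer and NER tagger, and it simply links each document to its morpheme, POS, and named-entity nodes. The input format, the function names `build_subgraphs` and `shared_nodes`, and the example tags are all hypothetical; the abstract does not specify how LIGRAM builds or integrates its sub-graphs.

```python
from collections import defaultdict

# Hypothetical pre-tagged input: each document is a list of
# (morpheme, POS tag, entity type or None) triples. The tags and entity
# labels are illustrative only and are not taken from the paper.
docs = [
    [("삼성", "NNP", "ORG"), ("이", "JKS", None), ("신제품", "NNG", None),
     ("을", "JKO", None), ("출시", "NNG", None)],
    [("서울", "NNP", "LOC"), ("에", "JKB", None), ("비", "NNG", None),
     ("가", "JKS", None), ("오", "VV", None)],
]


def build_subgraphs(tagged_docs):
    """Build one document-to-node adjacency map per linguistic level."""
    graphs = {
        "morpheme": defaultdict(set),
        "pos": defaultdict(set),
        "entity": defaultdict(set),
    }
    for doc_id, tokens in enumerate(tagged_docs):
        for morpheme, pos, entity in tokens:
            graphs["morpheme"][doc_id].add(morpheme)
            graphs["pos"][doc_id].add(pos)
            if entity is not None:
                # Entity nodes keep both the surface form and its type.
                graphs["entity"][doc_id].add((morpheme, entity))
    return graphs


def shared_nodes(graphs, level, doc_a, doc_b):
    """Nodes at the given level that connect two documents."""
    return graphs[level][doc_a] & graphs[level][doc_b]


graphs = build_subgraphs(docs)
print(shared_nodes(graphs, "morpheme", 0, 1))  # no morpheme in common
print(shared_nodes(graphs, "pos", 0, 1))       # linked through POS nodes
```

In this toy input the two documents share no morphemes but are still connected through common POS nodes, which illustrates how additional linguistic levels can supply structure that sparse short texts lack.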

Architecture Design (Transformers, SSMs, MoE) | Natural Language Processing

Citation Metrics

Citations: 0
Influential citations: 0
References: 0
Year: 2026
Venue: N/A
