Bonn-Aachen International Center for Information | Lamarr Institute for Machine Learning | Feb 18, 2026 | arXiv:2602.16379

Label-Consistent Data Generation for Aspect-Based Sentiment Analysis Using LLM Agents

Mohammad H. A. Monfared, Lucie Flek, Akbar Karimi

AI Summary

This paper introduces an agentic data augmentation method for Aspect-Based Sentiment Analysis (ABSA) that iteratively generates and verifies synthetic training examples using LLMs. The agentic approach improves label preservation in augmented data compared to a prompting-based baseline, especially for tasks involving aspect term generation. Experiments across three ABSA subtasks, four SemEval datasets, and two encoder-decoder models (T5-Base and Tk-Instruct) demonstrate that agentic augmentation, when combined with real data, consistently outperforms prompting-based generation, particularly for T5-Base.

Key Contribution

Agent-based LLM data augmentation preserves labels better than a closely matched prompting baseline and boosts ABSA performance, especially for the less heavily pretrained T5-Base.

Abstract

We propose an agentic data augmentation method for Aspect-Based Sentiment Analysis (ABSA) that uses iterative generation and verification to produce high-quality synthetic training examples. To isolate the effect of agentic structure, we also develop a closely matched prompting-based baseline using the same model and instructions. Both methods are evaluated across three ABSA subtasks, namely Aspect Term Extraction (ATE), Aspect Term Sentiment Classification (ATSC), and Aspect Sentiment Pair Extraction (ASPE), on four SemEval datasets and with two encoder-decoder models: T5-Base and Tk-Instruct. Our results show that agentic augmentation outperforms raw prompting in label preservation of the augmented data, especially when the tasks require aspect term generation. In addition, when combined with real data, agentic augmentation provides higher gains, consistently outperforming prompting-based generation. These benefits are most pronounced for T5-Base, while the more heavily pretrained Tk-Instruct exhibits smaller improvements. As a result, augmented data helps T5-Base achieve performance comparable to that of Tk-Instruct.
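The generate-then-verify idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the names `Example`, `labels_preserved`, `augment`, and `toy_generator` are hypothetical, the generator is a stub standing in for an LLM call, and the verifier is a simple substring check chosen only to show what "label preservation" could mean for aspect terms.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass(frozen=True)
class Example:
    """A labeled ABSA example (hypothetical structure for illustration)."""
    text: str
    aspects: Tuple[Tuple[str, str], ...]  # (aspect term, polarity) pairs


def labels_preserved(candidate_text: str, source: Example) -> bool:
    """Toy verifier: every labeled aspect term must still occur in the text."""
    lowered = candidate_text.lower()
    return all(term.lower() in lowered for term, _ in source.aspects)


def augment(
    source: Example,
    generate: Callable[[Example, Optional[str]], str],
    verify: Callable[[str, Example], bool] = labels_preserved,
    max_rounds: int = 3,
) -> Optional[Example]:
    """Generate a candidate, verify it, and retry with feedback on failure.

    Returns a new example carrying the source labels, or None if no
    candidate passes verification within max_rounds attempts.
    """
    feedback: Optional[str] = None
    for _ in range(max_rounds):
        candidate = generate(source, feedback)
        if verify(candidate, source):
            return Example(candidate, source.aspects)
        terms = ", ".join(term for term, _ in source.aspects)
        feedback = f"Keep these aspect terms unchanged: {terms}"
    return None


def toy_generator(source: Example, feedback: Optional[str]) -> str:
    """Stub for an LLM call: drops the aspect term first, then corrects it."""
    if feedback is None:
        return "The meal was wonderful."
    return "The pasta was wonderful."


source = Example("The pasta was great.", (("pasta", "positive"),))
result = augment(source, toy_generator)
print(result)
```

In this sketch the first candidate is rejected because it loses the term "pasta", and the second is accepted after feedback. A single-pass prompting baseline would correspond to calling the generator once with no verification step.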

Data Curation & Synthetic Data | Natural Language Processing | Tool Use & Agents

Citation Metrics

Citations: 0

Influential citations: 0

References: 19

Year: 2026

Venue: N/A
