Arizona · Apr 20, 2026 · arXiv:2604.21209

Align Generative Artificial Intelligence with Human Preferences: A Novel Large Language Model Fine-Tuning Method for Online Review Management

AI Summary

This paper introduces a fine-tuning method that helps LLMs generate better responses to online reviews by addressing three problems: hallucination, preference alignment, and overconservatism. The method reduces factual errors through context augmentation, automatically constructs preference data from existing review-response records, and combines curriculum learning with a support-constraint method to avoid overly conservative responses. Experiments on hotel reviews show that it outperforms existing approaches in both automated and human evaluations.

Key Contribution

This fine-tuning method lets LLMs write review responses that are more accurate and better aligned with human preferences, without the overly conservative replies that existing offline optimization approaches tend to produce.

Abstract

Online reviews can shape where people stay, eat, and shop, but businesses often struggle to keep up with the flood of customer feedback. Although generative artificial intelligence (AI) offers a promising solution, general-purpose models are not designed for the specific judgment, tone, and accuracy required in customer review responses. This study introduces a new fine-tuning method that helps large language models generate review replies that better match human preferences in real business settings. The paper makes several technical advances. It identifies why review-response systems hallucinate and introduces a context-augmentation strategy to reduce factual errors. It also develops a theory-driven way to automatically construct preference data from existing review-response records, overcoming a major barrier in preference fine-tuning. In addition, the study proposes a curriculum learning design and a new support-constraint method that reduces the overconservatism of existing offline optimization approaches, with stronger theoretical guarantees. Tests on hotel reviews show that the method produces better responses than leading alternatives in both automated evaluations and human judgments. The findings point to a practical path for using AI to help firms respond faster and more consistently to customers while also underscoring the need for safeguards, human oversight, and domain-specific model alignment in customer-facing AI systems.
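The abstract names three ingredients (context augmentation, preference data built from existing review-response records, and curriculum learning) without giving their details. The sketch below only illustrates the data shapes such a pipeline might use; it is not the paper's method. Every name in it (`PreferencePair`, `build_prompt`, `make_pair`, `curriculum_order`) is hypothetical, the per-response score stands in for the paper's unspecified theory-driven criterion, and ordering pairs from largest to smallest preference margin is an assumed "easy first" curriculum.

```python
from dataclasses import dataclass


@dataclass
class PreferencePair:
    prompt: str      # review plus augmented business context
    chosen: str      # response assumed to be preferred
    rejected: str    # response assumed to be less preferred
    margin: float    # assumed gap between the two preference scores


def build_prompt(review, context):
    """Prepend known business facts so a reply can be grounded in them
    (an assumed form of context augmentation)."""
    facts = "\n".join(f"{key}: {value}" for key, value in sorted(context.items()))
    return f"Business facts:\n{facts}\n\nReview:\n{review}\n\nReply:"


def make_pair(prompt, scored_responses):
    """Build one pair from (response, score) tuples taken from existing records.

    The score is a placeholder: the highest-scored response becomes `chosen`
    and the lowest-scored becomes `rejected`.
    """
    ranked = sorted(scored_responses, key=lambda item: item[1], reverse=True)
    best, best_score = ranked[0]
    worst, worst_score = ranked[-1]
    return PreferencePair(prompt, best, worst, best_score - worst_score)


def curriculum_order(pairs):
    """Order pairs from clear-cut (large margin) to subtle (small margin).
    This ordering rule is an assumption, not taken from the paper."""
    return sorted(pairs, key=lambda pair: pair.margin, reverse=True)
```

Under these assumptions, a trainer would consume `curriculum_order(pairs)` in sequence, so that clearly separated pairs are seen before near-ties.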

Natural Language Processing · Recommendation & Information Retrieval · RLHF & Preference Learning

Citation Metrics

Citations: 0

Influential citations: 0

References: 22

Year: 2026

Venue: Information Systems Research
