The paper introduces CULTURE-MT, a new benchmark for evaluating the cultural effectiveness of social media user-generated content (UGC) translation, comprising 1,002 UGC notes across 14 domains. The authors fine-tuned Qwen3-8B and Qwen3-32B on UGC-oriented data as baselines and proposed cultural effectiveness as a new evaluation criterion, focusing on expression accuracy and cultural adaptability. Experiments with 15 models revealed that traditional metrics fail to capture cultural effectiveness, and that the cultural effectiveness of base LLMs correlates with model size.
Current translation metrics fail to capture the cultural nuances and emotional resonance that are crucial for effective translation of social media UGC.
Social media platforms enable large-scale cross-lingual communication, but translating user-generated content (UGC) remains challenging due to its informal style, cultural references, and interaction-based expressions. While recent LLMs have improved translation quality, existing benchmarks and metrics often fail to capture whether translations convey the intended meaning and cultural resonance in real-world settings. In this work, we introduce CULTURE-MT, a benchmark for social media translation that focuses on both CULtural Transmission and UGC-specific emotion REsonance. CULTURE-MT consists of 1,002 UGC notes across 14 domains, categorized into four types based on culture-loaded symbols and linguistic style features. We also construct UGC-oriented training data to fine-tune Qwen3-8B and Qwen3-32B as baselines. We propose cultural effectiveness as a new evaluation criterion, focusing on expression accuracy and cultural adaptability. Testing 15 models, including the baselines, we find that traditional metrics fail to capture cultural effectiveness. We also observe that the cultural effectiveness of base LLMs correlates with model size. Our work provides a comprehensive evaluation system for UGC translation models and will offer an open evaluation platform to advance research in this area. We release the CULTURE-MT benchmark and provide an online leaderboard where submitted translation results can be evaluated by our trained JUDGER.