Feb 26, 2026 · arXiv:2602.22710

Same Words, Different Judgments: Modality Effects on Preference Alignment

Aaron Broukhim, Nadir Weibel, Eshin Jolly

AI Summary

This paper investigates the impact of modality (text vs. audio) on human preference annotations in preference-based reinforcement learning (PbRL) using a controlled cross-modal study with 100 prompts. The study finds that while audio preferences are as reliable as text preferences (ICC(2,k) ≈ 0.80 at ~9 raters), the modality significantly influences judgment criteria, leading to narrower decision thresholds, reduced length bias, and more user-oriented evaluations in audio. Synthetic ratings are shown to align with human judgments and predict inter-rater agreement, suggesting their potential for triaging or replacing human annotations.

Key Contribution

Human preference judgments in PbRL are surprisingly modality-dependent: switch from text to audio and you'll see narrower decision thresholds, reduced length bias, and a shift towards user-oriented evaluation.

Abstract

Preference-based reinforcement learning (PbRL) is the dominant framework for aligning AI systems to human preferences, but its application to speech remains underexplored. We present a controlled cross-modal study of human and synthetic preference annotations, comparing text and audio evaluations of identical semantic content across 100 prompts. Audio preferences prove as reliable as text, with inter-rater agreement reaching good levels (ICC(2,k) $\approx$ .80) at $\sim$9 raters -- the first ICC-based reliability characterization in the preference annotation literature for either modality. However, modality reshapes how people judge: audio raters exhibit narrower decision thresholds, reduced length bias, and more user-oriented evaluation criteria, with near-chance cross-modality agreement. Synthetic ratings further align with human judgments and predict inter-rater agreement, supporting their use both for triaging ambiguous pairs and as full replacements for human annotations.
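For readers unfamiliar with the reliability statistic quoted above, the following is a minimal sketch of ICC(2,k), the two-way random-effects, average-measures intraclass correlation in the Shrout and Fleiss convention. It is not the authors' analysis code: the function name `icc2k`, the complete item-by-rater table layout, and the toy ratings are assumptions made for illustration only.

```python
def icc2k(ratings):
    """ICC(2,k) for a complete n_items x k_raters table (list of equal-length rows).

    Uses the two-way ANOVA mean squares:
      BMS = between-items, JMS = between-raters, EMS = residual.
    ICC(2,k) = (BMS - EMS) / (BMS + (JMS - EMS) / n)
    """
    n = len(ratings)
    k = len(ratings[0]) if n else 0
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need a complete table with >= 2 items and >= 2 raters")

    grand = sum(sum(row) for row in ratings) / (n * k)
    item_means = [sum(row) / k for row in ratings]
    rater_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Sums of squares for items, raters, and the residual.
    ss_items = k * sum((m - grand) ** 2 for m in item_means)
    ss_raters = n * sum((m - grand) ** 2 for m in rater_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_resid = ss_total - ss_items - ss_raters

    bms = ss_items / (n - 1)
    jms = ss_raters / (k - 1)
    ems = ss_resid / ((n - 1) * (k - 1))
    return (bms - ems) / (bms + (jms - ems) / n)


# Hypothetical toy data: 4 response pairs (rows) scored by 3 raters (columns).
toy = [
    [1, 2, 1],
    [4, 5, 5],
    [2, 2, 3],
    [5, 4, 5],
]
print(round(icc2k(toy), 3))
```

Because ICC(2,k) describes the reliability of the mean over k raters, it generally rises as raters are added, which is why the abstract reports the value at a specific panel size ($\sim$9 raters).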

Multimodal Models · RLHF & Preference Learning · Speech & Audio

Citation Metrics

Citations: 0

Influential citations: 0

References: 32

Year: 2026

Venue: N/A
