CNRS | Feb 17, 2026 | arXiv:2602.15778

*-PLUIE: Personalisable metric with Llm Used for Improved Evaluation

Quentin Lemesle, Léane Jourdan, Daisy Munson, Pierre Alain, Jonathan Chevelu, Arnaud Delhay, Damien Lolive

AI Summary

The paper introduces *-PLUIE, a set of task-specific prompting variants of ParaPLUIE. ParaPLUIE is a perplexity-based LLM-judge metric that estimates confidence over "Yes/No" answers without generating text, which avoids the computational expense and post-processing of conventional LLM-as-a-judge methods. In the reported experiments, personalised *-PLUIE correlates more strongly with human ratings while keeping computational cost low.


Abstract

Evaluating the quality of automatically generated text often relies on LLM-as-a-judge (LLM-judge) methods. While effective, these approaches are computationally expensive and require post-processing. To address these limitations, we build upon ParaPLUIE, a perplexity-based LLM-judge metric that estimates confidence over "Yes/No" answers without generating text. We introduce *-PLUIE, task-specific prompting variants of ParaPLUIE, and evaluate their alignment with human judgement. Our experiments show that personalised *-PLUIE achieves stronger correlations with human ratings while maintaining low computational cost.
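The idea of scoring confidence over "Yes/No" answers without generating text can be illustrated with a minimal Python sketch. It assumes the judge model's log-probabilities for the two answers are already available, and it uses a plain log-odds comparison as the score. That scoring rule, the function names, and the example values are illustrative assumptions, not the formula defined in the paper.

```python
import math


def yes_no_score(logp_yes: float, logp_no: float) -> float:
    """Log-odds of "Yes" over "No" (assumed scoring rule, not the paper's).

    Positive values mean the model favours "Yes", negative values "No".
    """
    return logp_yes - logp_no


def yes_probability(logp_yes: float, logp_no: float) -> float:
    """Probability of "Yes" after renormalising over the two answers only."""
    diff = logp_yes - logp_no
    # Numerically stable logistic function of the log-odds.
    if diff >= 0:
        return 1.0 / (1.0 + math.exp(-diff))
    e = math.exp(diff)
    return e / (1.0 + e)


# Hypothetical log-probabilities a judge model might assign to each answer
# when asked, for example, whether two sentences are paraphrases.
logp_yes = math.log(0.6)
logp_no = math.log(0.2)

score = yes_no_score(logp_yes, logp_no)
prob = yes_probability(logp_yes, logp_no)
print(f"log-odds = {score:.4f}, P(Yes) = {prob:.2f}")
```

Because the score is read directly from the answer likelihoods, no text has to be decoded or parsed, which is where the savings in computation and post-processing would come from.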

Eval Frameworks & Benchmarks, Natural Language Processing


Year: 2026

Venue: N/A
